Using EPIC to estimate the proportion of various cell types in bulk samples

Results from EPIC

Please select first the parameters of your run and then click on 'Run EPIC'.

Main input parameters have changed. Please click on 'Run EPIC' to compute the new cell fractions.

Table
Figures

Download the results table

Download all figures and results

A description of EPIC method is found in the about page tab .

This help page describes how to use EPIC 's web application.

The main app window is showed below.

It is composed of 3 regions : (A) the menu to navigate between the application and information pages; (B) the input parameters area and (C) the results area. We describe below the inputs and then the results.

Running EPIC application

In order to launch EPIC , you need to (1) upload some bulk gene expression samples, (2) select which reference profiles to use, (3) decide whether results should be given as cell fractions or mRNA fractions, (4) optionally give some name to your job for the file downloads and (5) then click on Run EPIC to launch the computations (it takes some seconds or minutes depending on the number of samples and genes).

Further details on these options are given below.

(1)

The bulk gene expression samples need to be given as a tab-delimited text file (not comma-separated csv or other format). The maximum size of this file is 100 MB (if bigger size is needed, you can either split your samples between multiple files or use the R-package from EPIC ). The first column gives the gene names (use the official gene symbols when using the prebuilt reference profiles - e.g. CD8A, MS4A1, ... - Note: it is advised to keep all genes in this file instead of a subset of signature genes). The other columns give the gene expression from each sample (these counts should be given in transcripts per million (TPM) or reads/fragments per kilobase per million mapped reads (RPKM/FPKM)). Columns headers give the name of each sample. An example file can be downloaded from here .
Once your bulk samples data is uploaded, some text shows how many samples and genes are present in this file.

(2)

The next step is to select the reference gene expression profiles. You can either use some prebuilt references profiles (see the accompanying publication referenced in the about page tab ) derived from blood circulating immune cells or from tumor infiltrating cells (including immunes cells but also stromal and endothelial cells), or you can upload your own reference profiles.

Under the selected profile, some text indicates which cell types are present in this profile (note that EPIC will also predict the proportion from an additional cell type for which no reference profile was provided). Some other text gives information about the number of genes that could be matched between the reference profiles and the bulk samples.

If you decide to use your own gene expression reference profiles, two other file upload boxes will appear:

The first one is to upload your reference profile matrix. This is again some tab-delimited file, containing the gene names in the first column (it could be any type of name (gene name, ensembl ID, ...), as long as it is the same type of names than in the bulk data). The other columns give for each cell type its reference profile expression (counts in TPM, RPKM, FPKM or it is possible to also have both the bulk data and reference profiles given in raw counts; but these counts should never be in log scale). In addition to these columns, one can optionally have additional columns giving the gene expression variability for each cell type - these are given by appending to the cell type name a string '.var', e.g. you could have the columns geneName, Bcells, Bcells.var, Tcells, Tcells.var in order to define the reference profiles and variability for B cells and T cells (if you define the variability in gene expression, it is needed to have this information for each cell type). Note that the order of the columns except for the first one are not important. An example reference profile file is available here .

When defining your own gene expression reference profiles, you also need to define a list of cell signature genes to use by EPIC . These signature genes do not need to be expressed by only one cell type, but they should at least be expressed only by the cell types for which reference profiles are given. The list of genes can either be given as one gene per line or as genes separated by space or comma like in this example list .

(3-5)

The next parameter is to decide if results should be showed as cell fraction or mRNA fractions. These two fractions are not the same as some cell types express more mRNA copies per cell than some other cell types (see the accompanying publication referenced in the about page tab ). Currently, mRNA per cell values are available for the main immune cell types and average values are used for the other cell types. Note that this option can also be changed after running EPIC , the results will not need to be re-computed as both values are stored during the run.

The last parameter is an optional job name that is used in the figure titles and file names if you download the results.

The results area

After running EPIC , the results area (C) is composed of two panels. The first panel shows a table of the results, with the proportions from each cell type in each sample. This table can also be downloaded as a tab-delimited file.

The second panel shows the results with multiple figures. These figures can also be downloaded as a pdf file.

Figure (6) shows the composition of each sample separately.
Figure (7) shows the distribution of each cell type among the various samples.
Figure (8) is a heatmap detailing the fraction of each cell type in each samples. A clustering both on the cell types and samples is done.

I received a message 'Disconnected from server'

This happens in general when there is an issue with some input file you uploaded.
Please double-check the file format - more details are available in the help page .

Which proportions returned by EPIC should I use?

EPIC is returning two proportion values: mRNAProportions and cellFractions where the 2nd represents the true proportion of cells coming from the different cell types when considering differences in mRNA expression between cell types. So in principle, it is best to consider these cellFractions .
However, please note, that when the goal is to benchmark EPIC predictions, if the 'bulk samples' correspond in fact to in silico samples reconstructed for example from single-cell RNA-seq data, then it is usually better to compare the 'true' proportions against the mRNAProportions from EPIC. Indeed, when building such in silico samples, the fact that different cell types express different amount of mRNA is usually not taken into account. On the other side, if working with true bulk samples, then you should compare the true cell proportions (measured e.g., by FACS) against the cellFractions .

What do the 'other cells' represent?

EPIC predicts the proportions of the various cell types for which we have gene expression reference profiles (and corresponding gene signatures). But, depending on the bulk sample, it is possible that some other cell types are present for which we don't have any reference profile. EPIC returns the proportion of these remaining cells under the name 'other cells'. In the case of tumor samples, most of these other cells would certainly correspond to the cancer cells, but it could be that there are also some stromal cells or epithelial cells for example.

Is there some caution to consider about the cellFractions and mRNA_cell values?

As described in our manuscript, EPIC first estimates the proportion of mRNA per cell type in the bulk and then it uses the fact that some cell types have more mRNA copies per cell than other to normalize this and obtain an estimate of the proportion of cells instead of mRNA (EPIC function returns both information if you need the one or the other). For this normalization we had either measured the amount of mRNA per cell or found it in the literature (fig. 1 – fig. supplement 2 of our paper). However we don’t currently have such values for the endothelial cells and CAFs. Therefore for these two cell types, we use an average value, which might not reflect their true value and this could bias a bit the predictions, especially for these cell types. If you have some values for these mRNA/cell abundances, you can use the R-package of EPIC to set these values (not available in the web application).
If the mRNA proportions of these cell types are low, then even if you don't correct the results with their true mRNA/cell abundances, it would not really have a big impact on the results. On the other side, if there are many of these cells in your bulk sample, the results might be a little bit biased, but the effect should be similar for all samples and thus not have a too big importance (maybe you wouldn’t be fully able to tell if there are more CAFs than Tcells for example, but you should still have a good estimate of which sample has more CAFs (or Tcells) than which other sample for example).

I receive a warning message that 'the optimization didn't fully converge for some samples'. What does it mean?

When estimating the cell proportions EPIC performs a least square regression between the observed expression of the signature genes and the expression of these genes predicted based on the estimated proportions and gene expression reference profiles of the various cell types. When such a warning message appears, it means that the optimization didn’t manage to fully converge for this regression, for the samples indicated. Such cases usually happen when the maximum number of iterations has been reached in the optimization, meaning there is possibly an issue with the bulk gene expression data that maybe don’t completely follow the assumption of equation (1) from our manuscript. From our experience, it seems in practice that even when there was such a warning message the proportions were predicted well, it is maybe that the optimization just wants to be too precise, or maybe few of the signature genes didn’t match well but the rest of signature genes could be used to have a good estimate of the proportions.
If you have some samples that seem to have strange results, it could however be useful to check that the issue is not that these samples didn’t converge well. To be more conservative you could also remove all the samples that didn't converge well as these are maybe outliers, if it is only a small fraction from your original samples.

Who should I contact in case of a technical or other issue?

Julien Racle ( julien.racle@unil.ch ). Please provide as much details as possible and ideally send also an example input file (and/or reference profiles) that is causing the issue.

EPIC application is designed to Estimate the Proportion of Immune and Cancer cells from bulk tumor gene expression data. This is done by fitting gene expression reference profiles from the main non-malignant cell types and simultaneously accounting for an uncharacterized cell type without prior knowledge about it (e.g. cancer cells in solid tumors samples). We derived reference gene expression profiles from the main tumor infiltrating cell types (i.e. immune subsets, stromal and endothelial cells). This method could however also be applied to predict the cell fractions in other types of mixed samples if reference gene expression profiles from these other cell types are available.

All the details about this method are available in
Racle, J., Jonge, K. de, Baumgaertner, P., Speiser, D.E., and Gfeller, D. (2017). Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. eLife, 6, e26476. ( https://elifesciences.org/articles/26476 ).

EPIC is also available as an R -package ( https://github.com/GfellerLab/EPIC ), containing some advanced options. Release version 1.1 of the package is used in this web-application.

This application and the R -package have been developed by Julien Racle from the David Gfeller's group at the Ludwig Center for Cancer Research of the University of Lausanne ( https://www.unil.ch/dof/research/gfeller ).

A pyhton wrapper for EPIC has been written by Stephen C. Van Nostrand from MIT and is available at https://github.com/scvannost/epicpy .

How to cite

Racle, J., Jonge, K. de, Baumgaertner, P., Speiser, D.E., and Gfeller, D. (2017). Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. eLife, 6, e26476

License

EPIC can be used freely by academic groups for non-commercial purposes. The product is provided free of charge, and, therefore, on an " as is " basis, without warranty of any kind.

If you plan to use EPIC (version 1.1) in any for-profit application, you are required to obtain a separate license.
To do so, please contact Nadette Bulgin ( nbulgin@lcr.org ) at the Ludwig Institute for Cancer Research Ltd.

Contact information

Julien Racle ( julien.racle@unil.ch ) and David Gfeller ( david.gfeller@unil.ch )