Log in by clicking the 'Log in' link at the top-right corner of the page. Creating an account is free and easy, and provides a number of benefits.
2 Create a new experiment
Select the BayesPrism application on the dashboard panel to create a new analysis for your data, as shown in the following screenshot (Figure 2).
3 Set experiment name
Edit the Experiment Name, and optionally click Add a description to comment on the experimental setup. Choose the project that the experiment belongs to. By default, the "Default Project" is created and used.
4 Upload count matrix files
BayesPrism needs two types of count matrix file: the bulk RNA-seq count matrix and the reference count matrix. The gateway currently imports data from several formats, such as tsv, xls, rds/dataframe, rds/seurat, and h5ad. For details, please check the input tab of this page.
The gateway provides two ways to upload count matrix files. (1) Click "Select files from storage" to choose existing files submitted for previous tasks, or (2) click "Drop files here or browse" to upload new files from your own storage.
(1) Each row of a count matrix represents one unique gene ID, so the bulk and reference count matrices should contain the same gene set.
(2) Count matrices must contain raw counts; normalized data are not accepted.
(3) At least 50 reads for each cell type are suggested.
5 Set computing parameters
(1) Specify the species so that ribosomal genes, mitochondrial genes, and genes on chrX and chrY can be removed. For other species, users need to remove these genes manually.
(2) Specify the cell type and tumor state for each cell in the reference count matrix using a CSV file. Three columns are defined: cell_id, cell_type and tumor_state. The tumor state should be 0 (non-tumor) or 1 (tumor).
(3) Specify the prefix of the output files. This can help distinguish results from multiple experiments.
6 Submit the job
Once steps 1-5 are finished, proceed to "save and launch". Input data and parameters will be submitted to a computing node of the XSEDE cluster via the dREG gateway server. Check the box next to "Receive email notification of experiment status" if needed. Upon launching, users will be directed to the "Experiments" page, shown in Fig. 4. A typical experiment usually finishes within 4 hours.
7 Check the status
Users may view the progress by logging in and clicking the "Experiments" button on the left control panel of the dashboard. All submitted experiments are listed on this page.
8 Check the results
Once a job is completed, click the BayesPrism experiment and the website will jump to the Experiment Summary page. All parameters used to set up the experiment are listed on this page. The output files of BayesPrism are stored in the ARCHIVE; click ARCHIVE to inspect any single result file. A compressed file containing the input count matrix files, two task log files and all result files is also provided; click the Download Zip button to download it. A downloaded file with the 'tar.gz' extension can be decompressed with the 'tar' command, and a file with the 'gz' extension can be decompressed with the 'gunzip' command on Linux. Downloads can be problematic in Safari because it tries to unzip the compressed results automatically using an incompatible method. Please check this link to disable this feature.
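For users who prefer to unpack the download from Python rather than from the Linux shell, the following is a minimal sketch using only the standard library. The function name `unpack_download` and the file names in the usage note are hypothetical; the sketch assumes the archive comes from a trusted source such as your own gateway download.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_download(archive_path, dest_dir):
    """Unpack a downloaded '.tar.gz' or plain '.gz' file into dest_dir.

    Returns the list of extracted file paths.
    """
    archive = Path(archive_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    if archive.name.endswith(".tar.gz"):
        # Same effect as `tar -xzf archive -C dest` on Linux.
        with tarfile.open(archive, mode="r:gz") as tar:
            files = [m.name for m in tar.getmembers() if m.isfile()]
            tar.extractall(dest)
        return [dest / name for name in files]

    if archive.suffix == ".gz":
        # Same effect as `gunzip -c archive > dest/<name without .gz>`.
        target = dest / archive.stem
        with gzip.open(archive, "rb") as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
        return [target]

    raise ValueError(f"Unsupported archive type: {archive.name}")
```

For example, `unpack_download("myprefix.tar.gz", "results")` would place the archive contents under a `results` directory, where `myprefix` stands for whatever output prefix was chosen for the experiment.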
The input to BayesPrism consists of two count matrices, which hold the read counts in the bulk samples and in the reference scRNA (or GEP) data. The scRNA or GEP count matrix file can be exported from single-cell packages such as Seurat or CellRanger. Here we first explain the count matrix data formats used in BayesPrism.
1 Count Matrices
|Data Format||Used For||Description|
|TSV||Bulk,Reference||A tab-separated values file containing read counts for each gene (as row) in every sample (as column). BayesPrism requires a TSV with row names and column names.|
|XLS||Bulk,Reference||An Excel file containing read counts for each gene (as row) in every sample (as column). BayesPrism requires that the first row of the XLS gives all sample names and the first column gives all gene names or IDs.|
|RDS/dataframe||Bulk,Reference||An RDS file holding an R data frame that contains read counts for each gene (as row) in every sample (as column). BayesPrism requires that the data frame has row names (genes) and column names (samples).|
|RDS/sce||Reference||An RDS file containing a SingleCellExperiment object that represents read counts for each gene (as row) in each sample (as column).|
|RDS/seurat||Reference||This RDS file contains a Seurat object which represents single-cell expression data for R. Each Seurat object revolves around a set of cells and consists of one or more Assay objects.|
|h5ad||Reference||Hierarchical Data Format version 5 (HDF5) is used to store both the expression values and associated annotations on the genes and cells in Python. H5AD format can be read into R as a SingleCellExperiment.|
(1) The bulk matrix and the reference matrix should use the same gene annotation.
(2) All matrices must contain raw counts; normalized data are not allowed.
(3) In the reference matrix, at least 50 reads are required for each cell type.
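The three requirements above can be checked locally before uploading. The following is a minimal Python sketch and not part of the gateway. It assumes both matrices are TSV files with gene IDs in the first column. All function names are hypothetical. It also interprets "at least 50 reads for each cell type" as the total count summed over all cells of that type, which is an assumption rather than something stated on this page.

```python
import csv


def read_count_matrix(path):
    """Read a gene-by-sample TSV into (sample_names, {gene: [counts]})."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    # R's write.table omits the corner cell, so the header may be one field short.
    if body and len(header) == len(body[0]) - 1:
        samples = header
    else:
        samples = header[1:]
    counts = {}
    for row in body:
        gene, values = row[0], row[1:]
        if gene in counts:
            raise ValueError(f"duplicate gene id: {gene}")
        if len(values) != len(samples):
            raise ValueError(f"row {gene}: expected {len(samples)} values")
        counts[gene] = [float(v) for v in values]
    return samples, counts


def check_matrices(bulk, reference):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    bulk_genes, ref_genes = set(bulk), set(reference)
    if bulk_genes != ref_genes:
        problems.append(
            f"gene sets differ: {len(bulk_genes - ref_genes)} only in bulk, "
            f"{len(ref_genes - bulk_genes)} only in reference"
        )
    for name, matrix in (("bulk", bulk), ("reference", reference)):
        values = [v for row in matrix.values() for v in row]
        # Raw counts are non-negative whole numbers; anything else hints at normalization.
        if any(v < 0 or v != int(v) for v in values):
            problems.append(f"{name} matrix has negative or non-integer values")
    return problems


def reads_per_cell_type(samples, counts, cell_types):
    """Sum all reads over the cells of each type (assumed meaning of the rule)."""
    totals = {}
    for column, cell_id in enumerate(samples):
        cell_type = cell_types[cell_id]
        column_sum = sum(row[column] for row in counts.values())
        totals[cell_type] = totals.get(cell_type, 0) + column_sum
    return totals
```

A typical use would be to read both files with `read_count_matrix`, pass the two count dictionaries to `check_matrices`, and then list the cell types whose total from `reads_per_cell_type` is below 50.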
2 Cell type and tumor state
If the reference count matrix doesn't contain the cell type and tumor state for each cell, the user must provide a CSV file that gives them. The CSV should have 3 columns: cell id, cell type, tumor state (values: 0 or 1).
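As an illustration, the annotation CSV can be checked with a few lines of Python before it is uploaded. This is a sketch under stated assumptions: it supposes the file has a header row with the column names cell_id, cell_type and tumor_state, and the function names are hypothetical. Whether the gateway itself requires exactly this header is not confirmed here.

```python
import csv

REQUIRED_COLUMNS = ["cell_id", "cell_type", "tumor_state"]


def load_cell_annotation(path):
    """Read the annotation CSV and return {cell_id: (cell_type, tumor_state)}."""
    annotation = {}
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        present = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in present]
        if missing:
            raise ValueError(f"missing column(s): {', '.join(missing)}")
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            cell_id = (row["cell_id"] or "").strip()
            cell_type = (row["cell_type"] or "").strip()
            state = (row["tumor_state"] or "").strip()
            if state not in ("0", "1"):
                raise ValueError(f"line {line_no}: tumor_state must be 0 or 1")
            if cell_id in annotation:
                raise ValueError(f"line {line_no}: duplicate cell id {cell_id!r}")
            annotation[cell_id] = (cell_type, int(state))
    return annotation


def unannotated_cells(reference_cell_ids, annotation):
    """List the reference cells that have no row in the annotation."""
    return [cell for cell in reference_cell_ids if cell not in annotation]
```

Passing the column names of the reference matrix to `unannotated_cells` shows which cells would still lack a cell type and tumor state.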
BayesPrism removes ribosomal genes, mitochondrial genes, and genes on chrX and chrY before deconvolution. If the data are not from human or mouse, users have to remove these genes in advance.
4 scRNA or GEP
The reference count matrix can represent scRNA data or GEP (Gene Expression Profile) data. GEP only supports TSV, XLS, and RDS/dataframe.
1 BayesPrism output files
BayesPrism generates an RDATA file ($PREFIX.rdata) for R users and a compressed file ($PREFIX.tar.gz) for Python users.
R users can open the RDATA file with the "load" command. Python users need to extract the RDS files (see the following table) with the decompression command "tar -xvzf" on Linux.
Note: All files below are stored in the "ARCHIVE" directory.
|$PREFIX.rdata||This Rdata file contains the 'rted' object which can be explored by the 'str' command.|
|$PREFIX.tar.gz||The compressed file contains multiple RDS files which represent the items of the 'rted' object. The following table shows all RDS files.|
|$PREFIX.cor.pdf||The correlation heatmap indicates the correlation between the samples in the bulk data.|
2 Contents in $PREFIX.tar.gz
|Access cell type fractions|
|rted.res.first_gibbs_res.gibbs_theta.rds||Initial estimation of fraction for all cell subtypes in each bulk sample
R code: rted$res$first.gibbs.res$gibbs.theta
|rted.res.first_gibbs_res.theta_merged.rds||Initial estimation of fraction for all cell types (merged across subtypes) in each bulk sample
R code: rted$res$first.gibbs.res$theta.merged
|rted.res.final_gibbs_theta.rds||The updated estimates of cell type fraction
R code: rted$res$final.gibbs.theta
|Access gene expression(raw read scale)|
|rted.res.first_gibbs_res.Znkg.rds||The estimates of the mean of posterior read count for each cell subtype in each bulk sample.
R code: rted$res$first.gibbs.res$Znkg
|rted.res.first_gibbs_res.Znkg_merged.rds||The estimates of the mean of posterior read count for each cell subtype (merged across subtypes) in each bulk sample.
R code: rted$res$first.gibbs.res$Znkg_merged
|Access gene expression(normalized read scale)|
|rted.res.first_gibbs_res.Zkg_tum.rds||The mean count of tumor expression in each bulk sample.
R code: rted$res$first.gibbs.res$Zkg.tum
|rted.res.first_gibbs_res.Zkg_merge.rds||The mean count of tumor expression in each bulk sample.
R code: rted$res$first.gibbs.res$Zkg.merge
|rted.res.first_gibbs_res.Zkg_tum_norm.rds||The depth-normalized count of tumor expression in each bulk sample.
R code: rted$res$first.gibbs.res$Zkg.tum.norm
|rted.res.first_gibbs_res.Zkg_tum_vst.rds||The variance stabilizing transformed count of tumor expression in each bulk sample.
R code: rted$res$first.gibbs.res$Zkg.tum.vst
|rted.res.phi_env.rds||The batch corrected non-malignant cell expression.
R code: rted$res$phi.env
|rted.res.first_gibbs_res.cor_mat.rds||Correlation between the samples in the bulk matrix.
R code: rted$res$first.gibbs.res$cor.mat
R code: rted$para$input.phi
R code: rted$para$input.phi.prior
2 Read RDS results in Python.
Python users can use 'pyreadr' to read RDS files (https://stackoverflow.com/questions/40996175/loading-a-rds-file-in-pandas).
Here we briefly show how to read one in Python.
import pyreadr
result = pyreadr.read_r('rted.res.first_gibbs_res.gibbs_theta.rds')
# Extract the pandas data frame. An RDS file holds a single object, stored under the key None
df = result[None]
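Because the archive holds several RDS files, it can be convenient to load all of them in one pass. The sketch below is one possible way to do so, not an official helper: `load_rds_directory` is a hypothetical name, and it assumes the archive has already been extracted and that every file holds an object that pyreadr can convert. The reader is passed in as a parameter so that the function can be tried without pyreadr installed.

```python
from pathlib import Path


def load_rds_directory(directory, read_rds=None):
    """Load every '.rds' file under directory into a dict keyed by file stem.

    read_rds is any callable that takes a path string and returns a mapping
    whose None key holds the object; it defaults to pyreadr.read_r.
    """
    if read_rds is None:
        import pyreadr  # third-party package, installed separately

        read_rds = pyreadr.read_r
    tables = {}
    # rglob also finds files when the archive was extracted into a sub-directory.
    for path in sorted(Path(directory).rglob("*.rds")):
        result = read_rds(str(path))
        # An RDS file stores one unnamed object, which is returned under the key None.
        tables[path.stem] = result[None]
    return tables
```

With the results extracted into a directory named `results` (a placeholder), `load_rds_directory("results")["rted.res.first_gibbs_res.gibbs_theta"]` would return the same object as the single-file snippet above.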
3 Correlation plot.
The dREG Gateway is an online service that supports Web-based science through the execution of online computational experiments and the management of data. The items below answer common questions from users.
Q: How should I prepare count matrix files for use with BayesPrism in the dREG gateway?
Q: What should I do when a computation fails in the dREG gateway?
A: You may encounter two types of error. We explain how to identify your error and how to handle it here.
Q: Which browser works well with the dREG gateway?
A: So far we have tested Firefox, Google Chrome and Safari. With IE (version 10 or 11) and some versions of Safari, you may have trouble showing sequence data in the WashU genome browser. Safari users, please read the next Q&A.
Q: What should the Safari users be aware of?
A: By default, Safari unzips a zip file automatically when you download it. However, dREG results are compressed with the 'bgzip' command, which is not compatible with Safari's method, so downloading dREG results can be problematic. Please refer to this link to disable this feature in Safari, and then download the compressed results from the dREG gateway. Secondly, when you click the genome browser link, please use a left-click; do not use the right-click menu option "open a new tab".
Q: How long are my data and results kept in the dREG gateway?
A: One month.
Q: Do I have to create an account before using this service?
A: Yes. This system is supported by an NSF-funded supercomputing resource known as XSEDE, which regularly needs to report bulk usage statistics to NSF. Nevertheless, the data that you provide are completely safe.
Q: How do I know the status of the computational nodes?
A: Since we can't update this web site very often, the gateway status is updated here on the dREG page based on the notifications of the XSEDE community.
Q: Who do I thank for the computing power?
Q: I have another question that is not on this FAQ. How can I contact you?
A: Please contact us with any questions: Zhong (zw355 at cornell.edu) or Charles (cgd24 at cornell.edu).