1) Log in (same as dREG)
Log in by clicking the 'login' link at the top-right corner of the page. Creating an account is free and easy, and provides a number of benefits.
2) Create a new project (optional, same as dREG)
Optionally, users can create a new 'project' in the dREG/dTOX gateway to archive sequencing data from related experiments. This keeps a collection of related experiments together in one place.
3) Start new dTOX
Select the 'Start dREG/dTOX' menu below the dREG logo to create a new analysis for your data, as shown in the following screenshot. Be sure to select the "dTOX prediction" application.
4) Fill experiment form
Select bigWig files representing PRO-seq, ATAC-seq, or DNase-I-seq signal on the plus and minus strand.
5) Submit the job
Click the 'save and launch' button. The bigWig files are transferred to the XSEDE server and the job is placed in a GPU queue to run dTOX. After submitting, the user can check the status on the next web page, as shown below. Depending on the queue status, the job may wait for some time before prediction starts. Once started, it takes 6-10 hours to complete, depending on the genome used.
6) Check the status
The user can check the status of their 'experiment' by clicking the 'Saved runs' button on the top menu.
7) Check the results
Once a job is completed, the user can select 'dTOX Bound Regions' in the drop-down list and then LEFT-click the 'Download' link on the experiment summary page to download a compressed file, which is described in the 'output' sheet on this page. The downloaded file has a 'gz' extension and can be decompressed with the 'gunzip' command in Linux. Please don't RIGHT-click to open a new tab for downloading. To extract bound motifs for one specific transcription factor, download our R script (here). Downloading can be problematic in Safari, because Safari tries to unzip the compressed results automatically using an incompatible compression method. Please check this link to disable this feature.
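On systems where 'gunzip' is not available, the downloaded file can also be decompressed with Python's standard library. The following is only a minimal sketch: the function name `decompress_gz` and the example file name are hypothetical and are not part of the gateway.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy."""
    src = Path(src)
    if dest is None:
        if src.suffix != ".gz":
            raise ValueError("expected a file name ending in .gz")
        # Default output name: the input name without its trailing ".gz".
        dest = src.with_suffix("")
    dest = Path(dest)
    # Stream the data so large result files are never held fully in memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

For example, `decompress_gz("example.dTOX.bound.bed.gz")` would write `example.dTOX.bound.bed` next to the downloaded file.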
8) Switch to Genome Browser
A convenient feature of the gateway is that the user can view the results in the Genome Browser by clicking the 'Switch to genome browser' link. The genome identifier must be specified in one of two ways: 1) select it from the drop-down list, or 2) enter the identifier in the textbox. Please use LEFT-click to open a genome browser window.
9) Check the storage
The user can LEFT-click the 'Open Folder' link on the experiment summary page to check the storage for the current job, or click the 'Storage' menu under the dREG logo to check the folders and files for all jobs (experiments). The following figure shows the data files in the job's folder, including two bigWig files, one result file in bedGraph format, and two output files from the job scheduler on the GPU nodes.
10) If your job fails
When you run dTOX, there are two main types of errors you may encounter. A system error comes from the system itself, such as no computing time being available on specific GPU nodes or an internal error in Apache Airavata. A bigWig error is caused by the user's bigWig files, and can occur when read counts are normalized, when each read is mapped to a region rather than a single base, or when read counts on the minus strand are positive values. The following figures show how to identify each error and how to handle it.
a) System error
When users submit the experiment, the failure is shown on the experiment summary page shortly afterwards, as in Figure 10-S1 or 10-S2. The experiment status is "Failed" and many Java errors are shown in the "Errors" item. Users can't solve this problem themselves and should report the error to the web master.
b) bigWig error
After the experiment is complete, no results can be downloaded and the job status shows a failure (see Figure 10-S3). Users can check the dTOX log file or the task log file to identify the problem. Enter the "storage directory" by clicking the "open" link. There, users can find the "ARCHIVE" folder, where Apache Airavata copies back all files from the computing node. Check the dTOX log file (run.dTOX.log) to see the bigWig problem, or check the task log file ("slurm-tasknoxxx.out") to find the reason why the task was aborted. Figures 10-S4 and 10-S5 give two examples of this kind of error. If the bigWig file has problems, please refer to the link for PRO-seq, link for DNase-I-seq, or link for ATAC-seq to solve them.
This figure shows the bigWig problems in the dREG log file.
This figure shows the task log file, which explains that the task was killed because it exceeded the time limit.
The input to dTOX consists of two bigWig files which represent either the position of RNA polymerase on the positive and negative strands (PRO-seq) or the accessibility on the positive and negative strands (DNase-I-seq or ATAC-seq). The sequence alignment and processing steps to make the input bigWig files are a major factor influencing how accurately dTOX predicts transcription factor binding.
A key component of all datatypes is that data represents unnormalized raw counts. dTOX assumes that data represents the number of individual sequence tags that are located at each genomic position. For this reason, it is critical that input data is not normalized. The server checks to ensure that input data is expressed as integers, and will return an error if this is not the case.
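The requirements above can be sketched as a small check on per-position signal values that have already been read from a bigWig track with a tool of your choice. This is an illustration only, not the server's actual validation code; the function name `check_raw_counts` is hypothetical. The minus-strand rule follows the note in the job-failure section that positive values on the minus strand cause a bigWig error.

```python
def check_raw_counts(values, strand):
    """Return a list of problems found in per-position signal values.

    values: iterable of numbers read from one bigWig track.
    strand: "+" or "-".
    """
    values = list(values)
    problems = []
    # Raw read counts are whole numbers; fractional values suggest normalization.
    if any(not float(v).is_integer() for v in values):
        problems.append("non-integer values (data may be normalized)")
    # Positive values on the minus strand are reported as a bigWig error.
    if strand == "-" and any(v > 0 for v in values):
        problems.append("positive values on the minus strand")
    return problems
```

An empty list means that neither problem was detected in the values supplied; it does not guarantee that the gateway will accept the file.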
Users can also use scripts generated in the Danko lab to create compatible bigWig files. Options for scripts at different starting points in the analysis are given below:
- Convert raw fastq files into bigWig.
Our pipelines produce bigWig files that are compatible with dREG, and can be found at the following URLs: https://github.com/Danko-Lab/proseq_2.0 (PRO-seq), https://github.com/Danko-Lab/atac (ATAC-seq), https://github.com/Danko-Lab/dnase (DNase-I-seq). The pipelines automate routine pre-processing and alignment steps, including removing adapter sequences, trimming reads based on base quality, and deduplicating reads if UMI barcodes are used. Sequencing reads are mapped to a reference genome using BWA. Aligned BAM files are converted into bigWig format, in which each read is represented by a single base.
- Convert mapped reads in BAM files into bigWigs.
We provide scripts that convert mapped reads from a BAM file into bigWig files that are compatible with dTOX. The scripts are available on our GitHub page. For PRO-seq: https://github.com/Danko-Lab/RunOnBamToBigWig. For DNase-I-seq: https://github.com/Danko-Lab/utils/dnase/BamToBigWig. For ATAC-seq: https://github.com/Danko-Lab/utils/atacseq/BamToBigWig.
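To illustrate what "each read is represented by a single base" means, the sketch below reduces aligned reads to per-position counts on each strand. It is not the code used by the scripts above. The function name, the input tuple layout, and the `end` parameter are assumptions made for this example, and which read end is appropriate depends on the assay and library protocol. Minus-strand counts are stored as negative numbers, in line with the note elsewhere on this page that positive minus-strand values are an error.

```python
from collections import Counter


def single_base_counts(reads, end="5p"):
    """Count one base per read; return (plus, minus) Counters keyed by (chrom, pos).

    reads: iterable of (chrom, start, stop, strand), 0-based, half-open.
    end: "5p" or "3p", the read end that represents the read.
    """
    if end not in ("5p", "3p"):
        raise ValueError("end must be '5p' or '3p'")
    plus, minus = Counter(), Counter()
    for chrom, start, stop, strand in reads:
        # On the plus strand the 5' end is the leftmost base; on the minus
        # strand it is the rightmost base.
        leftmost = (strand == "+") == (end == "5p")
        pos = start if leftmost else stop - 1
        if strand == "+":
            plus[(chrom, pos)] += 1
        else:
            minus[(chrom, pos)] -= 1  # minus-strand signal kept as negative counts
    return plus, minus
```

Each read contributes exactly one count at one position, so the resulting values are integers, as dTOX expects.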
The quality and quantity of the experimental data are major factors in determining how sensitive dTOX will be in detecting transcription factor binding. To increase the number of reads available for transcription factor binding detection, we encourage users to merge biological replicates in order to improve statistical power prior to running dTOX. Additionally, to compare binding predictions between conditions we recommend comparing samples at similar sequencing depths or down sampling to create similar sequencing depths.
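As a rough illustration of these two recommendations, the sketch below merges replicates by summing per-position read counts and downsamples a sample to a target depth by drawing reads at random without replacement. The function names and the dictionary layout (position mapped to a non-negative read count) are assumptions for this example. Because it expands every read in memory, it is only suitable for small illustrative inputs; real data are normally merged or downsampled at the aligned-read level.

```python
import random
from collections import Counter


def merge_replicates(*replicates):
    """Sum per-position read counts across replicates."""
    merged = Counter()
    for counts in replicates:
        merged.update(counts)
    return merged


def downsample(counts, target_depth, seed=0):
    """Randomly keep `target_depth` reads, sampled without replacement."""
    total = sum(counts.values())
    if target_depth >= total:
        return Counter(counts)
    # Expand counts into one entry per read, then sample reads uniformly.
    reads = [pos for pos, n in counts.items() for _ in range(n)]
    kept = random.Random(seed).sample(reads, target_depth)
    return Counter(kept)
```

Fixing the seed makes the downsampling reproducible, which helps when the same comparison between conditions has to be rerun.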
We have found that visualizing aligned data in a genome browser (e.g., IGV or UCSC) prior to downstream analysis is a useful way to catch any data quality or alignment issues.
1) A dTOX run generates a compressed file including the following files:
|$PREFIX.dTOX.bound.bed.gz||TFBS regions that are predicted as bound. The file includes chromosome, start, end, motif ID, RTFBSDB score, strand, dTOX score, and bound status. Decompress it with 'gunzip' in Linux.|
2) In the Web storage folder there are some files required by the WashU genome browser:
|$PREFIX.dTOX.bound.bw||The bigWig file converted from bound motifs ($PREFIX.dTOX.bound.bed.gz).|
|*.bed.gz.tbi||The index files generated from the corresponding bed files. Please ignore them if you download the results.|
3) There is one log file in the Web storage folder:
|slurm-??????.out||The verbose log output of the dTOX package.|
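The bound-region file ($PREFIX.dTOX.bound.bed.gz) can be read directly in Python without decompressing it first. The sketch below is not the R script mentioned in the download step. It assumes that the file is tab-separated, has no header line, and has its columns in the order listed in the table above. The names `BoundSite` and `read_bound_sites`, and the motif ID in the usage note, are hypothetical.

```python
import gzip
from typing import NamedTuple


class BoundSite(NamedTuple):
    chrom: str
    start: int
    end: int
    motif_id: str
    rtfbsdb_score: float
    strand: str
    dtox_score: float
    bound: str  # kept as text; the encoding of this column is not assumed


def read_bound_sites(path, motif_id=None):
    """Yield records from a bound-region file, optionally for one motif ID."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # skip blank or malformed lines
            site = BoundSite(
                fields[0], int(fields[1]), int(fields[2]), fields[3],
                float(fields[4]), fields[5], float(fields[6]), fields[7],
            )
            if motif_id is None or site.motif_id == motif_id:
                yield site
```

For example, `list(read_bound_sites("example.dTOX.bound.bed.gz", motif_id="M0001"))` would collect only the rows whose motif ID column equals the placeholder value "M0001".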
The dREG Gateway is an online service that supports Web-based science through the execution of online computational experiments and the management of data. Below are frequently asked questions about the dREG Gateway and the dTOX program.
Q: How should I prepare bigWig files for use with dTOX?
Q: What should I do when a computation fails in the dREG gateway?
A: There are two types of error you may encounter. We explain how to identify your error and how to handle it here.
Q: Which browser works well with the dREG gateway?
A: So far we have tested Firefox, Google Chrome, and Safari. With IE (version 10 or 11) and some versions of Safari, you may have trouble displaying sequence data in the WashU genome browser. Safari users, please read the next Q&A.
Q: What should the Safari users be aware of?
A: By default, Safari automatically unzips a compressed file when you download it. However, dTOX results are compressed with the 'bgzip' command, which is not compatible with Safari's method, so downloading dTOX results can be problematic. Please refer to this link to disable this feature in Safari, and then download the compressed results from the dREG gateway. Also, when you click the genome browser link, please use LEFT-click; don't use the RIGHT-click menu option "open a new tab".
Q: Will dTOX work with my data type?
A: dTOX was trained and tested on PRO-seq, ATAC-seq, and DNase-I-seq. dTOX will also work well with data collected by any run-on and sequencing method, including GRO-seq, PRO-seq, or ChRO-seq. Other methods that map the location of RNA polymerase genome wide using alternative tools (for example, NET-seq) will most likely work well, but are not officially supported.
Q: Will the pre-trained models work using data from my species?
A: Models are currently available only for mammalian organisms. The length and density of genes, which vary considerably between highly divergent species, affect the way that a transcribed promoter or enhancer looks. For this reason, the current models can only be used in mammalian species. We are working to create models for widely used model organisms, including Drosophila and C. elegans.
Q: How deeply do I need to sequence PRO-seq libraries?
A: Sensitivity is reasonable at ~40 million mapped reads and saturates at ~100 million mapped reads. See our analysis here: supplementary figure 3 in dREG paper.
Q: How long are my data and results kept in the dREG gateway?
A: One month.
Q: How do I cite dTOX?
A: Please cite our papers if you use dTOX results in your publication:
(1) Choate, L. A., Wang, Z., & Danko, C. G. (2018). Identification of transcription factor binding using genome-wide accessibility and transcription. bioRxiv.
Q: Do I have to create an account before using this service?
A: Yes. This system is supported by an NSF-funded supercomputing resource known as XSEDE, which regularly needs to report bulk usage statistics to NSF. Nevertheless, the data that you provide are completely safe.
Q: How do I know the status of the computational nodes?
A: Since we can't update this web site very often, the gateway status is updated here on the dREG page based on the notifications of the XSEDE community.
Q: Who do I thank for the computing power?
Q: I have another question that is not on this FAQ. How can I contact you?
A: Yes, please contact us with any questions! Zhong (zw355 at cornell.edu). Charles (cgd24 at cornell.edu).