RNA-seq Pipeline for Known Transcripts: Difference between revisions

Revision as of 13:37, 14 September 2011

Sequence Quality and Trimming

Run FastQC to assess the quality of the reads from the sequencer, then:
Filter for quality, if applicable
Trim, if applicable
FastQC is available at http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
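
The page does not say which trimming tool was used, so the following is only a minimal sketch of one simple option: a fixed-length trim with awk, assuming standard four-line FASTQ records. The function name `trim_fastq`, the default length of 50, and the file names in the usage comment are all hypothetical.

```shell
# Keep only the first N bases (and the matching quality characters) of each read.
# Assumes standard four-line FASTQ records; N defaults to 50 (an arbitrary choice).
trim_fastq() {
    awk -v n="${1:-50}" '
        NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, n); next }  # sequence and quality lines
        { print }                                                      # header and "+" lines
    '
}

# Usage (file and directory names are placeholders):
#   fastqc -o fastqc_out s_1_1_sequence.fastq     # inspect quality first; fastqc_out must exist
#   trim_fastq 50 < s_1_1_sequence.fastq > s_1_1_trimmed.fastq
```

Re-running FastQC on the trimmed file is a quick way to confirm that the trim had the intended effect.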

Generate a Reference Genome

Run bowtie-build to generate a Burrows-Wheeler transformed reference genome (.ebwt format).
http://bowtie-bio.sourceforge.net/index.shtml (bowtie, tophat, and cufflinks are here).
[Optional input and parameter settings are in square brackets.]
<Required parameters are in angle brackets.>
This BW transformed reference genome needs to be created only once and can then be reused.
$ is the command prompt.

$ bowtie-build [-f specifies reference genome is in fasta format] <path to input reference genome (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa)> <base name for reference genome output .ebwt files (e.g. hg19)>
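
Because the index only needs to be built once, a small guard can skip the build when the files are already there. This is a sketch, not part of the original pipeline: the helper names `index_exists` and `build_index_once` are hypothetical, and checking for the first and last .ebwt file is a heuristic rather than a full validation of the index.

```shell
# Return success if a bowtie index with this base name already exists.
# bowtie-build writes <base>.1.ebwt through <base>.rev.2.ebwt; only the
# first and last of those files are checked here.
index_exists() {
    [ -f "$1.1.ebwt" ] && [ -f "$1.rev.2.ebwt" ]
}

# Build the index only when it is missing.
build_index_once() {
    fasta="$1"; base="$2"
    if index_exists "$base"; then
        echo "index $base already present, skipping bowtie-build"
    else
        bowtie-build -f "$fasta" "$base"
    fi
}

# Usage (placeholder paths): build_index_once hg19.fa hg19
```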

Align Reads to Reference Genome with Tophat

Run tophat to align reads to the reference genome. I’ve included a pseudo command line as well as a “real” command line.

$ tophat [-p #processors -o ./output_directory] <./reference genome in both .ebwt and fasta formats (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19)> <reads file to be aligned (e.g. s_1_1_sequence.fastq)>
$ tophat -p 5 -o ./HG19/tophat_out_hg19_001_trimmed /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19 ./HG19/Rich_trim/A_1_16_85.fastq
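
When there are several reads files, the same tophat call can be generated per file. The sketch below only prints the commands so they can be checked first; the helper name `tophat_cmds`, the output directory naming scheme, and the example paths are assumptions made for illustration.

```shell
# Print one tophat command per reads file (remove "echo" to actually run them).
# The output directory is derived from the reads file name; paths are placeholders.
tophat_cmds() {
    index="$1"; shift
    for reads in "$@"; do
        sample=$(basename "$reads" .fastq)
        echo tophat -p 5 -o "./tophat_out_$sample" "$index" "$reads"
    done
}

tophat_cmds ./hg19 ./reads/A_1.fastq ./reads/B_1.fastq
```

Giving each sample its own output directory matters because every tophat run writes a file named accepted_hits.bam.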

Use Cuffcompare to Generate .gtf Reference

Run cuffcompare to create a .gtf format reference annotation from a generic reference annotation. Note that cuffcompare adds the tss_id and p_id attributes that you will need in cuffdiff. This .gtf reference can be created once and then reused.

$ cuffcompare [-o ./output_prefix] <input file twice (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf)>

$ cuffcompare -o ./cuffcompare_out /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf
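
Before moving on to cuffdiff, it can be worth checking that the combined .gtf really carries the tss_id and p_id attributes. A minimal sketch, assuming the attributes sit in the ninth tab-separated column in the usual `key "value";` form; the helper name `count_attr` is hypothetical.

```shell
# Count the records in a GTF file whose attribute column (column 9)
# contains the given key, e.g. tss_id or p_id.
count_attr() {
    awk -F '\t' -v key="$1" 'index($9, key " \"") { n++ } END { print n + 0 }' "$2"
}

# Usage (placeholder file name):
#   count_attr tss_id cuffcompare_out.combined.gtf
#   count_attr p_id   cuffcompare_out.combined.gtf
```

A count of zero for either attribute suggests that the file is not the cuffcompare output or that the input annotation lacked the information cuffcompare needs.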

Use Cuffdiff to Identify Differentially Expressed Transcripts

Run cuffdiff to identify differentially abundant transcripts.

$ cuffdiff [-p #processors -o ./output_directory -L label1,label2,etc. -T (for time series data) -N (use upper quartile normalization) --compatible-hits-norm (use reference hits in normalization) -b (use reference transcripts to reduce bias, include path to file e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa) -u (improve multi-read weighting)] <transcripts.gtf (produced by cuffcompare)> <sample_A_accepted_hits1.bam,sample_A_accepted_hits2.bam,etc. (all produced by tophat)> <sample_B_accepted_hits1.bam,sample_B_accepted_hits2.bam,etc.>

$ cuffdiff -o ./HG19/Cuffdiff_out_options_b_u_N_compatible/ -p 14 -L Control,PUF_kd --no-update-check -b /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa -u -N --compatible-hits-norm /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/cuffcompare_out.combined.gtf /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_001_trimmed/accepted_hits.bam /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_002_trimmed/accepted_hits.bam
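
cuffdiff expects the replicates of one condition as a single comma-separated argument with no spaces, and separate conditions as separate arguments. The sketch below shows one way to build those arguments; the helper name `join_commas`, the replicate directory names, and the option values are hypothetical, and the command is only printed.

```shell
# Join all arguments with commas, so the replicates of one condition
# become a single cuffdiff argument with no spaces.
join_commas() {
    ( IFS=,; printf '%s\n' "$*" )
}

control=$(join_commas ctrl_rep1/accepted_hits.bam ctrl_rep2/accepted_hits.bam)
knockdown=$(join_commas kd_rep1/accepted_hits.bam kd_rep2/accepted_hits.bam)

# Print the command for inspection (remove "echo" to run it).
echo cuffdiff -p 4 -o ./cuffdiff_out -L Control,PUF_kd transcripts.gtf "$control" "$knockdown"
```

The labels given to -L should appear in the same order as the condition arguments.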

@@ Line 1: / Line 1: @@
-RNA-Seq pipeline – known transcripts
+==Sequence Quality and Trimming==
-.	Run FASTQ to assess quality of reads from sequencer and:
+# Run FASTQC to assess quality of reads from sequencer and:
-a.	Filter for quality, if applicable
+# Filter for quality, if applicable
-b.	Trim, if applicable
+# Trim, if applicable
-c.	http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
+# FASTQC available at http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
-.	Run bowtie-build to generate Burroughs Wheeler transformed reference genome (.ebwt format).
+==Generate a Reference Genome==
-a.	http://bowtie-bio.sourceforge.net/index.shtml (bowtie, tophat, and cufflinks are here).
+# Run bowtie-build to generate Burroughs Wheeler transformed reference genome (.ebwt format).
-b.	[Optional input and parameter settings are in square brackets.]
+# http://bowtie-bio.sourceforge.net/index.shtml (bowtie, tophat, and cufflinks are here).
-c.	<Required parameters are in greater than/less than brackets.>
+# [Optional input and parameter settings are in square brackets.]
-d.	 This BW transformed reference genome can be created once then used repeatedly in the future.
+# <Required parameters are in greater than/less than brackets.>
-e.	$ is the command prompt.
+# This BW transformed reference genome can be created once then used repeatedly in the future.
+# $ is the command prompt.
+<pre>
 $ bowtie-build [-f specifies reference genome is in fasta format] <path to input reference genome (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa)> <base name for reference genome output .ebwt files (e.g hg19)>
+</pre>
-.	Run tophat to align reads to the reference genome. I’ve included a pseudo command line as well as a “real” command line.
+==Align Reads to Reference Genome with Tophat==
+Run tophat to align reads to the reference genome. I’ve included a pseudo command line as well as a “real” command line.
+<pre>
 $ tophat [-p #processors -o ./output_directory] <./reference genome in both .ebwt and fasta formats (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19)> <reads file to be aligned (e.g. s_1_1_sequence.fastq)>
 $ tophat -p 5 -o ./HG19/tophat_out_hg19_001_trimmed /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19 ./HG19/Rich_trim/A_1_16_85.fastq
+</pre>
-.	Run cuffcompare to create .gtf format reference genome from a generic reference genome. Note that cuffcompare adds the tss_id and p_id columns that you will need in cuffdiff. This .gtf reference can be created once then used repeatedly in the future.
+==Use Cuffcompare to Generate .gtf Reference==
+Run cuffcompare to create .gtf format reference genome from a generic reference genome. Note that cuffcompare adds the tss_id and p_id columns that you will need in cuffdiff. This .gtf reference can be created once then used repeatedly in the future.
+<pre>
 $ cuffcompare [-o ./output_directory] < input file twice (e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19.gtf )>
 $ cuffcompare -o ./cuffcompare_out /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf /ccmb/CoreBA/BioinfCore/Common/DATA/CufflinksData_hg19/hg19_genes.gtf
+</pre>
+==Use Cuffdiff to Identify Differentially Expressed Transcripts==
-.	Run cuffdiff to identify differentially abundant transcripts.
+Run cuffdiff to identify differentially abundant transcripts.
+<pre>
 $ cuffdiff  [-p #processors -o ./output_directory –L label1,label2,etc. –T (for time series data) –N (use upper quantile normalization –compatible_hits_norm (use reference hits in normalization) –b (use reference transcripts to reduce bias, include path to file e.g. /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa) –u (improve multi-read weighting) ] <transcripts.gtf (produced by cuffcompare) sample_A_accepted_hits1.bam, sample_A_accepted_hits2.bam,etc (all produced by tophat) sample_B_accepted_hits1.bam,sample_B_accepted_hits2.bam, etc>
 $ cuffdiff -o ./HG19/Cuffdiff_out_options_b_u_N_compatible/ -p 14 -L Control,PUF_kd --no-update-check -b /ccmb/CoreBA/BioinfCore/Common/DATA/BowtieData/H_Sapiens/hg19.fa -u -N --compatible-hits-norm /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/cuffcompare_out.combined.gtf /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_001_trimmed/accepted_hits.bam /ccmb/CoreBA/BioinfCore/Projects/Goldstrohm_McEachin/HG19/tophat_out_hg19_002_trimmed/accepted_hits.bam
+</pre>
