Included in the Inchworm software distribution is a pipeline that aligns RNA-Seq reads to the genome using BLAT, ultimately producing SAM (and BAM) files that can be used for the genome-guided Inchworm assembly described below. This alignment pipeline excels for genomes whose genes have short introns (plants, fungi, protozoa, etc.). For genomes with long introns, TopHat is recommended.
In addition to the Inchworm software, this pipeline requires BLAT and samtools. Both blat and samtools (particularly the psl2sam.pl script included in samtools) must be available on your PATH.
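Before running the pipeline, it can help to confirm that the required tools are actually visible on your PATH. The following is a minimal sketch of such a check; the helper name missing_tools is hypothetical and is not part of the Inchworm distribution.

```shell
#!/bin/sh
# Print the name of every listed tool that cannot be found on PATH.
# (missing_tools is an illustrative helper, not part of Inchworm.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The three names below are the requirements stated above.
missing=$(missing_tools blat samtools psl2sam.pl)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "blat, samtools and psl2sam.pl are all on PATH"
fi
```

If anything is reported as missing, add the directory containing it to PATH before starting the pipeline.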
Once the above tools are installed, the BLAT short-read alignment pipeline can be run as follows, starting from FASTA or FASTQ files containing single or paired reads:
% $INCHWORM_HOME/bin/run_BLAT_shortReadPipeline.pl
################################################################################################################
#
#  --left and --right (if paired reads)
#     or
#  --single (if unpaired reads)
#
#  Required inputs:
#
#  --genome     multi-fasta file containing the genome sequences (should be named {refName}.fa )
#
#  --seqType    fa | fq    (fastA or fastQ format)
#
#  Optional:
#
#  --SS_lib_type   strand-specific library type:  single: F or R  paired: FR or RF
#                       examples:  single RNA-Ligation method:  F
#                                  single dUTP method: R
#                                  paired dUTP method: RF
#
#  -I    maximum intron length  (default: 10000);
#
#  -o    output directory
#
#  --trim_short_terminal_segments   (trim off short terminal alignment segments that are mostly noise. Default: 10)
#
#  -P    min percent identity based on full sequence length  (default: 95)
#
#  --blat_top_hits   (default: 20 in paired mode, 1 in single mode)
#
#  -C    final top hits reported  (default: 1)  (only applies to paired mode)
#
#  If paired mode:
#
#  --max_dist_between_pairs    default (2000)
#
####################################################################################################################
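As an illustration of how the options in the usage message combine, the sketch below assembles a command line for paired, strand-specific (RF) FASTQ reads. The input file names, the output directory name, the run_paired_rf function, and the DRY_RUN switch are all assumptions made for this example; only the option names are taken from the usage message.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own inputs.
GENOME=genome.fa
LEFT=reads_1.fq
RIGHT=reads_2.fq
OUT_DIR=blat_out

# Placeholder location, used only if INCHWORM_HOME is not already set.
: "${INCHWORM_HOME:=/path/to/inchworm}"

run_paired_rf() {
    # When DRY_RUN is non-empty the command is printed instead of executed.
    # DRY_RUN belongs to this sketch; it is not a pipeline option.
    ${DRY_RUN:+echo} "$INCHWORM_HOME/bin/run_BLAT_shortReadPipeline.pl" \
        --genome "$GENOME" --seqType fq \
        --left "$LEFT" --right "$RIGHT" \
        --SS_lib_type RF -I 10000 -o "$OUT_DIR"
}

DRY_RUN=1
run_paired_rf
```

Setting DRY_RUN to an empty value runs the pipeline for real. For unpaired reads, --single would take the place of --left and --right, with --SS_lib_type set to F or R as appropriate.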
Example data sets described below can be downloaded here as BLAT_short_read_alignment_pipeline-(datestamp).tgz.
Example data and pipeline execution are provided for:
paired reads (strand-specific, SS_lib_type: RF): example_BLAT_shortReadAlignmentPipeline/pairedSS. The strand-specific library type (SS_lib_type) of RF corresponds to the following, which results from the dUTP-based strand-specific sequencing method:
========> /2 (right of sequenced fragment)
=======================================> (transcript fragment, sense orientation)
                           <============ /1 (left of sequenced fragment)
single (unpaired) reads (strand-specific, SS_lib_type: F): example_BLAT_shortReadAlignmentPipeline/singleSS, as generated by the RNA-ligation strand-specific sequencing method:
========> fragment end sequenced
=======================================> (transcript fragment, sense orientation)
Visit those directories and execute runAlignments.sh to see the pipeline in action.
The final output is a coordSorted.sam file and an equivalent binary .bam file. If strand-specific sequencing is specified with the SS_lib_type parameter, these files are additionally partitioned according to transcribed strand.
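To make the idea of partitioning by transcribed strand concrete, here is a small sketch for single (unpaired) reads. It is not the pipeline's own implementation, only an illustration of the logic: for an F library a read aligned to the forward strand implies a transcript on the + strand, and for an R library the assignment is flipped. The function name transcribed_strand and the two demo records are made up for this example.

```shell
#!/bin/sh
# Report the transcribed strand implied by each aligned single read.
# $1: library type (F or R); SAM records are read from standard input.
# Illustrative only; this is not the code used by the pipeline.
transcribed_strand() {
    awk -v lib="$1" -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }      # FLAG bit 0x4 set: read is unmapped
        {
            rev = int($2 / 16) % 2         # FLAG bit 0x10: aligned to reverse strand
            if (lib == "R") rev = 1 - rev  # R libraries sequence the antisense strand
            print $1 "\t" (rev ? "-" : "+")
        }'
}

# Demo with two made-up SAM records (FLAG 0 and FLAG 16).
printf 'r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\nr2\t16\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tIIII\n' |
    transcribed_strand F
```

On real data the same function could be fed the pipeline's SAM output, for example transcribed_strand F < coordSorted.sam. Paired reads would need additional handling of the first-in-pair and second-in-pair flags, which this sketch does not attempt.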
Note: The coordinate-sorted SAM file is compatible with the Cufflinks software for alignment-based transcript reconstruction, in addition to being used with the Inchworm Genome-Guided De novo Transcript Assembly pipeline.