The Inchworm software distribution includes a pipeline that aligns RNA-Seq reads to a reference genome using BLAT. It produces SAM (and BAM) files that can be used for the genome-guided Inchworm assembly described below. The pipeline works best for genomes whose genes have short introns (e.g., plants, fungi, and protozoa). For genomes with long introns, TopHat is recommended.
In addition to the Inchworm software, this pipeline requires BLAT and samtools. Both blat and samtools (in particular the psl2sam.pl script included with samtools) must be available on your PATH.
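Before launching the pipeline, it can help to confirm that these prerequisites can be found. The Python sketch below is a convenience check and is not part of the Inchworm distribution. `missing_tools` is a hypothetical helper name. The check assumes psl2sam.pl has been marked executable, because `shutil.which` only reports files it can execute.

```python
import shutil
from typing import Iterable, List

# Executables the pipeline expects on PATH, per the requirements above.
REQUIRED_TOOLS = ("blat", "samtools", "psl2sam.pl")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names that shutil.which() cannot resolve to an executable on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("blat, samtools and psl2sam.pl were all found on PATH")
```

An empty result means all three names resolved. Any name it reports must be installed or added to PATH before running the pipeline.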
Once these tools are installed, the BLAT short-read alignment pipeline can be run as follows. It accepts FASTA or FASTQ files containing either single or paired reads:
% $INCHWORM_HOME/bin/run_BLAT_shortReadPipeline.pl
################################################################################################################
#
# --left and --right (if paired reads)
# or
# --single (if unpaired reads)
#
# Required inputs:
#
# --genome multi-fasta file containing the genome sequences (should be named {refName}.fa )
#
# --seqType fa | fq (fastA or fastQ format)
#
# Optional:
#
# --SS_lib_type strand-specific library type: single: F or R paired: FR or RF
# examples: single RNA-Ligation method: F
# single dUTP method: R
# paired dUTP method: RF
#
# -I maximum intron length (default: 10000);
#
# -o output directory
#
# --trim_short_terminal_segments (trim off short terminal alignment segments that are mostly noise. Default: 10)
#
# -P min percent identity based on full sequence length (default: 95)
#
# --blat_top_hits (default: 20 in paired mode, 1 in single mode)
#
# -C final top hits reported (default: 1) (only applies to paired mode)
#
# If paired mode:
#
# --max_dist_between_pairs default (2000)
#
####################################################################################################################
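As the usage message indicates, --left/--right and --single are alternative ways to supply reads, and the valid --SS_lib_type values depend on which one is used. The Python sketch below assembles an argument list from a subset of the documented options only. The helper name `build_blat_pipeline_cmd` is hypothetical. It also assumes the script lives in $INCHWORM_HOME/bin and falls back to ./bin when INCHWORM_HOME is unset.

```python
import os
from typing import List, Optional

# Valid --SS_lib_type values per mode, as listed in the usage message.
SINGLE_LIB_TYPES = {"F", "R"}
PAIRED_LIB_TYPES = {"FR", "RF"}


def build_blat_pipeline_cmd(genome: str, seq_type: str, *,
                            left: Optional[str] = None,
                            right: Optional[str] = None,
                            single: Optional[str] = None,
                            ss_lib_type: Optional[str] = None,
                            out_dir: Optional[str] = None,
                            max_intron: Optional[int] = None) -> List[str]:
    """Assemble an argument list for run_BLAT_shortReadPipeline.pl.

    Paired mode uses --left/--right; single mode uses --single. Exactly one
    mode must be chosen, and --SS_lib_type must match that mode.
    """
    if seq_type not in ("fa", "fq"):
        raise ValueError("seq_type must be 'fa' or 'fq'")
    paired = left is not None or right is not None
    if paired and (left is None or right is None):
        raise ValueError("paired mode needs both left and right")
    if paired == (single is not None):
        raise ValueError("give either left+right or single, not both or neither")

    # Assumed install location of the pipeline script.
    script = os.path.join(os.environ.get("INCHWORM_HOME", "."), "bin",
                          "run_BLAT_shortReadPipeline.pl")
    cmd = [script, "--genome", genome, "--seqType", seq_type]
    cmd += ["--left", left, "--right", right] if paired else ["--single", single]

    if ss_lib_type is not None:
        allowed = PAIRED_LIB_TYPES if paired else SINGLE_LIB_TYPES
        if ss_lib_type not in allowed:
            raise ValueError(f"SS_lib_type {ss_lib_type!r} invalid here; use one of {sorted(allowed)}")
        cmd += ["--SS_lib_type", ss_lib_type]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if max_intron is not None:
        cmd += ["-I", str(max_intron)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Validating the mode and library type up front catches mismatches, such as RF with single reads, before any alignment work starts.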
The example data sets described below can be downloaded as BLAT_short_read_alignment_pipeline-(datestamp).tgz.
Example data and pipeline execution are provided for:
paired reads (strand-specific, SS_lib_type: RF): example_BLAT_shortReadAlignmentPipeline/pairedSS. The RF library type results from the dUTP-based strand-specific sequencing method and corresponds to the following read layout:
========> /2 (right of sequenced fragment)
=======================================> (transcript fragment, sense orientation)
<============ /1 (left of sequenced fragment)
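In the RF layout above, the /1 read is antisense to the transcript and the /2 read is sense. The transcribed strand of a mapped RF read can therefore be inferred from the standard SAM FLAG bits: 0x4 (unmapped), 0x10 (reverse strand), 0x40 (first in pair) and 0x80 (second in pair). The Python sketch below illustrates only this library-type convention. It is not the pipeline's own partitioning code, and the function name is hypothetical.

```python
from typing import Optional

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_FIRST_IN_PAIR = 0x40
FLAG_SECOND_IN_PAIR = 0x80


def rf_transcribed_strand(flag: int) -> Optional[str]:
    """Return '+' or '-' for the transcribed strand implied by a mapped RF read."""
    if flag & FLAG_UNMAPPED:
        return None
    read_on_reverse = bool(flag & FLAG_REVERSE)
    if flag & FLAG_FIRST_IN_PAIR:
        # /1 is the antisense read: its genomic strand is opposite the transcript's.
        return "+" if read_on_reverse else "-"
    if flag & FLAG_SECOND_IN_PAIR:
        # /2 is the sense read: its genomic strand matches the transcript's.
        return "-" if read_on_reverse else "+"
    return None  # not flagged as part of a pair
```

For example, a pair flagged 99/147 (/1 forward, /2 reverse) implies a minus-strand transcript, whereas a pair flagged 83/163 implies a plus-strand transcript.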
single (unpaired) reads (strand-specific, SS_lib_type: F): example_BLAT_shortReadAlignmentPipeline/singleSS, as generated by the RNA-ligation strand-specific sequencing method:
========> fragment end sequenced
=======================================> (transcript fragment, sense orientation)
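For single-end libraries the rule is simpler. With F (e.g., RNA-ligation), the read lies in the transcript's sense orientation, so the strand it aligns to is the transcribed strand. With R (e.g., dUTP), the transcribed strand is the opposite one. The following Python sketch uses a hypothetical helper name and relies only on the SAM FLAG bits 0x4 (unmapped) and 0x10 (reverse strand).

```python
from typing import Optional

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


def single_read_transcribed_strand(flag: int, ss_lib_type: str) -> Optional[str]:
    """Transcribed strand for a mapped single-end read from an F or R library."""
    if ss_lib_type not in ("F", "R"):
        raise ValueError("single-end SS_lib_type must be 'F' or 'R'")
    if flag & FLAG_UNMAPPED:
        return None
    read_strand = "-" if flag & FLAG_REVERSE else "+"
    if ss_lib_type == "F":
        # F: the read is in the sense orientation, so its strand is the transcript's.
        return read_strand
    # R: the read is antisense, so the transcript is on the opposite strand.
    return "+" if read_strand == "-" else "-"
```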
To see the pipeline in action, change into either directory and run the runAlignments.sh script.
The final output is a coordinate-sorted SAM file (coordSorted.sam) together with its equivalent binary BAM file. If a strand-specific library type is specified with the SS_lib_type parameter, these files are additionally partitioned by transcribed strand.
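"Coordinate-sorted" means that alignment records are grouped by reference sequence and ordered by leftmost position (the SAM POS column) within each reference. The Python sketch below is a hypothetical sanity check for that property. It skips header lines and records with no reference (RNAME '*'). It does not verify that references follow the @SQ header order, which the SAM specification also expects for coordinate sorting.

```python
from typing import Iterable


def is_coordinate_sorted(sam_lines: Iterable[str]) -> bool:
    """Check that alignment records are grouped by RNAME and ordered by POS."""
    finished_refs = set()
    current_ref = None
    last_pos = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            continue  # ignore records with no reference sequence
        if rname != current_ref:
            if rname in finished_refs:
                return False  # this reference already appeared in an earlier block
            if current_ref is not None:
                finished_refs.add(current_ref)
            current_ref, last_pos = rname, 0
        if pos < last_pos:
            return False  # position went backwards within one reference
        last_pos = pos
    return True
```

Typical use is `with open("coordSorted.sam") as fh: print(is_coordinate_sorted(fh))`. The actual output filename may carry a prefix, so adjust the path accordingly.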
Note: In addition to being used with the Inchworm Genome-Guided De novo Transcript Assembly pipeline, the coordinate-sorted SAM file is compatible with the Cufflinks software for alignment-based transcript reconstruction.