You need a default installation of Python 2.7 with virtualenv.
# create and activate a new python virtual environment for scrimer
# in the home directory of the current user
virtualenv ~/scrimer-env
. ~/scrimer-env/bin/activate
# install cython in advance because of pybedtools
# and distribute because of pyvcf
pip install cython distribute
# now install scrimer from pypi
# with its additional dependencies (pyvcf, pysam, pybedtools)
pip install scrimer
Scrimer depends on several Python modules that should be installed automatically by the above procedure.
- pysam is used to manipulate the indexed fasta and bam files
- pybedtools is used to read and write the annotations
- PyVCF is used to access variants data
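One way to confirm that the modules were installed is to try importing each of them inside the activated environment. The following is a minimal sketch, not part of Scrimer: `check_py_modules` is a hypothetical helper, the `PYTHON` variable is an assumption that lets you pick the interpreter, and the import names (`pysam`, `pybedtools`, and `vcf` for PyVCF) are the ones these packages normally expose.

```shell
# Hypothetical helper: report whether each named Python module can be imported.
# Uses the interpreter in $PYTHON, falling back to "python" (an assumption).
check_py_modules() {
    for module in "$@"; do
        if "${PYTHON:-python}" -c "import $module" >/dev/null 2>&1; then
            echo "ok: $module"
        else
            echo "MISSING: $module"
        fi
    done
}

# PyVCF is imported as "vcf".
check_py_modules pysam pybedtools vcf
```

Any module reported as MISSING can usually be installed on its own with pip inside the same virtual environment.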
If you’re in an environment where you’re not able to install virtualenv systemwide, we recommend using
the technique described at http://eli.thegreenplace.net/2013/04/20/bootstrapping-virtualenv/.
If you’re in a grid environment, this can help with paths that differ on different nodes:
virtualenv --relocatable ~/scrimer-env
Apart from the Python modules, the Scrimer pipeline relies on other tools that should be installed
in your PATH. Follow the installation instructions in each package.
For reference, we recorded the commands used to install those dependencies in
the Scrimer VirtualBox image. If your system is Debian 7, the commands may work verbatim.
- bedtools is a dependency of pybedtools, used for manipulating GFF and BED files
- samtools is used for manipulating short read alignments, and for calling variants
- LASTZ is used to find the longest isotigs
- tabix creates compressed and indexed versions of annotation files
- GMAP produces a spliced mapping of your contigs to the reference genome
- smalt maps short reads to consensus contigs to discover variants
- GNU parallel is used throughout the pipeline to speed up some lengthy calculations
- blat and isPcr are used to check the designed primers
- Primer3 is used to find the optimal primer sequences
- cutadapt is used to remove cDNA synthesis primers
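Before running the pipeline, it can help to check that these tools are actually reachable through your PATH. The sketch below is only an illustration: `missing_tools` is a hypothetical helper, and the executable names (for example `lastz`, `gmap`, `parallel`, `primer3_core`) are assumptions based on how these packages are commonly installed, so adjust them if your builds use different names.

```shell
# Hypothetical helper: print the name of every tool that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Executable names are assumptions; they may differ on your system.
missing_tools bedtools samtools lastz tabix gmap smalt parallel \
    blat isPcr primer3_core cutadapt
```

An empty output means every listed executable was found. Each name that is printed points to a package that still needs to be installed or added to PATH.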
Additional tools can be installed to provide some more options.
- FastQC can be used to check the tag cleaning process
- agrep and tre-agrep can be used to check the tag cleaning process
- sort-alt provides alphanumeric sorting of chromosome names; rename the
compiled binary to sort-alt after compiling
- IGV is great for visualizing the data when checking the results
- newbler is the best option for assembling 454 mRNA data
- MIRA does well with 454 transcriptome assembly as well
- sim4db can be used as an alternative spliced mapper; it is
part of the kmer suite, and our patch must be applied to get standards-conformant output
- Pipe Viewer can be used to display the progress of longer operations
- BioPython and NumPy are required for running
- mawk: awk is used often in the pipeline, and mawk is usually an order of magnitude faster
- vcflib has a nice interface for working with VCF files (but recent versions of bcftools are good as well)