Design primers¶

Design primers with Primer3¶

Set inputs and outputs for this step:

VARIANTS=50-variants/demo-variants-selected.vcf.gz
SCAFFOLD=33-scaffold/sc-demo.fasta
ANNOTS=33-scaffold/sc-demo.sorted.gff3.gz

GFFILE=demo-primers.gff3
OUT=60-gff-primers

GFF=$OUT/$GFFILE
PRIMERS=${GFF%.*}.sorted.gff3.gz
mkdir -p $OUT

# for all selected variants design pcr and genotyping primers
# takes about a minute for 1000 selected variants, 5 MB gzipped vcf, 26 MB uncompressed genome, 5 MB gzipped gff
# default values are set for SNaPshot
export PRIMER3_CONFIG=/opt/primer3/primer3_config/
design_primers.py $SCAFFOLD $ANNOTS $VARIANTS > $GFF

# use --primer-pref to set preferred length of genotyping primer
# this is useful for other genotyping methods, like MALDI-TOF
design_primers.py --primer-pref 15 --primer-max 25 $SCAFFOLD $ANNOTS $VARIANTS > $GFF

Sort and index the annotation before using it in IGV. For a small set of primers it is not necessary to compress and index the file, IGV can handle raw files as well.

sortBed -i $GFF | bgzip > $PRIMERS
tabix -f -p gff $PRIMERS

Create a region list for IGV to quickly inspect all the primers.

<$GFF awk 'BEGIN{OFS="\t";} /pcr-product/ {match($9, "ID=[^;]+"); print $1, $4, $5, substr($9, RSTART+3, RLENGTH);}' > ${GFF%.*}.bed

Convert scaffold to the blat format¶

SCAFFOLD2BIT=$OUT/${SCAFFOLD##*/}.2bit
faToTwoBit $SCAFFOLD $SCAFFOLD2BIT

Validate primers with blat/isPcr¶

Recommended parameters for PCR primers in blat [1]: -tileSize=11, -stepSize=5

Get the primer sequences, in formats for isPcr and blat:

PRIMERS_BASE=${PRIMERS%%.*}
extract_primers.py $PRIMERS isPcr > $PRIMERS_BASE.isPcr
extract_primers.py $PRIMERS > $PRIMERS_BASE.fa

Check against transcriptome data and the reference genome:

# select one of those a time:
TARGET=$SCAFFOLD2BIT
TARGET=$GENOME2BIT

TARGET_TAG=$( basename ${TARGET%%.*} )
isPcr -out=psl $TARGET $PRIMERS_BASE.isPcr $PRIMERS_BASE.isPcr.$TARGET_TAG.psl
blat -minScore=15 -tileSize=11 -stepSize=5 -maxIntron=0 $TARGET $PRIMERS_BASE.fa $PRIMERS_BASE.$TARGET_TAG.psl

Note

TODO: It would be nice to add the annotations found by isPcr to the primer gff3 tags (not implemented yet).

Check the results¶

Count and check all places where primer3 reported problems:

<$GFF grep gt-primer | grep -c 'PROBLEMS='
<$GFF grep gt-primer | grep 'PROBLEMS=' | less -S

# count unique variants with available primer set
<$GFF grep gt-primer|grep -v PROBLEM|egrep -o 'ID=[^;]+'|cut -c-13|sort -u|wc -l

Use agrep to find similar sequences in the transcript scaffold, to check if the sensitivity settings of blat are OK. Line wrapping in fasta can lead to false negatives, but at least some primers should yield hits:

# agrep is quite enough for simple checks on assemblies of this size (30 MB)
SEQ=GCACATTTCATGGTCTCCAA
agrep $SEQ $SCAFFOLD|grep $SEQ

Import your primers to any spreadsheet program with some selected information on each primer. Use copy and paste, the file format is tab separated values. When there is more than one genotyping primer for one PCR product, the information on the PCR product is repeated.

extract_primers.py $PRIMERS table > $PRIMERS_BASE.tsv

[1]	http://genomewiki.ucsc.edu/index.php/Blat-FAQ