SnpEff

Features, versions and roadmap

Major features planned:

  • SnpEff: Improvements in loss of function analysis
  • SnpSift: Switch to ANTLR 4.X, handle arbitrary expressions.

Features by version

Version: 4.3 (2016-09).

  • Improved support for gene fusions
  • Annotation of large structural variants

Version: 4.1 (2015-01).

  • Standard annotation format: 'ANN' INFO field
  • A better / more robust HGVS implementation
  • Variants are re-aligned to the most 3' position (in agreement with HGVS).
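
The re-alignment item above refers to the HGVS 3' rule: an insertion or deletion inside a repeated region is reported at the most 3' equivalent position. Below is a minimal sketch for a deletion, assuming a 0-based start, a plain string reference and a forward-strand transcript. The function name `shift_deletion_3prime` is hypothetical and is not part of SnpEff.

```python
def shift_deletion_3prime(ref: str, start: int, length: int) -> int:
    """Return the most 3' (rightmost) 0-based start of an equivalent deletion.

    Deleting ref[start:start + length] gives the same sequence as deleting
    the window shifted one base to the right whenever the base leaving the
    window equals the base entering it.
    """
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start


# Deleting one "CA" unit from the CACACA repeat in GCACACAT
print(shift_deletion_3prime("GCACACAT", 1, 2))
```

For a transcript on the reverse strand the 3' direction points toward lower genomic coordinates, so the shift would run the other way.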

Version: 4.0 (2014-11).

  • Consistent 'help' screen when using command line option -h
  • Effects are sorted with canonical transcripts first (for the same level of effect / impact)
  • Fixed a problem with LOF annotations for gene names containing spaces.

Version: 4.0 (2014-07).

  • HGVS notation (now the default)
  • Sequence Ontology terms (now the default)
  • SnpEff downloads databases automatically
  • Automatic download of third-party databases
  • Support for new genome versions (such as GRCh38 / hg38)
  • NextProt, Loss of function (LOF) and Nonsense mediated decay (NMD) annotations by default
  • Improved protein coding transcript detection (when building databases)
  • Full support for MIXED variants, i.e. variants that may be a combination of Insertions, Deletions, SNPs or MNPs.
  • Major code refactoring
  • SnpSift annotate: Improved annotation support.
  • SnpSift dbNsfp: Several improvements to annotation methods.
  • Added support for gVCF files

Version: 3.6 (2013-05-23).

  • Improved support for MIXED variants, i.e. variants that may be a combination of Insertions, Deletions, SNPs or MNPs.
  • Improved HGVS notation
  • SnpSift: concordance: Calculate concordance statistics between two VCF files (e.g. a sequencing and a chip-genotyping experiment)
  • SnpSift: vcfCheck command (check VCF for several "common" problems)
  • Moved to Java 7, mostly due to several problems in Java 6 libraries when reading bgzip files.
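
The SnpSift concordance item above compares genotype calls from two experiments. The sketch below shows the basic idea only, assuming each call set has already been loaded into a dictionary keyed by (chromosome, position, sample) with the number of ALT alleles as value. The function name and data layout are illustrative assumptions; SnpSift's real output is more detailed.

```python
from collections import Counter


def concordance(calls_a: dict, calls_b: dict) -> Counter:
    """Count matching and mismatching genotype calls between two call sets.

    Keys are (chrom, pos, sample); values are ALT allele counts (0, 1 or 2).
    """
    counts = Counter()
    shared = calls_a.keys() & calls_b.keys()
    for key in shared:
        counts["match" if calls_a[key] == calls_b[key] else "mismatch"] += 1
    # Calls present in only one of the two experiments
    counts["only_a"] = len(calls_a.keys() - shared)
    counts["only_b"] = len(calls_b.keys() - shared)
    return counts


sequencing = {("1", 100, "S1"): 1, ("1", 200, "S1"): 2, ("1", 300, "S1"): 0}
chip = {("1", 100, "S1"): 1, ("1", 200, "S1"): 1, ("2", 50, "S1"): 0}
print(dict(concordance(sequencing, chip)))
```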

Version: 3.5 (2013-03-23).

  • Improvements in cancer sample annotations
  • Added SPLICE_REGION annotation
  • SnpSift private: Annotate if a variant is "private" to a family (or cohort)
  • SnpSift: ccs: Case / control summary statistics (of annotated files).
  • SnpSift annotate: Added support for tabix-indexed files (detected automatically).
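
The "private" annotation above can be read as: every sample carrying the variant belongs to the same family. A minimal sketch under that assumption follows; the function name, the ALT-count encoding and the sample-to-family mapping are all hypothetical and do not reflect SnpSift's actual input format.

```python
from typing import Optional


def private_family(alt_counts: dict, family_of: dict) -> Optional[str]:
    """Return the family ID if all carriers share one family, else None.

    alt_counts maps sample -> number of ALT alleles (0 means non-carrier);
    family_of maps sample -> family ID.
    """
    carrier_families = {family_of[s] for s, n in alt_counts.items() if n > 0}
    if len(carrier_families) == 1:
        return next(iter(carrier_families))
    return None


families = {"kid1": "FAM_A", "mom1": "FAM_A", "kid2": "FAM_B"}
print(private_family({"kid1": 1, "mom1": 1, "kid2": 0}, families))
```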

Version: 3.4 (2013-11-23).

  • Automatic database download ("-download" option)
  • Cancer samples: can be defined using a TXT file instead of VCF header.
  • Improved GenBank
  • Extended configuration options
  • Better frame handling for GTF/GFF files
  • Improvements in HGVS notation
  • Galaxy support: Improvements and bug fixes
  • SnpSift: Better support for dbNSFP (v2.1)

Version: 3.3 (2013-06-12).

  • Over 8,500 genomes supported.
  • All ENSEMBL (version 18) : Bacteria, Fungi, Metazoa, Plants and Protist genomes added.
  • NextProt annotations added
  • Motif annotations support added
  • SnpSift: GeneSet annotations
  • SnpEff count: Genomic region statistics counting reads, variants, intervals, etc.

Version: 3.2 (2013-01-14).

  • Cancer variants analysis
  • GATK compatible (-o gatk)
  • HGVS notations support

Version: 3.1 (2012-11-02).

  • All NCBI bacterial genomes added: Over 2,500 genomes added!
  • Loss of function effect and tag added (experimental command line option '-lof')
  • Nonsense-mediated decay effect and tag added (experimental command line option '-lof')
  • ENSEMBL version 68 genomes added
  • SnpEff 'countReads': counts the number of reads and bases (from a BAM file) on each gene, transcript, exon, intron, etc.
  • SnpEff Intron and Intergenic annotations improved.

Version: 3.0, revision 'f' (2012-08-23).

  • GATK output format compatibility option: '-o gatk'
  • Fixed problem when parsing comment after GFF headers.
  • Added GENCODE tags for GTF parsing
  • Splice site analysis tools
  • Analysis of U12 branch sites.
  • Minor problems caused by empty VCF headers solved.
  • Fixed bug in calculation of degenerate sites.
  • Fixed problem in canonical transcripts.
  • Plasmodium falciparum hand curated versions (by Daniel Park, Broad): Pf3D7v72 and Pf3D7v90
  • Maven project, created by Louis Letourneau.
  • Project source code changed to SVN (Louis Letourneau).
  • Databases will be 'backwards compatible' from now on.
  • New format for VCF files: added CDS length in amino acid (AA_LEN field).
  • Canonical transcript filter (command line option "-canon").
  • Improved GenBank parsing.
  • SnpSift 'dbnsfp': Annotate using dbNSFP (Louis Letourneau).
  • SnpSift 'gwasCat': Added GWAS Catalog annotations.
  • SnpSift 'extractFields': extract fields to TXT files (tab separated)
  • SnpSift 'sift': Annotate using SIFT database.
  • SnpSift 'Annotate' and 'AnnMem': Now support adding all fields in a VCF file as annotations.

Version: 2.1b (2012-04-26).

  • Revision "2.1c" : Maven project (by Louis Letourneau)
  • Revision "2.1c" : Improved Galaxy wrappers (by Peter Briggs)
  • Revision "2.1b" : Improved RefSeq parsing
  • Revision "2.1a" : Multi-thread race condition solved.
  • Note: If you are using hg19, it is recommended to download the latest database (due to improved RefSeq parsing in 2.1b).
  • Added multi-threaded support (command line option '-t').
  • GenBank support for building databases.
  • Config file simplified
  • E. coli database added
  • Galaxy download database option added.
  • Added all ENSEMBL version 66 genomes
  • Database 'download' issue solved. SourceForge servers were apparently choking on URLs containing double slashes; a workaround was implemented.
  • SnpSift GWAS catalog: Annotate using GWAS Catalog.
  • SnpSift: Added 'varType' to annotate variant type (SNP/MNP/INS/DEL), as well as HOM/HET if possible.
  • Faster VCF processing.
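
The 'varType' item above labels each variant as SNP/MNP/INS/DEL and, where possible, HOM/HET. The sketch below is a simplified classification based only on REF/ALT lengths and a single GT string. The function names are hypothetical, and treating any two identical alleles (including 0/0) as HOM is an assumption made here rather than SnpSift's documented rule.

```python
import re
from typing import Optional


def var_type(ref: str, alt: str) -> str:
    """Classify a variant from the lengths of its REF and ALT alleles."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INS" if len(alt) > len(ref) else "DEL"


def zygosity(gt: str) -> Optional[str]:
    """Return HOM or HET for a diploid GT string, None if it cannot be told."""
    alleles = re.split(r"[/|]", gt)  # accepts unphased (/) and phased (|) calls
    if len(alleles) < 2 or "." in alleles:
        return None
    return "HOM" if len(set(alleles)) == 1 else "HET"


print(var_type("A", "ATT"), zygosity("0/1"))
```

Complex substitutions where REF and ALT differ in both length and content are not handled separately in this sketch.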

Version: 2.0.5 (2011-11-25).

  • Support for RARE amino acids
  • Database for Soybean (Glycine max) added

Version: 2.0.5 (2011-11-25).

  • Database download command, e.g. "java -jar snpEff.jar download GRCH37.64"
  • Added all ENSEMBL version 65 genomes
  • RefSeq annotations support added.
  • Rogue transcript filter: By default SnpEff filters out some suspicious transcripts from annotation databases. This should reduce the false positive rate.
  • Amino acid changes in HGVS style (VCF output)
  • Optimized parsing for VCF files with large number of samples (genotypes).
  • Option to suppress summary calculation ('-noStats'), can speed up processing considerably in cases where VCF files have hundreds or thousands of genotype fields.
  • Option '-onlyCoding' is set to 'auto' to reduce the number of false positives (see next).
  • Option '-onlyCoding' can be assigned a value: if 'true', report only 'protein_coding' transcripts as protein coding changes; if 'false', report all transcripts as if they were coding. Default: 'auto', i.e. if any transcripts are marked as 'protein_coding' then set it to 'true'; if no transcripts are marked as 'protein_coding' then set it to 'false'.
  • Added BED output format. This is useful for annotating the output of a ChIP-Seq experiment (e.g. after performing peak calling with MACS, you want to know where the peaks hit).
  • Added BED Annotation output format. This is useful for getting all annotation intervals that intersect a set of variants (or genomic regions).
  • SnpSift filter:
    • Added generic index ('*') for variables, genotypes and effects. E.g.: ( GEN[*].GT = '1|1' )
    • Added support for 'EFF' and subfields (from SnpEff processed files). E.g.: ( EFF[*].EFFECT = 'NON_SYNONYMOUS_CODING' )
  • SnpSift intidx: Designed to extract a small number of intervals from huge VCF files. Added indexing using memory mapped I/O files for retrieving intervals from huge VCF files. Works really fast!
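
The 'auto' default for '-onlyCoding' described above amounts to a small decision rule. The sketch below spells it out, assuming the transcript biotypes of a database are available as a list of strings (or None when unmarked). The function name `resolve_only_coding` is hypothetical and not taken from the SnpEff source.

```python
from typing import List, Optional


def resolve_only_coding(option: str, biotypes: List[Optional[str]]) -> bool:
    """Resolve '-onlyCoding' ('true', 'false' or 'auto') to a boolean."""
    if option == "true":
        return True
    if option == "false":
        return False
    if option == "auto":
        # 'true' only if at least one transcript is marked protein_coding
        return any(b == "protein_coding" for b in biotypes)
    raise ValueError(f"Unknown value for -onlyCoding: {option!r}")


print(resolve_only_coding("auto", ["protein_coding", "lincRNA"]))
print(resolve_only_coding("auto", [None, None]))
```

With 'auto', a database without biotype information falls back to treating every transcript as coding.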

Version: 2.0.3 (2011-10-08)

  • Functional classes added in VCF output (i.e. NONE, SILENT, MISSENSE, NONSENSE)
  • Added MODIFIER effect 'impact'.
  • Rice genome added.
  • Added all ENSEMBL version 64 genomes.
  • Several minor issues solved.
  • Report usage statistics to server (can be disabled using the '-noLog' option).
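
The functional classes listed above (NONE, SILENT, MISSENSE, NONSENSE) can be illustrated with the reference and variant amino acids of a coding change. This is a simplified sketch, assuming single-letter amino acid codes with '*' for a stop codon and None for non-coding changes. The function name is hypothetical and the rules are an approximation, not SnpEff's implementation.

```python
from typing import Optional

STOP = "*"


def functional_class(aa_ref: Optional[str], aa_alt: Optional[str]) -> str:
    """Map an amino acid change to NONE, SILENT, MISSENSE or NONSENSE."""
    if not aa_ref or not aa_alt:
        return "NONE"      # not a coding change
    if aa_ref == aa_alt:
        return "SILENT"    # same amino acid (synonymous)
    if aa_alt == STOP:
        return "NONSENSE"  # premature stop codon gained
    return "MISSENSE"      # different amino acid


print(functional_class("R", "*"))
```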

Version: 2.0.2 (2011-09-09)

  • VCF output format
  • GATK integration. Now you can use SnpEff from GATK's VariantAnnotator.
  • Default input file is STDIN, i.e. the inputFile parameter can now be omitted.
  • Gene list outputs to a TXT file (tab separated) instead of the summary (HTML) file.
  • Command line format changed for various options
  • Option '-sort' deprecated.

Version: 1.9.6 (2011-08-08)

  • Ensembl genomes v63 added.
  • Warning! Genome names changed to agree with Ensembl naming convention, here are the names:
    Full name                      Short name
    Ailuropoda_melanoleuca         ailMel1.63
    Anolis_carolinensis            AnoCar2.0.63
    Bos_taurus                     Btau_4.0.63
    Caenorhabditis_elegans         WS220.63
    Callithrix_jacchus             C_jacchus3.2.1.63
    Canis_familiaris               BROADD2.63
    Cavia_porcellus                cavPor3.63
    Choloepus_hoffmanni            choHof1.63
    Ciona_intestinalis             JGI2.63
    Ciona_savignyi                 CSAV2.0.63
    Danio_rerio                    Zv9.63
    Dasypus_novemcinctus           dasNov2.63
    Dipodomys_ordii                dipOrd1.63
    Drosophila_melanogaster        BDGP5.25.63
    Echinops_telfairi              TENREC.63
    Equus_caballus                 EquCab2.63
    Erinaceus_europaeus            HEDGEHOG.63
    Felis_catus                    CAT.63
    Gallus_gallus                  WASHUC2.63
    Gasterosteus_aculeatus         BROADS1.63
    Gorilla_gorilla                gorGor3.63
    Homo_sapiens                   GRCh37.63
    Loxodonta_africana             loxAfr3.63
    Macaca_mulatta                 MMUL_1.63
    Macropus_eugenii               Meug_1.0.63
    Meleagris_gallopavo            UMD2.63
    Microcebus_murinus             micMur1.63
    Monodelphis_domestica          BROADO5.63
    Mus_musculus                   NCBIM37.63
    Myotis_lucifugus               Myoluc2.0.63
    Nomascus_leucogenys            Nleu1.0.63
    Ochotona_princeps              pika.63
    Ornithorhynchus_anatinus       OANA5.63
    Oryctolagus_cuniculus          oryCun2.63
    Oryzias_latipes                MEDAKA1.63
    Otolemur_garnettii             BUSHBABY1.63
    Pan_troglodytes                CHIMP2.1.63
    Pongo_abelii                   PPYG2.63
    Procavia_capensis              proCap1.63
    Pteropus_vampyrus              pteVam1.63
    Rattus_norvegicus              RGSC3.4.63
    Saccharomyces_cerevisiae       EF3.63
    Sorex_araneus                  COMMON_SHREW1.63
    Spermophilus_tridecemlineatus  SQUIRREL.63
    Sus_scrofa                     Sscrofa9.63
    Taeniopygia_guttata            taeGut3.2.4.63
    Takifugu_rubripes              FUGU4.63
    Tarsius_syrichta               tarSyr1.63
    Tetraodon_nigroviridis         TETRAODON8.63
    Tupaia_belangeri               TREESHREW.63
    Tursiops_truncatus             turTru1.63
    Vicugna_pacos                  vicPac1.63
    Xenopus_tropicalis             JGI_4.2.63
  • Problems with VCF heterozygous: Fixed
  • Problems parsing some InDels: Fixed
  • Error conditions on deletion at the border between UTR and Exon: Fixed
  • Problems reporting some CDS relative positions: Fixed
  • Some issues related to distance calculation on Downstream genes on negative strands: Fixed

Version: 1.9.5 (2011-03-10)

  • Variants per gene table.
  • Improvements in summary report.
  • Improved GFF3 parsing.
  • Several genomes added.

Version: 1.9 (2011-03-10)

Features recently added:
  • Improved command line
  • Genomes added (Arabidopsis) : alyrata107, athaliana130
  • Genomes added (all ENSEMBL version 61):
     ailmel1.61, anoCar2.0.61, btau4.0.61, bushBaby1.61, calJac3.2.1.61, canFam2.61, 
     cat1.61, cavPor3.61, ce.WS220.61, chimp2.1.61, choHof1.61, cInt2.61, cSav2.0.61, 
     danRer9.61, dasNov2.61, dipOrd1.61, dm5.25.61, equCab2.61, eriEur1.61, fugu4.61, 
     gacu1.61, ggallus2.61, gorGor3.61, hg37.61, loxAfr3.61, medaka1.61, meug1.0.61, 
     micMur1.61, mm37.61, mmul1.61, monDom5.61, myoLuc1.61, oana5.61, ochPri2.61, 
     oryCun2.61, ppyg2.61, proCap1.61, pteVam1.61, rat3.4.61, sacCer2.61, sorAra1.61, 
     speTri1.61, sScrofa9.61, taeGut3.2.4.61, tarSyr1.61, tenrec1.61, tetraodon8.61, 
     tupBel1.61, turkey.UMD2.61, turTru1.61, vicPac1.61, xtrop4.1.61 
  • Genomes added (Flybase): dm5.34
  • Genomes added (legacy hg18): hg36.54
  • Improved summary and statistics
  • Supports BED format: if you just need to check where an interval hits (e.g. exon, intron, genes, etc.)
  • Added support for GTF 2.2 format
  • Improved robustness of GFF3 and GFF2 parsing
  • Improved splice site detection: SPLICE_SITE_DONOR and SPLICE_SITE_ACCEPTOR
  • Improved support for insertions and deletions: CODON_INSERTION, CODON_CHANGE_PLUS_CODON_INSERTION, CODON_DELETION, CODON_CHANGE_PLUS_CODON_DELETION
  • Improved support for large deletions: EXON_DELETED and UTR_DELETED
  • Added support for INTRON_CONSERVED and INTERGENIC_CONSERVED intervals (available in GTF 2.2 files)
  • Added support for ambiguous sequences in exons (e.g. sequences that have "N")
  • Database dump support:
    java -jar snpEff.jar dump genome_version
  • CDS testing support:
    java -jar snpEff.jar cds genome_version cds.fasta

Older features

  • Show DNA and amino acid sequence before and after change: option "-a, --around" (e.g. "-a 5" shows 5 codons around the sequence change)
  • WARNING: Since version 1.7 snpEff assumes one-based coordinates (i.e. option "-1" is the default instead of "-0")
  • WARNING: Since version 1.7 snpEff does not sort sequence changes. You should use option "-sort" if you want that.
  • Genomes added (Pseudomonas): Pseudomonas aeruginosa (paeru.PA01 and paeru.PA14) and Pseudomonas fluorescens (pfluo.SBW25.NC_009444 and pfluo.SBW25.NC_012660)
  • Genomes supported (all ENSEMBL version 60):
     ailMel1.60, amel2, anoCar1.0.60, btau4.0.59, btau4.0.60, bushBaby1.60,
     calJac3.2.1.60, canFam2.59, canFam2.60, cat1.60, cavPor3.60, ce6, ce.WS210.60,
     chimp2.1.59, chimp2.1.60, choHof1.60, cInt2.60, cSav2.0.60, danRer6, danRer8.59,
     danRer9.60, dasNov2.60, dipOrd1.60, dm3, dm5.12, dm5.22, dm5.25.59, dm5.25.60,
     dm5.30, dm5.31, equCab2.60, eriEur1.60, fugu4.60, gacu1.60, ggallus2.59, ggallus2.60,
     gorGor3.60, hg37, hg37.59, hg37.60, loxAfr3.60, medaka1.60, meug1.0.60, micMur1.60,
     mm37, mm37.59, mm37.60, mmul1.60, monDom5.60, myoLuc1.60, oana5.60, ochPri2.60,
     oryCun2.60, ppyg2.60, proCap1.60, pteVam1.60, rat3.4.59, rat3.4.60, sacCer2,
     sacCer2.59, sacCer2.60, SIVmac239, sorAra1.60, speTri1.60, sScrofa9.60,
     taeGut3.2.4.60, tarSyr1.60, tenrec1.60, testCase, tetraodon8.60, tupBel1.60,
     turTru1.60, vicPac1.60, xtrop4.1.60 
  • VCF4 input format is now supported
  • Support for new genome: Apis mellifera
  • Statistics and plots
  • Filter intervals (only analyze selected intervals)
  • Supports pileup format (SamTools) as input format.
  • One-based and zero-based positions for input and output (as well as arbitrary offsets)
  • Support for heterozygous SNPs (e.g. A/W)
  • Predicts insertions and deletions (FRAME_SHIFT)
  • Supports GFF format when building databases.
  • Added: Multiple nucleotide polymorphisms (MNPs)
  • New format shows SNP quality and coverage.
  • Can filter SNPs, InDels and MNPs based on quality, coverage and zygosity (Hom/Het).
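
The one-based / zero-based items above come down to subtracting the input offset and adding the output offset. A minimal sketch follows; the function name `convert_position` is hypothetical and only illustrates the arithmetic, not how SnpEff handles its "-0" and "-1" options internally.

```python
def convert_position(pos: int, in_offset: int, out_offset: int) -> int:
    """Convert a position between coordinate systems with different offsets.

    An offset of 0 means the first base is numbered 0 (zero-based),
    an offset of 1 means it is numbered 1 (one-based).
    """
    return pos - in_offset + out_offset


# Zero-based input position 99 reported as a one-based position
print(convert_position(99, in_offset=0, out_offset=1))
```

Mixing up the two conventions shifts every variant by one base, which is why the change of default in version 1.7 is flagged as a warning.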