APPRIS analysis combines a range of complementary computational methods:

  1. Functionally important residues, firestar
  2. Protein structural information, Matador3D
  3. Presence of whole protein domains, SPADE
  4. Conservation against vertebrates, CORSAIR
  5. Presence of whole trans-membrane helices, THUMP
  6. Prediction of signal peptide and sub-cellular location, CRASH
  7. Selection of primary variant, APPRIS

Firestar: Functionally Important Residues

firestar predicts functionally important residues based on the FireDB database. Predictions are derived from a local evaluation of alignments between the query sequence and the functionally annotated structures stored in FireDB. The reliability of each prediction is assessed with SQUARE, and the functional residues are highlighted together with a reliability score.

Functional residues are highly conserved, even across large evolutionary distances. Because such conservation is unlikely to arise by chance, these residues can also help determine the principal isoform: variants that have "lost" conserved functional residues are unlikely to be the principal isoform.
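The reasoning above can be illustrated with a minimal Python sketch. It is not firestar's implementation: it assumes the functional residues have already been predicted as positions on a reference sequence, and that each variant has been mapped onto those positions using a hypothetical dictionary layout (None where the residue is missing). The function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical layout: reference position -> residue the variant carries
# there, or None if that part of the sequence is missing in the variant.
VariantMap = Dict[int, Optional[str]]


def retained_functional_residues(functional: Dict[int, str],
                                 variant: VariantMap) -> int:
    """Count the functional residues (position -> amino acid) a variant keeps."""
    return sum(1 for pos, aa in functional.items() if variant.get(pos) == aa)


def variants_losing_residues(functional: Dict[int, str],
                             variants: Dict[str, VariantMap]) -> List[str]:
    """Return the variants that keep fewer functional residues than the best one."""
    counts = {name: retained_functional_residues(functional, vmap)
              for name, vmap in variants.items()}
    best = max(counts.values())
    return sorted(name for name, n in counts.items() if n < best)


functional = {57: "H", 102: "D", 195: "S"}  # illustrative positions only
variants = {
    "isoform-1": {57: "H", 102: "D", 195: "S"},
    "isoform-2": {57: "H", 102: None, 195: "S"},  # residue 102 lost
}
print(variants_losing_residues(functional, variants))  # ['isoform-2']
```

Flagging variants relative to the best-scoring one, rather than against a fixed count, keeps the comparison within a single gene.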

Matador3D: Protein structural information

Protein structural information is analysed with Matador3D. In practice, the variant sequences from the same gene are mapped onto 3D structures by running BLAST searches against the PDB.

Protein structure is much more conserved than sequence, so isoforms with large insertions or deletions relative to homologous crystal structures are also unlikely to be principal.
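A minimal sketch of the insertion/deletion check follows. It assumes the variant has already been aligned to a PDB chain (for example by BLAST) and that the alignment is available as two gapped strings of equal length. The helper names and the 10-residue max_indel cut-off are hypothetical and are not Matador3D parameters.

```python
import re


def longest_indel(query_aln: str, struct_aln: str) -> int:
    """Length of the longest gap run in either row of a pairwise alignment.

    Gaps in the query row are deletions in the variant; gaps in the
    structure row are insertions in the variant."""
    if len(query_aln) != len(struct_aln):
        raise ValueError("aligned sequences must have the same length")
    runs = re.findall(r"-+", query_aln) + re.findall(r"-+", struct_aln)
    return max((len(run) for run in runs), default=0)


def fits_structure(query_aln: str, struct_aln: str, max_indel: int = 10) -> bool:
    """True if no insertion or deletion relative to the structure exceeds
    max_indel residues (max_indel is a hypothetical cut-off)."""
    return longest_indel(query_aln, struct_aln) <= max_indel


structure = "MKTAYIAKQRQISFVKSHFSRQ"                # illustrative chain sequence
skipped = structure[:9] + "-" * 11 + structure[20:]  # variant missing 11 residues
print(fits_structure(structure, structure))  # True
print(fits_structure(skipped, structure))    # False
```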

SPADE: Scanning Pfam Alignments for Damaged Entities

Proteins are generally composed of one or more functional regions, commonly termed domains. Identifying the functional domains present in a variant can provide insight into its function and help determine the most likely principal isoform.

The presence of functional domains is analysed with Pfamscan.

Variants that have "lost" conserved functional domains are unlikely to be principal isoforms.
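The comparison of domain content between variants can be sketched as follows. The sketch assumes the Pfam accessions found in each variant have already been parsed from the Pfamscan output (parsing is not shown) and checks only whole-domain presence. The function names are hypothetical, and this is not SPADE's scoring scheme.

```python
from collections import Counter
from typing import Dict, List


def lost_domains(domains: Dict[str, List[str]]) -> Dict[str, Counter]:
    """For each variant, the Pfam domains it lacks compared with the largest
    number of copies of each domain found in any variant of the gene."""
    best: Counter = Counter()
    for accessions in domains.values():
        best |= Counter(accessions)  # union keeps the maximum count per domain
    # Counter subtraction drops zero and negative counts.
    return {name: best - Counter(acc) for name, acc in domains.items()}


def complete_variants(domains: Dict[str, List[str]]) -> List[str]:
    """Variants that have not lost any domain (an empty Counter is falsy)."""
    return sorted(name for name, lost in lost_domains(domains).items() if not lost)


domains = {
    "isoform-1": ["PF00069", "PF00017"],  # illustrative accessions
    "isoform-2": ["PF00069"],
}
print(lost_domains(domains)["isoform-2"])  # Counter({'PF00017': 1})
print(complete_variants(domains))          # ['isoform-1']
```

Counting copies rather than using plain sets means a variant that drops one of two repeated domains is still flagged.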

CORSAIR: Conservation against vertebrates

CORSAIR carries out BLAST searches against vertebrate sequences to help determine the most likely principal isoform.

Transcripts that are conserved over greater evolutionary distances are more likely to be the main variant.

For each variant, CORSAIR counts the number of species whose sequences align correctly and without gaps.
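A minimal sketch of the per-variant species count. The Hit fields, the identity and coverage thresholds standing in for "aligns correctly", and the function names are all assumptions made for illustration; they are not CORSAIR's actual criteria.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Hit:
    """One BLAST hit of a variant against a vertebrate sequence (hypothetical fields)."""
    species: str
    identity: float  # percent identity over the alignment
    coverage: float  # fraction of the query covered by the alignment
    gaps: int        # number of gap positions in the alignment


def conserved_species(hits: Iterable[Hit], min_identity: float = 50.0,
                      min_coverage: float = 0.95) -> int:
    """Count distinct species with a gap-free, near-full-length hit.

    min_identity and min_coverage are hypothetical thresholds."""
    return len({
        h.species for h in hits
        if h.gaps == 0 and h.identity >= min_identity and h.coverage >= min_coverage
    })


def species_per_variant(hits_by_variant: Dict[str, Iterable[Hit]]) -> Dict[str, int]:
    """Apply the count to every variant of a gene."""
    return {name: conserved_species(hits) for name, hits in hits_by_variant.items()}


hits = [
    Hit("Mus musculus", 92.0, 1.00, 0),
    Hit("Danio rerio", 70.0, 0.98, 0),
    Hit("Gallus gallus", 80.0, 0.99, 3),  # gapped alignment: not counted
]
print(conserved_species(hits))  # 2
```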

THUMP: Detecting reliable trans-membrane helices

THUMP predicts trans-membrane helices by consensus of three different methods: MEMSAT 3.0, Phobius and PRODIV. A helix is considered reliable only if all three methods predict it.

Transcripts that have "lost" trans-membrane helices are less likely to be the principal isoform.

CRASH: Signal Peptide and Mitochondrial Signal Sequences

The presence and location of signal peptides and their cleavage sites in amino acid sequences are analysed with the SignalP program, while TargetP predicts the sub-cellular location of eukaryotic proteins. CRASH applies a rule-based analysis to the output of these two programs so that only reliable predictions are kept.
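A minimal sketch of a rule-based combination of the two predictors. The parsed result classes, the location labels, the 0.5 cut-offs and the rules themselves are hypothetical; they illustrate the idea of accepting a prediction only when the evidence agrees, not the rules CRASH actually applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SignalPResult:
    """Hypothetical parsed SignalP output for one sequence."""
    score: float                  # signal peptide score in [0, 1]
    cleavage_site: Optional[int]  # predicted cleavage position, if any


@dataclass
class TargetPResult:
    """Hypothetical parsed TargetP output for one sequence."""
    location: str  # hypothetical labels: "secretory", "mitochondrion", "other"
    score: float   # score of the predicted location in [0, 1]


def crash_like_call(sp: SignalPResult, tp: TargetPResult,
                    sp_cut: float = 0.5, tp_cut: float = 0.5) -> str:
    """Combine both predictors with simple agreement rules (illustrative only)."""
    # Signal peptide: SignalP must predict one with a cleavage site, and
    # TargetP must independently place the protein in the secretory pathway.
    if (sp.score >= sp_cut and sp.cleavage_site is not None
            and tp.location == "secretory" and tp.score >= tp_cut):
        return "signal peptide"
    # Mitochondrial targeting signal: rely on a confident TargetP call.
    if tp.location == "mitochondrion" and tp.score >= tp_cut:
        return "mitochondrial signal"
    return "no reliable signal"


print(crash_like_call(SignalPResult(0.9, 22), TargetPResult("secretory", 0.8)))
# signal peptide
```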