
Taggers and Chunkers

Bio Taggers

  • GeneTaggerCRF (UPenn)
    • uses a machine learning technique called conditional random fields
  • Yapex a protein name tagger (ref:franzen:ijmi02)
  • KeX freely available source code (ref:fukuda:PSB1998)
  • AbGene simple gene finder in Medline documents
  • GAPSCORE identifies names of genes and proteins
  • LingPipe generic IE tools, now applied to TREC Genomics
  • ABNER A Biomedical Named Entity Recognizer
    • See also Yagi a simpler, command-line tool
  • NLProt a tool for finding protein names in natural-language text. It is based on Support Vector Machines (SVMs) trained on contextual features of named entities in scientific language. Additionally, simple filtering rules and a protein-name dictionary are used to increase performance. NLProt reached a precision (accuracy) of 70% at a recall (coverage) of 85% on the 166 most recent abstracts of EMBL and Cell (Nov/Dec 2003). When run from the command line, NLProt takes about 1 second per abstract.
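
The precision and recall figures quoted for NLProt can be read as ratios over tagged protein names. The sketch below assumes the usual information-retrieval definitions; the counts are hypothetical, chosen only so that the ratios come out at the reported 70% and 85%, and are not taken from the NLProt evaluation.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of tagged names that really are protein names."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real protein names that the tagger found."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts (assumption, for illustration only).
true_pos, false_pos, false_neg = 119, 51, 21

p = precision(true_pos, false_pos)
r = recall(true_pos, false_neg)
print(f"precision={p:.2f} recall={r:.2f}")
```

At this operating point the tagger misses fewer names than it over-tags: with these counts, 21 real names are missed while 51 spurious ones are tagged.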

MUC-style NE Recognizers

  • Biomedical Named Entity Recognition at A*STAR recognizes the following classes: Virus, Tissue, RNA, Protein, Polynucleotide, Peptide, OtherOrganicCompound, OtherName, OtherArtificialSource, Organism, Nucleotide, MultiCell, MonoCell, Lipid, Inorganic, DNA, CellType, CellLine, CellComponent, Carbohydrate, BodyPart, Atom, AminoAcidMonomer.

Generic POS Taggers

Generic Chunker

Coreference Resolution

Various tools by Patrick Ruch

  • Ruch
    • the tools seem to be Windows-only
    • MeSHMap might be useful for us

Semantic Gene Organizer

Semantic Gene Organizer (SGO) is an automated method for clustering genes based on conceptual relationships derived from MEDLINE abstracts. It uses a variant of the vector-space model called Latent Semantic Indexing (LSI) to represent genes as vectors in a lower-dimensional (concept) space. The relationship between two genes is deduced from the cosine of the angle between their gene document vectors. A gene document is a concatenation of the MEDLINE titles and abstracts identified in the LocusLink entry for each gene.
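
The LSI idea described above can be illustrated with a minimal sketch. This is not SGO's implementation: the gene names and texts are invented toy data, raw term counts stand in for whatever term weighting SGO applies, and the helper names (`term_document_matrix`, `lsi_gene_vectors`, `cosine`) are hypothetical. It only shows the general technique, a truncated SVD of a term-by-document matrix followed by cosine comparison.

```python
import numpy as np

# Toy "gene documents" (invented): each stands in for the concatenated
# MEDLINE titles and abstracts of one gene.
gene_docs = {
    "geneA": "kinase phosphorylation signaling cell cycle",
    "geneB": "kinase signaling phosphorylation growth",
    "geneC": "membrane transport ion channel",
}


def term_document_matrix(docs):
    """Build a raw-count matrix with one row per term, one column per gene."""
    genes = list(docs)
    vocab = sorted({w for text in docs.values() for w in text.split()})
    row = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(genes)))
    for col, gene in enumerate(genes):
        for word in docs[gene].split():
            matrix[row[word], col] += 1.0
    return genes, vocab, matrix


def lsi_gene_vectors(matrix, k):
    """Project each gene document into a k-dimensional concept space."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; one row per gene document.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


genes, vocab, matrix = term_document_matrix(gene_docs)
vectors = lsi_gene_vectors(matrix, k=2)
sim_ab = cosine(vectors[0], vectors[1])
sim_ac = cosine(vectors[0], vectors[2])
print(f"geneA~geneB: {sim_ab:.3f}  geneA~geneC: {sim_ac:.3f}")
```

In this toy data geneA and geneB share most of their terms and end up close to each other in concept space, while geneC shares no terms with geneA and is close to orthogonal to it.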

Ontology Tool