Projects and Research Groups
NOTE: This list is under development and will be expanded in due course. If you (or your group) would like to be added to this list, please send us a message.
People
- Kristofer Franzen
- Rolf Apweiler (head of Swiss-Prot)
Groups
- BioMinT: an EU-funded research project (2003-2005)
  - Aim: developing tools for content-based and knowledge-intensive information retrieval and extraction.
  - Applications: annotation of the Swiss-Prot and PRINTS proteomics databases
  - Methods:
    - IR: query expansion + ranking
      - the query is a protein or gene name
      - expand it using a synonym database (drawing on 14 different databases)
      - generate and execute a PubMed query
      - retrieve documents, then filter and rank them by relevance
    - Named Entity Recognition (recognition of biological entities) and IE
      - evaluation of external tools: Yapex, KeX, GAPSCORE
      - learning approaches for species classification
      - plan to train a generic shallow parser over GENIA
    - providing results as database slot fillers
  - Publications
    - good 'marketing'-style presentation. Focus in particular on pages 27-41, which cover requirements for text mining
    - Ranking for BioMinT: document retrieval and ranking; not particularly relevant to us
    - Bio Entity Recognition: learning extraction patterns to classify organisms
    - Evaluating Protein Name Recognition: a comparison of two protein name taggers, KeX and Yapex
    - Classifying Protein Fingerprints: this appears to be a purely data-mining approach to classifying biological data.
    - BioMinT short presentation by K. Seewald
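The BioMinT IR steps (synonym expansion, PubMed query generation, filtering and ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not BioMinT's code: the synonym table `SYNONYMS`, the functions `expand`, `pubmed_query` and `rank`, and the mention-count relevance score are all hypothetical. `[tiab]` is PubMed's Title/Abstract field tag; actually submitting the query (e.g. through NCBI's E-utilities) is deliberately left out so the sketch runs offline.

```python
import re

# Hypothetical synonym table standing in for BioMinT's merged synonym
# database (the real one drew on 14 sources; these entries are illustrative).
SYNONYMS: dict[str, list[str]] = {
    "p53": ["p53", "TP53", "tumor protein p53", "cellular tumor antigen p53"],
}


def expand(name: str) -> list[str]:
    """Return the known synonyms of a gene/protein name, or the name itself."""
    return SYNONYMS.get(name.lower(), [name])


def pubmed_query(name: str) -> str:
    """OR the quoted synonyms together, restricting each to title/abstract ([tiab])."""
    return " OR ".join(f'"{syn}"[tiab]' for syn in expand(name))


def rank(name: str, docs: dict[str, str]) -> list[str]:
    """Drop documents that mention no synonym; rank the rest by mention count.

    Overlapping synonyms (e.g. "p53" inside "tumor protein p53") may be
    counted more than once; this is a toy relevance score, not BioMinT's.
    """
    patterns = [
        re.compile(r"\b" + re.escape(syn) + r"\b", re.IGNORECASE)
        for syn in expand(name)
    ]
    scored = []
    for doc_id, text in docs.items():
        score = sum(len(p.findall(text)) for p in patterns)
        if score > 0:  # filter step: no synonym mentioned, document dropped
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


if __name__ == "__main__":
    print(pubmed_query("p53"))
```

A plain mention count is used only to keep the example short; any real relevance model (e.g. a trained classifier) could replace the scoring line without changing the expand/query/filter structure.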
- BioNLP.org (Futrelle's page)
- Alex Morgan's home page with BioNLP resources
- Protein Annotation Tools (AnnBlast)
- Mining the Bibliome: Information Extraction from the Biomedical Literature (UPenn)
- Natural Language Processing and Computational Linguistics (Brandeis University)
- Semantic representation of biomedical text (Lister Hill Center, National Library of Medicine)
- Biomedical informatics (University of Sheffield)
- Helix (Inria - France)
- BioPath Project (University of Salford)
- Genia Project (Tsujii Laboratory - University of Tokyo)
- MedSyndikate Project (University of Freiburg)
- Bioinformatics (University of Arizona)
- Text Mining Group (Protein Design Group, CNB - Spain)
- Language Technology Group (University of Edinburgh) - Disp Project
- BioText Project (University of California)
- Georgetown University
- BioNLP People (Kevin Cohen's page)
- Textomy: the PreBind/Textomy system was presented in (donaldson:bioinformatics03). Similar in spirit to BioMinT, it is meant as a curation aid for the BIND database of protein-protein interactions. It combines IR, IE and domain knowledge. An SVM filters documents relevant to protein-protein interactions, and the same SVM is used to find relevant sentences. No deep linguistic analysis is performed. Lists of protein names and synonyms derived from public databases serve as domain knowledge, and morphological and contextual rules are used to find candidate interacting proteins. A human validation step follows. Results are tested against MIPS, an independent interaction database.
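The candidate-finding step described for Textomy (a dictionary of protein names plus morphological and contextual rules) might look roughly like the sketch below. The protein list, the trigger stems and the "trigger word between two names" rule are illustrative assumptions, not PreBind's actual rules, and the SVM filtering stage is not shown. As in the described system, the output would still go to a human curator for validation.

```python
import re
from itertools import combinations

# Hypothetical protein dictionary, standing in for the name/synonym lists
# that PreBind/Textomy derived from public databases.
PROTEINS = {"Cdc28", "Cln2", "Sic1", "Far1"}

# Hypothetical morphological rule: any token starting with one of these
# stems counts as an interaction trigger ("binds", "binding", ...).
TRIGGER_STEMS = ("bind", "interact", "associat", "phosphorylat", "complex")


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_pairs(text: str) -> list[tuple[str, str, str]]:
    """Return (protein_a, protein_b, sentence) triples where two dictionary
    proteins occur in one sentence with a trigger word between them
    (the contextual rule)."""
    results = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        hits = [(i, tok) for i, tok in enumerate(tokens) if tok in PROTEINS]
        # combinations() keeps list order, so i < j for every pair.
        for (i, a), (j, b) in combinations(hits, 2):
            between = (tok.lower() for tok in tokens[i + 1 : j])
            if a != b and any(tok.startswith(TRIGGER_STEMS) for tok in between):
                results.append((a, b, sentence))
    return results
```

Rules like these favour recall over precision, which is consistent with keeping a human validation step after them.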
- TextPresso
  - IR, IE and QA
  - interface based on simple IR queries, or a category-based interface
  - works on text that has been pre-annotated (how?)
  - IE planned, not yet available
  - does not use learning (markup done manually?)
  - a single, simple domain (C. elegans)
  - corpus of 2,700 papers and 16,000 abstracts
  - open-source, freely available
- PASTA: result of an EPSRC project (1998-2001), described recently in Bioinformatics
  - IE system (MUC-style)
  - focuses on the role of amino acid residues in protein active sites
  - pipeline: tokenization, POS tagging, NE recognition, parsing, discourse interpretation, template extraction; the templates are then used to fill a relational DB
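PASTA's last stage (filling a relational database from extracted templates) can be illustrated with Python's built-in `sqlite3` module. This sketch skips the tokenization-to-discourse pipeline entirely: a regular expression over three-letter residue codes (e.g. `Ser195`) stands in for NE recognition and template extraction, and the `ResidueTemplate` fields and the `residue` table schema are hypothetical, not PASTA's.

```python
import re
import sqlite3
from dataclasses import dataclass

# Three-letter amino acid code followed by a sequence position, e.g. "His57".
# A crude stand-in for PASTA's NE recognition of residue mentions.
RESIDUE = re.compile(
    r"\b(Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val)-?(\d+)\b"
)


@dataclass
class ResidueTemplate:
    """Hypothetical template: one residue mention and its context."""
    protein: str
    residue: str
    position: int
    sentence: str


def extract_templates(protein: str, text: str) -> list[ResidueTemplate]:
    """Fill one template per residue mention, sentence by sentence."""
    templates = []
    for sentence in re.split(r"(?<=\.)\s+", text):
        for m in RESIDUE.finditer(sentence):
            templates.append(
                ResidueTemplate(protein, m.group(1), int(m.group(2)), sentence.strip())
            )
    return templates


def store(conn: sqlite3.Connection, templates: list[ResidueTemplate]) -> None:
    """Map each template's slots onto columns of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS residue "
        "(protein TEXT, residue TEXT, position INTEGER, sentence TEXT)"
    )
    conn.executemany(
        "INSERT INTO residue VALUES (?, ?, ?, ?)",
        [(t.protein, t.residue, t.position, t.sentence) for t in templates],
    )
    conn.commit()
```

For example, `store(sqlite3.connect(":memory:"), extract_templates("chymotrypsin", text))` loads every residue mention in `text` into an in-memory table that can then be queried with ordinary SQL.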