GO entity markup
This project will run in parallel with the normal markup pipeline, under the following criteria:
- markup is done on a separate Textpresso machine, dev.textpresso.org (the production machine is textpresso-dev.caltech.edu)
- the papers will not be sent to DJS (no upload form has been set up for this pipeline)
- only the GO lexicon from AmiGO will be used
- only papers that come through the normal pipeline will be marked up (no retroactive markup for now)
- GO-marked-up papers will be sent to the GO linking crew: Kimberly, Ranjana, Daniela, Chris, Karen, and Paul
- as with the normal pipeline, the alert e-mail (see below) will include a link to an entity table of all entities, the generated URL, and a brief description of the webpage
- all comments on the papers will be made available through this wiki
- all links point to WB GO pages, not AmiGO pages
E-mail alert from Arun
Once a paper has been received and run through the GO linking script on dev.textpresso.org, an e-mail message will be sent out to everyone on the GO linking crew. This message includes a link to the paper and to the entity table.
Date: Wed, Jul 13, 2011 at 10:14 AM
Subject: GSA auto-email: GSA 128421 linked file available
This is an automatic email sent to you by the GSA pipeline.
ATTENTION: This is not the production file. This is only for testing GO term linking.
Responsible curator: Daniela Raciti
Linked file available for manual QC at
The entity table for this first pass/automatically linked article is available at
- Some GO terms do not have pages, and WB displays a page titled 'Gene Ontology Search' for these URLs. See
- these problem links are color-coded grey in the entity table: the URL is live, but the page has no relevant content.
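The grey color-coding could be automated by checking the title of the page each link returns. Below is a minimal sketch, assuming the page HTML has already been fetched (fetching is left out) and that a title of exactly 'Gene Ontology Search' reliably marks a GO term with no WB page. The function names `page_title` and `link_color` and the 'default' color value are hypothetical, not part of the existing pipeline.

```python
from html.parser import HTMLParser

# Title that WB shows for GO terms without their own page (from the notes above).
NO_PAGE_TITLE = "Gene Ontology Search"


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the whitespace-normalized <title> text of an HTML page."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split())


def link_color(html: str) -> str:
    """Hypothetical entity-table color: 'grey' for placeholder pages."""
    return "grey" if page_title(html) == NO_PAGE_TITLE else "default"


if __name__ == "__main__":
    placeholder = "<html><head><title>Gene Ontology Search</title></head></html>"
    real = "<html><head><title>Example GO term page</title></head></html>"
    print(link_color(placeholder))
    print(link_color(real))
```

Comparing the whole normalized title, rather than searching for a substring, keeps a genuine term page whose title merely contains "Gene Ontology" from being greyed out.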
GO links seem to fall into three categories:
- Those that are correct.
- Those that are incorrect.
- Those that aren't necessarily wrong but either don't capture the essence of the entity discussed in the paper or link to a GO term that isn't the best or most informative choice. This happens, for example, when the linker matches a GO term to a phrase that is only part of a larger concept. Some examples:
  - E2F transcription factors (linked to transcription, DNA-dependent)
  - acetylcholine receptor agonist levamisole (linked to acetylcholine receptor activity)
  - cell death gene (linked to cell death)
  - rab-2 locomotion phenotype (linked to locomotion)
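One way to surface the third category for review would be to flag matches that are immediately followed by a word suggesting the match is only a modifier inside a larger noun phrase, as in each of the examples above. This is a rough sketch, assuming the linker can report each match's position in the text; the `HEAD_NOUNS` set is a hypothetical starting list drawn from these examples, not a curated resource, and `flag_matches` is an illustrative stand-in for the real linking script.

```python
import re

# Hypothetical head nouns: when one directly follows a GO match, the match is
# probably modifying a larger concept (e.g. "transcription factors").
HEAD_NOUNS = {
    "factor", "factors", "agonist", "agonists",
    "gene", "genes", "phenotype", "phenotypes",
}

# The next word after a span, skipping whitespace only (punctuation stops it).
_NEXT_WORD = re.compile(r"\s+([A-Za-z][\w-]*)")


def is_partial_match(text: str, start: int, end: int) -> bool:
    """Return True if text[start:end] is directly followed by a head noun."""
    m = _NEXT_WORD.match(text, end)
    return bool(m) and m.group(1).lower() in HEAD_NOUNS


def flag_matches(text: str, term: str) -> list[tuple[int, bool]]:
    """Find case-insensitive whole-word occurrences of term.

    Returns (start offset, flagged) pairs, where flagged marks a likely
    partial match that a curator may want to review.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(m.start(), is_partial_match(text, m.start(), m.end()))
            for m in pattern.finditer(text)]


if __name__ == "__main__":
    examples = [
        ("E2F transcription factors bind DNA.", "transcription"),
        ("the acetylcholine receptor agonist levamisole", "acetylcholine receptor"),
        ("a cell death gene", "cell death"),
        ("the rab-2 locomotion phenotype", "locomotion"),
        ("defects in locomotion were observed", "locomotion"),
    ]
    for text, term in examples:
        print(term, flag_matches(text, term))
```

A flag here only means "worth a second look"; it does not decide whether the link is wrong, and a fixed word list will miss many cases.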
What can we realistically address by manual editing, and how much time would it take? Is it worth the time?
Would there be consistency issues to resolve?
Would it be better to link the article as a whole to GO terms, rather than linking from specific terms in the paper? For example, GO terms with at least a minimum number of links could be attached to the paper. Is there a way to do this on the Genetics site?
Should we just link from certain sections, e.g. the abstract?
What options do we currently have for viewing links? Can users select which types of links they want to see, e.g. which branch of the ontology, or string matches vs. curated links?
What role could community annotation play here?
The ontology could certainly add more synonyms, add plurals, etc. What's the most efficient way to do this?
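Until the ontology itself carries more synonyms, a matcher could generate simple plural variants on the fly. This is a crude sketch, assuming regular English plurals formed on the last word of a term are good enough for a first pass; the suffix rules are a simplification (irregular plurals such as "axis" and mass nouns such as "death" are not handled well), and `plural_variants` is a hypothetical helper, so generated variants would still need review before going into any lexicon.

```python
def plural_variants(term: str) -> set[str]:
    """Return the term itself plus a naive plural formed on its last word."""
    words = term.split()
    if not words:
        return set()
    last = words[-1]
    lower = last.lower()
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        # e.g. "process" -> "processes"
        plural = last + "es"
    elif lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        # consonant + y, e.g. "body" -> "bodies"
        plural = last[:-1] + "ies"
    else:
        # default rule, e.g. "receptor" -> "receptors"
        plural = last + "s"
    return {term, " ".join(words[:-1] + [plural])}


if __name__ == "__main__":
    for t in ["acetylcholine receptor", "cell body", "biological process"]:
        print(sorted(plural_variants(t)))
```

Generating variants at match time leaves the ontology untouched; the alternative of adding curated plural synonyms to GO itself is slower but avoids odd forms like "cell deaths".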
Ranjana:
- One question I had was: would we link all text matches, or only GO terms relevant to the genes/processes being studied in the paper?
- Would we do materials and methods, discussion, etc.?
- I looked at only one paper, but could immediately make out that this is a time-intensive effort, especially if we want to do a good-quality job.
- I feel the cost-benefit might be better if we did only abstracts; we could see how it goes once in production and take it from there.
- With GO terms you start getting into areas of annotation, and you don't want to mislead the user; that's what makes it time-intensive.
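The earlier question about attaching GO terms to the article as a whole, rather than linking individual mentions, could be prototyped by counting how often each GO term is matched in a paper and keeping only the terms that reach a threshold. A minimal sketch, assuming the linker's output is available as a list of matched GO IDs (one entry per link); the default threshold, the example IDs, and the function name `article_go_terms` are hypothetical choices for illustration.

```python
from collections import Counter


def article_go_terms(matched_ids: list[str], min_links: int = 3) -> list[tuple[str, int]]:
    """Return (GO ID, count) pairs for terms linked at least min_links times.

    Results are sorted by descending count, then by GO ID for stable output.
    """
    if min_links < 1:
        raise ValueError("min_links must be at least 1")
    counts = Counter(matched_ids)
    kept = [(go_id, n) for go_id, n in counts.items() if n >= min_links]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    links = ["GO:0040011", "GO:0008219", "GO:0040011", "GO:0040011", "GO:0008219"]
    print(article_go_terms(links, min_links=2))
    # [('GO:0040011', 3), ('GO:0008219', 2)]
```

Raw mention counts favor terms that are simply repeated often, so a real version might also weight by section (e.g. counting only the abstract, as suggested above) before applying the threshold.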
Click the associated links to see the pages documenting the GO linking of each paper:
DOI | WBPaper ID | Linked HTML | Entity list | Comments
doi:10.1534/genetics.111.128421 | 00038399 | GO_linked_html | GO_entity_list | WBPaper00038399_GO_linking_comments
doi:10.1534/genetics.111.130450 | 00038523 | GO_linked_html | GO_entity_list | WBPaper00038523_GO_linking_comments
doi:10.1534/genetics.111.131227 | 00038528 | GO_linked_html | GO_entity_list | WBPaper00038528_GO_linking_comments
doi:10.1534/genetics.111.131714 | 00039858 | expected