Difference between revisions of "GSA Markup SOP"

From WormBaseWiki

Revision as of 16:22, 26 January 2010

Caltech_documentation

GSA linking pipeline

from Arun's 2010 SAB Talk

SOP for linking

We are given 48 hours to mark up and return the XML document from Genetics. If it arrives on a Friday, we will have the weekend.

At Caltech, the XML is scanned by Arun's script, which identifies and marks up all known objects by exact match and formatting rules. All of the objects (genes, variations, transgenes, rearrangements) are mined from servers rather than from a WB release so that they are as up to date as possible; genes that have been approved but are not yet on the website will therefore be recognized. Arun and I then go through the markup to make sure the links are correct and no objects are missed. Missed objects are added to Arun's object lists through the author first-pass form, so in practice curators may receive objects from that form from the author, from me, or from both.

After correcting the object lists, we run the paper through again before a second in-house proof, then send it back to DJS. It takes 20-30 minutes to go through a paper with a lot of genes.
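
The exact-match step described above can be illustrated with a minimal sketch. This is not Arun's script: the function names, the `<a href>` output, and the example.org URLs are assumptions made for illustration, and the sketch works on a plain text segment rather than on the full Genetics XML. It shows only the idea of matching whole object names from a list, preferring the longest name when one name is a prefix of another.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def build_pattern(names):
    """Compile one regex that matches any name in `names` as a whole token."""
    # Longest names first, so "unc-119" is tried before its prefix "unc-11".
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    # A match may not touch a word character or hyphen on either side,
    # so "unc-1190" is not linked as "unc-119" or "unc-11".
    return re.compile(r"(?<![\w-])(?:%s)(?![\w-])" % alternation)


def mark_up(text, entities):
    """Wrap every exact match in a link.

    `entities` maps an object name to a link target (hypothetical URLs here).
    Returns the marked-up text and the list of names that were linked.
    """
    if not entities:
        return text, []
    pattern = build_pattern(entities)
    found = []

    def replace(match):
        name = match.group(0)
        found.append(name)
        return "<a href=%s>%s</a>" % (quoteattr(entities[name]), escape(name))

    return pattern.sub(replace, text), found


if __name__ == "__main__":
    entities = {
        "unc-11": "https://example.org/gene/unc-11",
        "unc-119": "https://example.org/gene/unc-119",
    }
    marked, found = mark_up("Both unc-119 and unc-11 were tested.", entities)
    print(marked)
    print(found)
```

The `found` list is returned so that a proofreader can compare the linked names against the paper and spot objects that were missed.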

First, a general issue: gene names that don't exist in WB yet. The paper had a couple of gene names, "cbr-dpy-1" and "cpy-4", that both referred to the same briggsae sequence. Our solution was to link to the WBGene page rather than create a link with cbr-dpy-1 or cpy-4 as the public name, as the other links are made. I think this is a fair way to deal with all nonexistent gene names in the future, but it can't be automated, and it requires that the authors be explicit about which sequence they are naming. Authors are asked to declare these objects before the linking scripts are run, and you receive these alerts, but authors do not always follow through. If an author is not explicit, the best option is for us to contact them; if we do not get a response within a reasonable time, no link will be created. Please let us know if and how you want to be part of this pipeline.
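
The decision rule in the paragraph above can be written down as a small sketch. Everything named here is hypothetical: the `link_target` function, the example.org URL layout, and the placeholder ID `WBGene00000000` are assumptions for illustration only. The table of declared names still has to be filled in by hand from what the authors tell us.

```python
# Hypothetical base URL; the real link format is not specified in this SOP.
BASE_URL = "https://example.org/gene/"


def link_target(name, public_names, declared_ids):
    """Return the link target for a gene name, or None if no link is made.

    public_names: set of names that already exist in WB.
    declared_ids: dict mapping a new name to the WBGene ID that the authors
                  explicitly declared for it.
    """
    if name in public_names:
        # Normal case: link by public name, as the other links are made.
        return BASE_URL + name
    gene_id = declared_ids.get(name)
    if gene_id is not None:
        # Name not in WB yet: link to the WBGene page instead.
        return BASE_URL + gene_id
    # Authors were not explicit and did not respond: no link is created.
    return None


if __name__ == "__main__":
    public = {"dpy-1"}
    # Placeholder ID; both names refer to the same sequence.
    declared = {"cbr-dpy-1": "WBGene00000000", "cpy-4": "WBGene00000000"}
    for name in ("dpy-1", "cbr-dpy-1", "cpy-4", "xyz-9"):
        print(name, "->", link_target(name, public, declared))
```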


Entity list sources

Gene list -> ace server
Variation list -> ws current.obo -> latest dev WS
Transgene list -> wscurrent.obo -> latest dev WS
Anatomy/cell list
Person list -> latest WS (dev or public?)


People

Caltech: Arun Rangarajan, Karen Yook, Juancarlos Chan, Paul Sternberg, Hans-Michael Müller
GSA: Tim Schedl, Tracey DePellegrin-Connelly, Ruth Isaacson
DJS (Dartmouth Journal Services): Stephen Haenel, Sharon Faelten, Lolly Otis