Difference between revisions of "Transgene curation pipeline"

From WormBaseWiki
(Page moved from "Curate with Phenote" to "Transgene curation pipeline": Phenote served as a short-lived curation tool and was replaced by an OA.)

Revision as of 00:10, 26 May 2012


Importing transgenes from Textpresso

Every day at 4 am, cron runs:

/home/postgres/work/pgpopulation/textpresso/wrapper.sh

which calls:

/home/postgres/work/pgpopulation/textpresso/transgene/update_textpresso_transgene.pl
/home/postgres/work/pgpopulation/textpresso/antibody/update_textpresso_antibody.pl
/home/postgres/work/pgpopulation/afp_papers/find_passwd_@.pl
/home/postgres/public_html/cgi-bin/data/ccc_gocuration/get_newset.pl

The script currently relevant to transgenes is:

/home/postgres/work/pgpopulation/textpresso/transgene/update_textpresso_transgene.pl

It gets its data from:

http://textpresso-dev.caltech.edu/transgene/transgenes_in_regular_papers.out

--kjy 19:35, 24 May 2012 (UTC)

Curating transgenes

Invoke Phenote with the transgene configuration to access postgres.

Go to the directory containing Phenote and run:
 $ ./phenote -c worm-transgene.cfg

To see all the current 'new' transgenes picked up by Textpresso, go to Tab 3 and press the "Search New Transgene" retrieve button. This retrieves all transgene objects that have no data in the Summary or Remark fields. Usually the paper object info is already filled in, since it was entered from the Textpresso search.

Curators should look for information about new transgenes in the paper provided by Textpresso (the main paper or a supplementary file).

Sometimes a paper provides only the transgene name and no other information. In that case, enter "No transgene info in original publication." in the Remark field so that the transgene is not identified as new again.
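As a rough illustration of what "Search New Transgene" selects, and why the "No transgene info" remark keeps a transgene from reappearing, here is a minimal Python sketch over in-memory records. It is not the real search, which runs against postgres; the record layout (dicts with hypothetical "name", "summary" and "remark" keys) and the example names are assumptions.

```python
from __future__ import annotations

from typing import Iterable

NO_INFO = "No transgene info in original publication."


def is_new_transgene(record: dict) -> bool:
    """A transgene counts as 'new' when both Summary and Remark are empty."""
    summary = (record.get("summary") or "").strip()
    remark = (record.get("remark") or "").strip()
    return not summary and not remark


def search_new_transgenes(records: Iterable[dict]) -> list[str]:
    """Return the names of transgenes that still need curation."""
    return [r["name"] for r in records if is_new_transgene(r)]


# Hypothetical records: only the first one has neither Summary nor Remark.
records = [
    {"name": "syIs1", "summary": "", "remark": ""},
    {"name": "syIs2", "summary": "example genotype", "remark": ""},
    {"name": "syEx3", "summary": "", "remark": NO_INFO},
]
print(search_new_transgenes(records))
```

Once "No transgene info in original publication." is saved in Remark, the record no longer satisfies the "both fields empty" test, which is the behaviour described above.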

Here is the controlled vocabulary for the transgene remark field:

  • Remark "Conflicting mapping info: ..."
  • Remark "Conflicting genotype: ..."
  • Remark "No transgene info in original publication."
  • Remark "Other integration method: ..."
  • Remark "Clone = "
  • Remark "Mapping info: "
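A small helper can flag which controlled-vocabulary phrase a Remark entry uses. The sketch below is a hypothetical Python check, not part of Phenote or the OA; it assumes that a controlled remark always begins with one of the phrases listed above.

```python
from __future__ import annotations

# Phrases copied from the controlled vocabulary above. Matching on the start
# of the Remark text is an assumption about how the phrases are used.
CONTROLLED_PREFIXES = (
    "Conflicting mapping info:",
    "Conflicting genotype:",
    "No transgene info in original publication.",
    "Other integration method:",
    "Clone =",
    "Mapping info:",
)


def controlled_prefix(remark: str) -> str | None:
    """Return the controlled-vocabulary phrase a remark starts with, if any."""
    text = remark.strip()
    for prefix in CONTROLLED_PREFIXES:
        if text.startswith(prefix):
            return prefix
    return None


print(controlled_prefix("Other integration method: hypothetical method"))
```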

Phenote transgene.cfg

Field codes: T = free text; M = multiple values, separated with a pipe (|); S = selection list
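For M fields, the stored value is a single string with a pipe between values. A minimal Python sketch of splitting and re-joining such a value follows; trimming spaces and dropping empty pieces are assumptions about desirable cleanup, not documented Phenote behaviour.

```python
from __future__ import annotations


def split_multi(value: str) -> list[str]:
    """Split a pipe-separated M field, trimming spaces and dropping empty values."""
    return [part.strip() for part in value.split("|") if part.strip()]


def join_multi(values: list[str]) -> str:
    """Join values back into a single pipe-separated string."""
    return "|".join(value.strip() for value in values if value.strip())


print(split_multi("GFP | RFP||LacZ"))
```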

Tab 1

  • Pgdbid - postgres database ID, entered automatically when a curator enters a new transgene
  • Name (T) - approved name: lab prefix (or WBPaperID), then Is or Ex, then a number
  • Summary (T) - genotype, including the co-injection marker and relevant information about making the construct. If papers report conflicting genotypes, use the Remark field with the controlled vocabulary "Conflicting genotype: ...". If there is no information, enter "No transgene info in original publication." in the Remark field.
  • Driven by Gene (T,M) - enter the WBGeneID of the gene whose promoter drives each promoter-driven construct in the transgene
  • Reporter Product (S,M) - list of common reporter genes (GFP, RFP, LacZ, etc.)
  • Other Reporter (T,M) - enter reporter products that do not appear in the drop-down list
  • Gene (T,M) - enter the WBGeneID for the protein encoded by the construct (not considered a reporter product)
  • Integrated by (S) - choose the integration method if known; use 'not integrated' for Ex transgenes. If the integration method is not listed, use the Remark field with the controlled vocabulary "Other integration method: ..."
  • Strain (T,M) - not used consistently; enter the approved names of strains that contain the transgene
  • Map (S,M) - choose the LG(s) of the integrated array if known. If papers report differing map positions, use the Remark field with the controlled vocabulary "Conflicting mapping info: ..."
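The naming rule and the Ex/'not integrated' rule above can be checked together. This is a hedged Python sketch: the regular expression is an assumed reading of "lab prefix (or WBPaperID), Is or Ex, number" (1 to 3 lowercase letters for the lab prefix, WBPaper plus 8 digits otherwise), and the example names are hypothetical.

```python
from __future__ import annotations

import re

# Assumed name pattern: a lab prefix of 1-3 lowercase letters (or a WBPaper ID
# with 8 digits), then "Is" (integrated) or "Ex" (extrachromosomal), then digits.
NAME_RE = re.compile(r"^(?:[a-z]{1,3}|WBPaper\d{8})(Is|Ex)\d+$")


def check_transgene(name: str, integrated_by: str) -> list[str]:
    """Return a list of problems; an empty list means the entry looks consistent."""
    match = NAME_RE.match(name)
    if match is None:
        return [f"{name}: expected lab prefix (or WBPaperID) + Is/Ex + number"]
    problems = []
    kind = match.group(1)
    if kind == "Ex" and integrated_by != "not integrated":
        problems.append(f"{name}: Ex transgenes should use 'not integrated'")
    if kind == "Is" and integrated_by == "not integrated":
        problems.append(f"{name}: Is transgene marked 'not integrated'")
    return problems


print(check_transgene("syEx45", "not integrated"))
print(check_transgene("syEx45", ""))
```

An empty Integrated by value is accepted for Is transgenes, since the method is only chosen "if known".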

Tab 2

  • Map Paper (T,M?) - WBPaperID of the paper that reports the mapping info
  • Map Person (T,M) - WBPersonID? or name; person evidence
  • Marker for (T,M) - not used; Wen's expression data
  • Marker Paper (T,M) - WBPaperID; not used; Wen's expression data
  • Reference (T,M) - WBPaperID, generally autofilled by the Textpresso cron job script and by the bulk upload from the Ex search script; add a new paper if necessary
  • Remark (T,M) - catch-all for clarifying info from other fields and for entering construct specifics; in some cases use the controlled vocabulary:
    • "Conflicting mapping info: ..."
    • "Conflicting genotype: ..."
    • "No transgene info in original publication."
    • "Other integration method: ..."
    • "Clone = "
    • "Mapping info: "
  • Species (T,M?) - not used?
  • Synonym (T,M) - other names for the transgene or construct
  • Driven by Construct (T,M) - not sure what this is
  • Location (T,M) - lab designations for people who have the transgene; not sure about this

Tab 3

  • Movie (T) - used?
  • Picture (T) - used?
  • Search New Transgene (T) - use to retrieve all transgenes that do not have any Summary or Remark data
  • SQL - use?

Changes made to the Phenote form to create the transgene OA

  • Autocomplete
  • Multi-ontology
  • Big text = free text, editable expanded box
  • Selection list
  • Selection list multiple list field
  • multi-drop down
  • Free text, pipe separated values
Tab 1 (Transgene)
  • Pgdbid->keep - postgres database ID, autogenerated -> no change
  • Name->keep free text ->no change
  • Synonym->keep -free text, pipe separated values->no change (but moved from Tab 2)
  • Summary ->keep- Big text -> no change
  • Driven by Gene->keep -> change to multi-ontology WBGene, I will want to enter gene by public/sequence name, postgres stores WBGeneID, make sure entries are unique
  • Reporter Product->keep -multi-drop down list -> no change, but I do need to know where the file is so I can edit it.
  • Other Reporter->keep - free text, pipe separated values -> no change
  • Gene->keep-> change to multi-ontology WBGene, I will want to enter gene by public/sequence name, postgres stores WBGeneID, make sure entries are unique
  • 3'UTR-> multi-ontology, added by Chris 5/24/12
  • Rescues ->NEW field -> multi-ontology WBvariation ->Changed model and deleted this tag.
  • Coinjection marker ->NEW field text, not dumped
  • Reporter type -> drop down list values of transcriptional or translational reporter
  • Remark->keep-> Big text, pipe separated values ->no change
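The Gene and Driven by Gene requirements (enter a gene by public or sequence name, store the WBGeneID, keep entries unique) can be sketched as a lookup followed by order-preserving de-duplication. The Python below is a hypothetical illustration; the lookup table and its IDs are placeholders, and in the OA the mapping would come from postgres.

```python
from __future__ import annotations


def resolve_genes(entries: list[str], name_to_id: dict[str, str]) -> list[str]:
    """Map public/sequence names (or IDs already entered) to WBGene IDs,
    dropping duplicates while keeping the first-seen order."""
    seen: set[str] = set()
    resolved: list[str] = []
    for entry in entries:
        gene_id = entry if entry.startswith("WBGene") else name_to_id.get(entry)
        if gene_id is None:
            raise KeyError(f"unknown gene name: {entry}")
        if gene_id not in seen:
            seen.add(gene_id)
            resolved.append(gene_id)
    return resolved


# Hypothetical lookup: a public name and a sequence name for the same gene.
lookup = {
    "gene-a": "WBGene00000001",
    "Y00A0.1": "WBGene00000001",
    "gene-b": "WBGene00000002",
}
print(resolve_genes(["gene-a", "Y00A0.1", "gene-b"], lookup))
```

Raising on an unknown name, rather than storing the raw text, is a design choice in this sketch so that only WBGeneIDs end up in the field.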
Tab 2 (Isolation)
  • Clone/plasmid->NEW Field-> multi-ontology using the Clone list from acedb: select a, a->general_remark, a->positive_gene from a in class clone where a->type = "Plasmid" (remove all "sjj_" clones). Also see e-mail. (For now this will be a no-dump field.)
  • Integration method->keep -multi-drop down list -> no change. Used to be called Integrated by
  • Map->keep-> multi-drop down list -> no change
  • Map Paper->keep ->multi-ontology paper
  • Map Person->keep->multi-ontology person
  • Laboratory->keep-> multi-ontology laboratory, used to be called Location
  • Strain->keep ->free text, pipe separated values ->no change
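The Clone/plasmid list above comes from an acedb query, with all "sjj_" clones removed. Assuming the query result is exported as tab-separated lines of clone, general_remark and positive_gene (an assumption; the export format is not described here), a minimal Python sketch of building the cleaned list could look like this.

```python
from __future__ import annotations


def parse_clone_rows(text: str) -> list[tuple[str, str, str]]:
    """Parse tab-separated (clone, general_remark, positive_gene) rows,
    skipping blank lines and any clone whose name starts with "sjj_"."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Pad short lines so every row has three columns.
        fields = (line.split("\t") + ["", "", ""])[:3]
        clone, remark, gene = (field.strip() for field in fields)
        if clone.startswith("sjj_"):
            continue
        rows.append((clone, remark, gene))
    return rows


# Hypothetical export: one normal clone, one "sjj_" clone, one bare name.
export = "pAB1\tsome remark\tWBGene00000001\nsjj_X\t\t\npCD2\n"
print(parse_clone_rows(export))
```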
Tab 3 (Expression)
  • Curator
  • Paper->keep-> change to multi-ontology paper, possible to make box expandable?
  • Person->multi-ontology
  • Marker for->keep->free text->no change
  • Marker Paper->keep->multi-ontology paper
  • Species->keep->no change
  • Driven by Construct->keep-> free text->no change
  • Movie-> removed
  • Picture-> removed
  • Search New Transgene-> keep-> no change
  • Fail for marking transgenes that are falsely attributed to a given paper.
Tab 4 (Postgres)
  • SQL-> remove

--kjy 00:10, 26 May 2012 (UTC)

Way in the future
  • Genomic Expression->New field - for developing a standardized transgene expression nomenclature; eventually we will want to fill this by script, composing it from other fields (promoters, reporters, clones)
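One way the script composition could work is to join the promoter gene, reporter product(s) and 3'UTR of a construct with "::", loosely following the common "Ppromoter::reporter" construct notation. The Python sketch below is hypothetical: the output format, the function name and the gene names are assumptions, since the nomenclature is still to be developed.

```python
from __future__ import annotations


def compose_expression(promoter: str, reporters: list[str], utr: str = "") -> str:
    """Build a hypothetical 'Ppromoter::reporter[::UTR 3'UTR]' string for one construct."""
    segments = [f"P{promoter}"] + list(reporters)
    if utr:
        segments.append(f"{utr} 3'UTR")
    return "::".join(segments)


print(compose_expression("gene-a", ["GFP"], "gene-b"))
```

Clones are left out of this sketch; how they would enter the composed string is an open question.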