Difference between revisions of "Transgene curation pipeline"

From WormBaseWiki

Revision as of 05:31, 28 June 2010

Curate with Phenote

Invoke the phenote transgene configuration interface and access postgres

Go to the directory with phenote:
 $ ./phenote -c worm-transgene.cfg

If you want to see all the current 'new' transgenes picked up by Textpresso, go to Tab 3 and press the "Search New Transgene" retrieve button. This action will retrieve all transgene objects that have no data in the Summary or Remark fields. Usually the paper object info will already be present, since it was entered from the Textpresso search.

Curators should look for information on new transgenes in the paper document provided by Textpresso (main paper or supplementary file).

Sometimes a paper provides only the name of a transgene and no other information. In that case, enter "No transgene info in original publication." into the Remark field so that the transgene will not be identified as new again.
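A minimal sketch of the "new transgene" test described above, assuming each transgene is held as a dictionary with "summary" and "remark" keys. That record layout and the example names are hypothetical; the real search runs against postgres through Phenote.

```python
def is_new_transgene(record):
    """Return True when both the Summary and Remark fields are empty."""
    summary = (record.get("summary") or "").strip()
    remark = (record.get("remark") or "").strip()
    return not summary and not remark


# Hypothetical records, for illustration only.
records = [
    {"name": "syIs1", "summary": "", "remark": ""},
    {"name": "syEx2", "summary": "",
     "remark": "No transgene info in original publication."},
    {"name": "syIs3", "summary": "placeholder genotype", "remark": ""},
]

new_names = [r["name"] for r in records if is_new_transgene(r)]
```

In this sketch the second record is no longer picked up as new, because its Remark field carries the controlled-vocabulary entry.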

Here is the controlled vocabulary for the transgene remark field:

  • Remark "Conflicting mapping info: ..."
  • Remark "Conflicting genotype: ..."
  • Remark "No transgene info in original publication."
  • Remark "Other integration method: ..."
  • Remark "Clone = "
  • Remark "Mapping info: "
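The controlled vocabulary above could be checked with a small helper like the one below. This is only a sketch: the function name is hypothetical, and it assumes a controlled remark always begins with one of the listed phrases.

```python
# Phrases copied from the controlled vocabulary list above.
REMARK_PREFIXES = (
    "Conflicting mapping info: ",
    "Conflicting genotype: ",
    "No transgene info in original publication.",
    "Other integration method: ",
    "Clone = ",
    "Mapping info: ",
)


def controlled_prefix(remark):
    """Return the controlled-vocabulary phrase the remark starts with, or None."""
    for prefix in REMARK_PREFIXES:
        if remark.startswith(prefix):
            return prefix
    return None
```

A result of None is not an error, since Remark is also used as a free-text catch-all.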

Phenote transgene.cfg

T= free text; M= multiple values, separate values with a pipe(|); S= selection list

Tab 1

  • Pgdbid- postgres database ID, entered automatically when curator enters a new transgene
  • Name(T)- approved name following Lab-prefix (or WBPaperID), Is or Ex, number
  • Summary(T)- genotype, including co-injection marker and relevant information about making the construct, if papers report conflicting genotypes use Remark field and controlled vocabulary "Conflicting genotype: ...", if no information enter "No transgene info in original publication." in Remark field.
  • Driven by Gene(T,M)- enter WBGeneID used for promoters in every promoter driven construct of the transgene
  • Reporter Product(S,M)- list has common reporter genes, GFP, RFP, LacZ, etc.
  • Other Reporter(T,M)- enter other products encoded as reporters that do not appear in the drop down list
  • Gene(T,M)- enter WBGeneID for protein output of construct, which isn't considered a reporter product
  • Integrated by(S)- choose integration method if known, use 'not integrated' for Ex transgenes, if integration method is not listed, use Remark field and controlled vocabulary: "Other integration method: ..."
  • Strain(T,M)- not used consistently, enter approved strain names for those strains that contain the transgene
  • Map(S,M)- choose LG(s) of integrated array if known, if papers report differing map positions use Remark field and controlled vocabulary "Conflicting mapping info: ..."
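The Name pattern in the list above (Lab-prefix or WBPaperID, then Is or Ex, then a number) can be sketched as a regular expression. Two details are assumptions, not taken from this page: that a lab prefix is one to three lowercase letters, and that a WBPaperID is "WBPaper" followed by eight digits. The example names are hypothetical.

```python
import re

# Assumed shapes: 1-3 lowercase letters for a lab prefix,
# "WBPaper" plus 8 digits for a paper ID.
NAME_RE = re.compile(
    r"^(?P<source>[a-z]{1,3}|WBPaper\d{8})(?P<kind>Is|Ex)(?P<number>\d+)$"
)


def parse_transgene_name(name):
    """Split a transgene name into (source, kind, number), or return None."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    return match.group("source"), match.group("kind"), int(match.group("number"))
```

For example, parse_transgene_name("syIs1") gives ("sy", "Is", 1), while a name without Is or Ex gives None.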

Tab 2

  • Map Paper(T,M?)- WBPaperID for paper that reports mapping info
  • Map Person(T,M)- WBPersonID? or Name, person evidence
  • Marker for(T,M)- not used, Wen's expression data
  • Marker Paper(T,M)- WBPaperID, not used, Wen's expression data
  • Reference(T,M)- WBPaperID, generally autofilled by Textpresso cron job script and bulk upload of Ex search script, add new paper if necessary
  • Remark(T,M)- catch all used for clarifying info from other fields, and for entering construct specifics, in some cases use controlled vocabulary
    • "Conflicting mapping info: ..."
    • "Conflicting genotype: ..."
    • "No transgene info in original publication."
    • "Other integration method: ..."
    • "Clone = "
    • "Mapping info: "
  • Species(T,M?)- not used?
  • Synonym(T,M)- other names for the transgene or construct
  • Driven by Construct(T,M)- not sure what this is
  • Location(T,M)- Lab designations for people who have the transgene, not sure about this.

Tab 3

  • Movie(T)- used?
  • Picture(T)- used?
  • Search New Transgene(T)- use to retrieve all transgenes that do not have any summary or remark data
  • SQL- used?
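Fields marked M in the tabs above hold several values in one string, separated with a pipe. A minimal sketch of splitting and re-joining such a value follows; the function names are hypothetical, and trimming whitespace around each value is an assumption.

```python
def split_multi(value):
    """Split a pipe-separated field value into a list, dropping empty parts."""
    return [part.strip() for part in value.split("|") if part.strip()]


def join_multi(values):
    """Join a list of values back into one pipe-separated string."""
    return "|".join(values)
```

For example, split_multi("GFP|RFP") gives ["GFP", "RFP"], and join_multi reverses it.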

Transfer from phenote to OA

T= free text; M= multiple values, separate values with a pipe(|); S= selection list

  • Pgdbid->keep - postgres database ID, entered automatically when curator enters a new transgene
  • Name(T)->keep - approved name following Lab-prefix (or WBPaperID), Is or Ex, number.
  • Genomic Expression->New field - for developing a means for standardizing transgene expression nomenclature, eventually we will want to fill this by script composition based on other fields (promoters, reporters, clones)
  • Summary(T)->keep - genotype, including co-injection marker and relevant information about making the construct, if papers report conflicting genotypes use Remark field and controlled vocabulary "Conflicting genotype: ...", if no information enter "No transgene info in original publication." in Remark field.
  • Driven by Gene(T,M)->keep, make selection list which allows multiple entry and public_name entry and WBGene output - enter WBGeneID used for promoters in every promoter driven construct of the transgene
  • Reporter Product(S,M)->keep - list has common reporter genes, GFP, RFP, LacZ, etc.
  • Other Reporter(T,M)->keep - enter other products encoded as reporters that do not appear in the drop down list
  • Gene(T,M)->keep, make selection list which allows multiple entries and public_name entry and WBGene output - enter WBGeneID for protein output of construct, which isn't considered a reporter product
  • Integrated by(S)->keep - choose integration method if known, use 'not integrated' for Ex transgenes, if integration method is not listed, use Remark field and controlled vocabulary: "Other integration method: ..."
  • Clone/Plasmid->New field, multiple-selection list field, with ability to modify list
  • Strain(T,M)->keep - not used consistently, should be strain names for those strains that contain the transgene, but labs have complained about receiving too many requests for the strains(?)
  • Map(S,M)->keep, and keep together with other Map fields - choose LG(s) of integrated array if known, if papers report differing map positions use Remark field and controlled vocabulary "Conflicting mapping info: ..."
  • Map Paper(T,M?)->keep - WBPaperID for paper that reports mapping info
  • Map Person(T,M)->keep- WBPersonID? or Name, person evidence
  • Marker for(T,M)->not sure if this is still needed, ask Wen - Wen's expression data
  • Marker Paper(T,M)->same as above- WBPaperID, not used, Wen's expression data
  • Reference(T,M)->keep, move to first tab, make selection field, allow multiples - WBPaperID, generally autofilled by Textpresso cron job script and bulk upload of Ex search script, add new paper if necessary
  • Remark(T,M)->keep, move to first tab- catch all used for clarifying info from other fields, and for entering construct specifics, in some cases use controlled vocabulary
    • "Conflicting mapping info: ..."
    • "Conflicting genotype: ..."
    • "No transgene info in original publication."
    • "Other integration method: ..."
    • "Mapping info: "
  • Species(T,M?)->used?
  • Synonym(T,M)->keep - other names for the transgene or construct
  • Driven by Construct(T,M)->keep?- not sure what this is
  • Location(T,M)->keep?- Lab designations for people who have the transgene, not sure about this.
  • Movie(T)-> used?
  • Picture(T)-> used?
  • Search New Transgene(T)-> keep- use to retrieve all transgenes that do not have any summary or remark data
  • SQL-> used?
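The rule for Integrated by in the list above can be sketched as a small function. Everything here is illustrative: the function name, the way an Ex transgene is recognised from its name, and the contents of the selection list are assumptions, since this page does not list the actual integration methods.

```python
import re

NOT_INTEGRATED = "not integrated"


def integration_entry(name, method, selection_list):
    """Return (integrated_by, remark) for one transgene.

    name           -- transgene name, e.g. ending in Ex1 or Is1
    method         -- integration method reported in the paper, or None
    selection_list -- methods offered by the Integrated by list (assumed)
    """
    if re.search(r"Ex\d+$", name):
        # Ex transgenes are always entered as 'not integrated'.
        return NOT_INTEGRATED, None
    if method is None:
        # Integration method unknown: leave both fields empty.
        return None, None
    if method in selection_list:
        return method, None
    # Method is not on the list: record it in Remark with controlled vocabulary.
    return None, "Other integration method: " + method
```

With a hypothetical selection list ["UV", "Gamma ray"], integration_entry("syIs2", "UV", ...) returns ("UV", None), while an unlisted method is returned as a Remark entry instead.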