Difference between revisions of "Transgene curation pipeline"

Revision as of 01:54, 24 July 2010


Curating transgenes

Invoke the Phenote transgene configuration interface and access Postgres:

Go to the directory containing Phenote and run:
 $ ./phenote -c worm-transgene.cfg

To see all current 'new' transgenes picked up by Textpresso, go to Tab 3 and press the "Search New Transgene" retrieve button. This retrieves all transgene objects that have no data in the Summary or Remark fields. These objects usually already have paper information, since it was entered from the Textpresso search.

Curators should look for information on new transgenes in the paper provided by Textpresso (main paper or supplementary file).

Sometimes a paper gives only the transgene name and no other information. In that case, enter "No transgene info in original publication." in the Remark field so that the transgene is not identified as new again.
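The "Search New Transgene" rule (no Summary and no Remark data) can be sketched in Python as below. This is a minimal illustration, not the actual Phenote/Postgres query: the record layout, field names, and transgene names (`xxIs1` etc.) are hypothetical. It also shows why entering the "No transgene info" remark keeps a transgene out of later searches.

```python
def is_new_transgene(record: dict) -> bool:
    """True if the record has neither Summary nor Remark data."""
    summary = (record.get("summary") or "").strip()
    remark = (record.get("remark") or "").strip()
    return not summary and not remark


def search_new_transgenes(records: list[dict]) -> list[str]:
    """Return the names of transgenes that still need curation."""
    return [r["name"] for r in records if is_new_transgene(r)]


# Hypothetical records: only a paper reference has been filled in for xxIs1.
records = [
    {"name": "xxIs1", "reference": "WBPaper00000001", "summary": "", "remark": None},
    {"name": "xxEx2", "reference": "WBPaper00000001", "summary": "",
     "remark": "No transgene info in original publication."},
    {"name": "xxIs3", "reference": "WBPaper00000002", "summary": "Pxx::GFP", "remark": ""},
]
print(search_new_transgenes(records))  # ['xxIs1']
```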

Here is the controlled vocabulary for the transgene remark field:

Remark "Conflicting mapping info: ..."
Remark "Conflicting genotype: ..."
Remark "No transgene info in original publication."
Remark "Other integration method: ..."
Remark "Clone = "
Remark "Mapping info: "
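A curator-side check for these forms could look like the following sketch. It assumes the entries ending in "..." or "=" are prefixes followed by free text. Because the Remark field is also a catch-all, a remark that fails this check is not necessarily wrong; the function only reports whether a controlled-vocabulary form is used.

```python
# Controlled-vocabulary forms for the Remark field, copied from the list above.
REMARK_EXACT = ("No transgene info in original publication.",)
REMARK_PREFIXES = (
    "Conflicting mapping info: ",
    "Conflicting genotype: ",
    "Other integration method: ",
    "Clone = ",
    "Mapping info: ",
)


def uses_controlled_vocabulary(remark: str) -> bool:
    """True if the remark matches one of the controlled-vocabulary forms."""
    remark = remark.strip()
    return remark in REMARK_EXACT or remark.startswith(REMARK_PREFIXES)


print(uses_controlled_vocabulary("Conflicting genotype: see both papers"))  # True
print(uses_controlled_vocabulary("no info in paper"))                       # False
```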

Phenote transgene.cfg

T = free text; M = multiple values, separated with a pipe (|); S = selection list
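For scripts that read or write M fields, a pipe-separated value can be split and rebuilt as in this sketch. It assumes that spaces around pipes and empty parts carry no meaning; the exact storage format is not specified here.

```python
def split_multi(value: str) -> list[str]:
    """Split an M field on '|', trimming spaces and dropping empty parts."""
    return [part.strip() for part in value.split("|") if part.strip()]


def join_multi(values: list[str]) -> str:
    """Join values back into a pipe-separated string."""
    return "|".join(v.strip() for v in values if v.strip())


print(split_multi("GFP | RFP||LacZ"))  # ['GFP', 'RFP', 'LacZ']
print(join_multi(["GFP", "RFP"]))      # GFP|RFP
```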

Tab 1

Pgdbid- Postgres database ID, entered automatically when the curator enters a new transgene

Name(T)- approved name: lab prefix (or WBPaperID), then Is (integrated) or Ex (extrachromosomal), then a number

Summary(T)- genotype, including the co-injection marker and relevant information about making the construct. If papers report conflicting genotypes, use the Remark field and the controlled vocabulary "Conflicting genotype: ...". If there is no information, enter "No transgene info in original publication." in the Remark field.

Driven by Gene(T,M)- enter the WBGeneID of the gene whose promoter is used in each promoter-driven construct of the transgene

Reporter Product(S,M)- list of common reporter genes (GFP, RFP, LacZ, etc.)

Other Reporter(T,M)- enter reporter products that do not appear in the drop-down list

Gene(T,M)- enter the WBGeneID for the protein product of the construct when it is not considered a reporter product

Integrated by(S)- choose the integration method if known; use 'not integrated' for Ex transgenes. If the integration method is not listed, use the Remark field and the controlled vocabulary "Other integration method: ..."

Strain(T,M)- not used consistently; enter the approved names of strains that contain the transgene

Map(S,M)- choose the linkage group(s) (LG) of the integrated array if known. If papers report differing map positions, use the Remark field and the controlled vocabulary "Conflicting mapping info: ..."
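The Tab 1 rules for Name, Integrated by, and Map can be cross-checked with a sketch like this one. The name pattern is an assumption (a lab prefix of 1 to 3 lowercase letters, or "WBPaper" plus 8 digits), not an official specification, and flagging Ex transgenes that have Map data is an inference from Map describing integrated arrays.

```python
import re

# Assumed pattern: prefix (lab letters or a WBPaperID), "Is" or "Ex", a number.
TRANSGENE_NAME = re.compile(
    r"(?P<prefix>[a-z]{1,3}|WBPaper\d{8})(?P<kind>Is|Ex)(?P<number>\d+)"
)


def parse_transgene_name(name: str):
    """Return (prefix, kind, number) for a well-formed name, else None."""
    m = TRANSGENE_NAME.fullmatch(name)
    if m is None:
        return None
    return m.group("prefix"), m.group("kind"), int(m.group("number"))


def tab1_problems(name: str, integrated_by: str, map_lgs: list[str]) -> list[str]:
    """List conflicts between Name, Integrated by and Map for one transgene."""
    parsed = parse_transgene_name(name)
    if parsed is None:
        return [f"{name}: name does not follow prefix + Is/Ex + number"]
    kind = parsed[1]
    problems = []
    if kind == "Ex" and integrated_by != "not integrated":
        problems.append(f"{name}: Ex transgenes should use 'not integrated'")
    if kind == "Is" and integrated_by == "not integrated":
        problems.append(f"{name}: Is transgene is marked 'not integrated'")
    # Assumption: Map holds LGs of integrated arrays only.
    if kind == "Ex" and map_lgs:
        problems.append(f"{name}: Ex transgene has Map data")
    return problems


print(parse_transgene_name("kyEx123"))  # ('ky', 'Ex', 123)
```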

Tab 2

Map Paper(T,M?)- WBPaperID for paper that reports mapping info
Map Person(T,M)- WBPersonID? or Name, person evidence
Marker for(T,M)- not used, Wen's expression data
Marker Paper(T,M)- WBPaperID, not used, Wen's expression data
Reference(T,M)- WBPaperID, generally autofilled by the Textpresso cron job script and by the bulk upload from the Ex search script; add a new paper if necessary
Remark(T,M)- catch-all used for clarifying information from other fields and for entering construct specifics; in some cases use the controlled vocabulary:
- "Conflicting mapping info: ..."
- "Conflicting genotype: ..."
- "No transgene info in original publication."
- "Other integration method: ..."
- "Clone = "
- "Mapping info: "
Species(T,M?)- not used?
Synonym(T,M)- other names for the transgene or construct
Driven by Construct(T,M)- not sure what this is
Location(T,M)- lab designations of people who have the transgene; not sure about this.

Tab 3

Movie(T)- used?
Picture(T)- used?
Search New Transgene(T)- use to retrieve all transgenes that do not have any summary or remark data
SQL- used?

Transfer from phenote to OA

Autocomplete
Multi-ontology
Big text = free text, editable expanded box
Selection list
Selection list multiple list field
multi-drop down
Free text, pipe separated values

Tab 1 (Transgene)

Pgdbid->keep - postgres database ID, autogenerated -> no change

Name->keep free text ->no change

Synonym->keep -free text, pipe separated values->no change (but moved from Tab 2)

Summary ->keep- Big text -> no change

Driven by Gene->keep -> change to multi-ontology WBGene; I want to enter genes by public/sequence name, Postgres stores the WBGeneID, and entries must be unique

Reporter Product->keep -multi-drop down list -> no change, but I do need to know where the file is so I can edit it.

Other Reporter->keep - free text, pipe separated values -> no change

Gene->keep-> change to multi-ontology WBGene; I want to enter genes by public/sequence name, Postgres stores the WBGeneID, and entries must be unique

Rescues ->NEW field -> multi-ontology WBGene
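The requested behavior for the WBGene multi-ontology fields (enter public or sequence names, store WBGeneIDs, keep entries unique) can be sketched as follows. The lookup table, gene names, and IDs are hypothetical placeholders; in practice the mapping would come from current gene name data, which is not shown here.

```python
def to_wbgene_ids(entries: list[str], lookup: dict[str, str]) -> list[str]:
    """Map public/sequence names to WBGeneIDs, dropping duplicates in order."""
    ids: list[str] = []
    for entry in entries:
        entry = entry.strip()
        # Entries already given as WBGeneIDs are kept as they are.
        wbgene = entry if entry.startswith("WBGene") else lookup.get(entry)
        if wbgene is None:
            raise KeyError(f"unknown gene name: {entry!r}")
        if wbgene not in ids:
            ids.append(wbgene)
    return ids


# Hypothetical lookup: "abc-1" and "ZZ999.1" stand for the public and
# sequence names of the same gene; all IDs are placeholders.
lookup = {
    "abc-1": "WBGene00000001",
    "ZZ999.1": "WBGene00000001",
    "xyz-2": "WBGene00000002",
}
print(to_wbgene_ids(["abc-1", "ZZ999.1", "xyz-2"], lookup))
# ['WBGene00000001', 'WBGene00000002']
```

Deduplicating on the ID rather than on the typed name is what catches a gene entered once by public name and once by sequence name.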

Tab 2 (Isolation)

Reference->keep-> change to multi-ontology paper, possible to make box expandable?

Remark->keep-> Big text, pipe separated values ->no change

Clone/plasmid->NEW Field-> multi-ontology using the Clone list from AceDB:
 select a, a->general_remark, a->positive_gene from a in class clone where a->type = "Plasmid"
(remove all "sjj_" clones; also see e-mail). For now this will be a no-dump field.

Integrated by->keep -multi-drop down list -> no change, but I do need to know where the file is so I can edit it.

Map->keep-> multi-drop down list -> no change

Map Paper->keep ->multi-ontology paper

Map Person->keep->multi-ontology person

Location->keep-> multi-ontology laboratory

Strain->keep ->free text, pipe separated values ->no change
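Building the Clone/plasmid choice list from the AceDB query results could be sketched like this. It assumes the results have been exported as (clone, general_remark, positive_gene) rows in the query's column order, possibly with a clone repeated; the clone names are hypothetical.

```python
def clone_choices(rows: list[tuple[str, str, str]]) -> list[str]:
    """Unique clone names for the Clone/plasmid field, minus "sjj_" clones."""
    names = {clone for clone, _remark, _gene in rows if not clone.startswith("sjj_")}
    return sorted(names)


rows = [
    ("pXX1", "hypothetical remark", "WBGene00000001"),
    ("pXX1", "", "WBGene00000002"),
    ("sjj_ZZ999.1", "", ""),
    ("pXX2", "", ""),
]
print(clone_choices(rows))  # ['pXX1', 'pXX2']
```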

Tab 3 (Expression)

Marker for->keep->free text->no change

Marker Paper->keep->multi-ontology paper

Species->keep->no change

Driven by Construct->keep-> free text->no change

Movie-> keep->no change

Picture-> keep->no change

Tab 4 (Postgres)

Search New Transgene-> keep-> no change

SQL-> keep->no change

Way in the future

Co-injection marker-> New field

Genomic Expression->New field - for developing a standardized transgene expression nomenclature; eventually we will want to fill this by script, composing it from other fields (promoters, reporters, clones)
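Since the nomenclature is still to be developed, the following is only a hypothetical sketch of script composition. It borrows the common C. elegans "P<gene>::<reporter>" construct notation, assumes promoter genes have already been converted back to public names, and leaves clones out. It declines to compose a string when there is more than one promoter or reporter, because those fields alone do not record which promoter drives which reporter.

```python
from typing import Optional


def compose_expression(promoter_genes: list[str], reporters: list[str]) -> Optional[str]:
    """Compose 'P<gene>::<reporter>' for a single-construct transgene, else None."""
    if len(promoter_genes) != 1 or len(reporters) != 1:
        return None  # ambiguous pairing: leave for manual curation
    return f"P{promoter_genes[0]}::{reporters[0]}"


print(compose_expression(["myo-3"], ["GFP"]))             # Pmyo-3::GFP
print(compose_expression(["myo-3", "unc-119"], ["GFP"]))  # None
```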

@@ Line 73: / Line 73: @@
 ==Transfer from phenote to OA==
-'''T'''= free text;
+*Autocomplete
-'''M'''= multiple values, separate values with a pipe(|);
+*Multi-ontology
-'''S'''= selection list
+*Big text = free text, editable expanded box
-====Fields to keep====
+*Selection list
-These are fields that I use, when switching to OA, some can use some modifications.
+*Selection list multiple list field
+*multi-drop down
+*Free text, pipe separated values
-*'''Pgdbid'''->''keep'' - postgres database ID,  entered automatically when curator enters a new transgene
+=====Tab 1 (Transgene)=====
+*'''Pgdbid'''->''keep'' - postgres database ID,  autogenerated -> no change
-*'''Name(T)'''->''keep'' -approved name following Lab-prefix (or WBPaperID), Is or Ex, number.
+*'''Name'''->''keep'' free text ->no change
-*'''Summary(T)'''->''keep'' - genotype, including co-injection marker and relevant information about making the construct, if papers report conflicting genotypes use Remark field and controlled vocabulary "Conflicting genotype: ...", if no information enter "No transgene info in original publication." in Remark field.
+*'''Synonym'''->''keep'' -free text, pipe separated values->no change (but moved from tab 2
-*'''Driven by Gene(T,M)'''->''keep, make selection list, allow multiple entry by public_name convert to WBGeneID based on latest genename server version, make sure all entries are unique'' - enter WBGeneID used for promoters in every promoter driven construct of the transgene
+*'''Summary''' ->''keep''- Big text -> no change
-*'''Reporter Product(S,M)'''->''keep'' - list has common reporter genes (heterologous in C. elegans), GFP, RFP, LacZ, etc.
+*'''Driven by Gene'''->''keep'' -> change to multi-ontology WBGene, I will want to enter gene by public/sequence name, postgres stores WBGeneID, make sure entries are unique
-*'''Other Reporter(T,M)'''->''keep'' - enter other products encoded as reporters that do not appear in the drop down list
+*'''Reporter Product'''->''keep'' -multi-drop down list -> no change, but I do need to know where the file is so I can edit it.
-*'''Gene(T,M)'''->''keep, make selection list, allow multiple entry by public_name convert to WBGeneID based on latest genename server version, make sure all entries are unique''  - enter WBGeneID for protein output of construct, which isn't considered a reporter product
+*'''Other Reporter'''->''keep'' - free text, pipe separated values -> no change
+*'''Gene'''->''keep-> change to multi-ontology WBGene, I will want to enter gene by public/sequence name, postgres stores WBGeneID, make sure entries are unique
-*'''Integrated by(S)'''->''keep'' - choose integration method if known, use 'not integrated' for Ex transgenes, if integration method is not listed, use Remark field and controlled vocabulary: "Other integration method: ..."
+*'''Rescues''' ->NEW field -> multi-ontology WBGene
-*'''Strain(T,M)'''->''keep'' - not used consistently, should be strain names for those strains that contain the transgene, but labs have complained about receiving too many requests for the strains(?)
+=====Tab 2 (Isolation)=====
+*'''Reference'''->''keep''-> change to multi-ontology paper, possible to make box expandable?
-*'''Map(S,M)'''->''keep, and keep together with other Map fields'' - choose LG(s) of integrated array if known, if papers report differing map positions use Remark field and controlled vocabulary "Conflicting mapping info: ..."
+*'''Remark'''->''keep''-> Big text, pipe separated values ->no change
-*'''Map Paper(T,M?)'''->''keep?, make selection list'' - WBPaperID for paper that reports mapping info or that performed the mapping?
+*'''Clone/plasmid'''->''NEW Field''-> multi -ontology using Clone list from acedb select a, a->general_remark, a->positive_gene from a in class clone  where a->type = "Plasmid" (remove all "sjj_" clones).  also see e-mail. (for now this will be a no dump field).
-*'''Map Person(T,M)'''->''keep''- WBPersonID? or Name, person evidence
+*'''Integrated by'''->''keep'' -multi-drop down list -> no change, but I do need to know where the file is so I can edit it.
-*'''Reference(T,M)'''->''keep, move to first tab, make selection field, allow multiples'' - WBPaperID, generally autofilled by Textpresso cron job script and bulk upload of Ex search script, add new paper if necessary
+*'''Map'''->''keep''-> multi-drop down list -> no change
-*'''Remark(T,M)'''->''keep, move to first tab''- catch all used for clarifying info from other fields, and for entering construct specifics, in some cases use controlled vocabulary
+*'''Map Paper'''->''keep'' ->multi-ontology paper
-** "Conflicting mapping info: ..."
-** "Conflicting genotype: ..."
-** "No transgene info in original publication."
-** "Other integration method: ..."
-** "Mapping info: "
-*'''Synonym(T,M)'''->''keep'' - other names for the transgene or construct
+*'''Map Person'''->''keep''->multi-ontology person
-*'''Search New Transgene(T)'''-> ''keep''- use to retrieve all transgenes that do not have any summary or remark data
+*'''Location'''->''keep''-> multi-ontology laboratory
-====Unknown/unused fields====
+*'''Strain'''->''keep'' ->free text, pipe separated values ->no change
-*'''Marker for(T,M)'''->''not sure if this is still needed, ask Wen" ->  Wen's expression data
-*'''Marker Paper(T,M)'''->''same as above''- WBPaperID, not used ->Wen's expression data
+=====Tab 3 (Expression)=====
+*'''Marker for'''->''keep''->free text->no change
-*'''Species(T,M?)'''->''used?''  if this is for species the construct is expressed in, can we make this default C. elegans unless otherwise stated, and can we make this a selection list?
+*'''Marker Paper'''->''keep''->multi-ontology paper
-*'''Driven by Construct(T,M)'''->''ask Wen'' looks like cell/tissue specific promoter expression information, is this part of Wen's expression curation?  Can we make this a cell/tissue ontology based field? is there a need for a  life-stage expression driver field as well??
+*'''Species'''->''keep''->no change
-*'''Location(T,M)'''->''keep?''- Lab designations for people who have the transgene--are we still using this?  Where is this information extracted from, last author of paper?  Can we automatically assign the lab based on the transgene prefix for many of the cases?
+*'''Driven by Construct'''->''keep''-> free text->no change
-*'''Movie(T)'''-> ''used?''
+*'''Movie'''-> ''keep''->no change
-*'''Picture(T)'''-> ''used?''
+*'''Picture'''-> ''keep''->no change
-*'''SQL'''-> ''used?''
+=====Tab 4 (Postgres)=====
+*'''Search New Transgene'''-> ''keep''-> no change
-====Proposed New Fields====
+*'''SQL'''-> ''keep''->no change
-*'''Associated Clone/Plasmid'''->''New field, selection multiple list field, with ability to modify list"
+=====Way in the future=====
 *'''Co-injection marker'''-> ''New field''
 *'''Genomic Expression'''->''New field'' - for developing a standardizing transgene expression nomenclature,  eventually we will want to fill this by script composition based on other fields (promoters, reporters, clones)
