Difference between revisions of "RNAi"

 
(25 intermediate revisions by the same user not shown)
Line 10: Line 10:
 
== RNAi Data Model ==
 
== RNAi Data Model ==
  
This is the ?RNAi data model as of WormBase Release WS227:
+
This is the ?RNAi data model as of WormBase Release WS251:
  
 
  //////////////////////////////////////////////////////////////////
 
  //////////////////////////////////////////////////////////////////
Line 25: Line 25:
 
                         Sequence ?Sequence XREF RNAi  //links to a real Sequence object used in the experiment  
 
                         Sequence ?Sequence XREF RNAi  //links to a real Sequence object used in the experiment  
 
                                                       // such as yk clone; not UNIQUE anymore
 
                                                       // such as yk clone; not UNIQUE anymore
 +
                        Clone ?Clone XREF Used_in_RNAi      // Chris WS244
 
                         PCR_product ?PCR_product XREF RNAi // links to a PCR_product object used in  
 
                         PCR_product ?PCR_product XREF RNAi // links to a PCR_product object used in  
 
                                                           // the experiment; not UNIQUE anymore
 
                                                           // the experiment; not UNIQUE anymore
Line 30: Line 31:
 
                         // which maps to a single place in the genome
 
                         // which maps to a single place in the genome
 
         Experiment      Laboratory ?Laboratory
 
         Experiment      Laboratory ?Laboratory
                        Author ?Author
 
 
                         Date UNIQUE DateType
 
                         Date UNIQUE DateType
 
                         Strain UNIQUE ?Strain
 
                         Strain UNIQUE ?Strain
Line 46: Line 46:
 
                         Pseudogene ?Pseudogene XREF RNAi_result #Evidence // [030801 krb]
 
                         Pseudogene ?Pseudogene XREF RNAi_result #Evidence // [030801 krb]
 
         Supporting_data Movie ?Movie XREF RNAi    // Lincoln, krb [010807]
 
         Supporting_data Movie ?Movie XREF RNAi    // Lincoln, krb [010807]
         DB_info        Database ?Database ?Database_field ?Accession_number //to link out to Phenobank ar2 02-DEC-05
+
         DB_info        Database ?Database ?Database_field ?Accession_number  
                                                                            //removed UNIQUE as reqs multiple connections
 
 
         Species        UNIQUE ?Species
 
         Species        UNIQUE ?Species
        Gene_regulation ?Gene_regulation XREF RNAi  // this tag is used when an RNAi experiment describes
 
                                                    // gene regulation ar2 29-MAR-06 for igor
 
 
         Interaction    ?Interaction
 
         Interaction    ?Interaction
 
         Reference      UNIQUE ?Paper XREF RNAi //[070215 ar2] made reference unique so Paper sort of  
 
         Reference      UNIQUE ?Paper XREF RNAi //[070215 ar2] made reference unique so Paper sort of  
 
                                                 // equates to a Study class for Will S
 
                                                 // equates to a Study class for Will S
 
         Phenotype      ?Phenotype XREF RNAi #Phenotype_info
 
         Phenotype      ?Phenotype XREF RNAi #Phenotype_info
         Phenotype_not_observed ?Phenotype XREF Not_in_RNAi #Phenotype_info //added by Wen to separate
+
         Phenotype_not_observed ?Phenotype XREF Not_in_RNAi #Phenotype_info  
                                                                          // Not phenotype from real phenotypes
 
 
         Expr_profile    ?Expr_profile XREF RNAi_result // connection added during build [030106 krb]
 
         Expr_profile    ?Expr_profile XREF RNAi_result // connection added during build [030106 krb]
 
         Remark          ?Text #Evidence
 
         Remark          ?Text #Evidence
Line 62: Line 58:
  
  
Here is the WS231 ?Phenotype_info model:
+
Here is the WS251 ?Phenotype_info model:
  
 
<pre>
 
<pre>
Line 71: Line 67:
 
////////////////////////////////////////////
 
////////////////////////////////////////////
  
#Phenotype_info Paper_evidence ?Paper
+
#Phenotype_info Paper_evidence ?Paper
Person_evidence ?Person
+
                Person_evidence ?Person
Curator_confirmed ?Person
+
                Curator_confirmed ?Person
    Remark ?Text   #Evidence // specific remarks about the phenotype
+
                Remark ?Text #Evidence // specific remarks about the phenotype
Quantity_description ?Text #Evidence //Remark to describe what quantity describes, below
+
                Quantity_description ?Text #Evidence //Remark to describe what quantity describes, below
Quantity       UNIQUE Int UNIQUE     Int     #Evidence
+
                Quantity UNIQUE Int UNIQUE Int #Evidence
Not     #Evidence //This is being phased out but is needed for the next phase [06/08/10].
+
                Not #Evidence //This is being phased out but is needed for the next phase [06/08/10].
Anatomy_term ?Anatomy_term #Evidence
+
                Penetrance Incomplete Text #Evidence
                Penetrance     Incomplete     Text   #Evidence
+
                          Low Text #Evidence
                                Low             Text   #Evidence
+
                          High Text #Evidence
                                High           Text   #Evidence
+
                          Complete Text #Evidence
                                Complete       Text #Evidence
+
                          Range UNIQUE Int UNIQUE Int #Evidence // Range of penetrance
Range UNIQUE Int UNIQUE Int #Evidence // Range of penetrance
+
                Recessive #Evidence
                Recessive               #Evidence
+
                Semi_dominant #Evidence
                Semi_dominant           #Evidence
+
                Dominant #Evidence
                Dominant               #Evidence
+
                Haplo_insufficient #Evidence
                Haplo_insufficient     #Evidence
+
                Caused_by_gene ?Gene #Evidence
Caused_by ?Gene #Evidence
+
                Caused_by_other ?Text #Evidence
Caused_by_other ?Text #Evidence
+
                Rescued_by_transgene ?Transgene
Rescued_by_transgene ?Transgene
+
                Variation_effect Gain_of_function_undetermined_type #Evidence
                Loss_of_function  UNIQUE  Hypomorph                              #Evidence
+
                                Antimorph_gain_of_function #Evidence
                                Amorph                                 #Evidence
+
                                Dominant_negative_gain_of_function #Evidence
                                                Uncharacterised_loss_of_function        #Evidence
+
                                Hypermorph_gain_of_function #Evidence
                Gain_of_function  UNIQUE  Dominant_negative      #Evidence
+
                                 Neomorph_gain_of_function #Evidence
                                                Hypermorph              #Evidence
+
                                Loss_of_function_undetermined_extent #Evidence  
                                                Neomorph                #Evidence
+
                                Null #Evidence
                                                Uncharacterised_gain_of_function        #Evidence
+
                                Predicted_null_via_sequence #Evidence
                Other_allele_type UNIQUE  Wild_type #Evidence
+
                                Probable_null_via_phenotype #Evidence
                                                Isoallele    #Evidence
+
                                Hypomorph_reduction_of_function #Evidence
  Mixed      #Evidence      
+
                                Predicted_hypomorph_via_sequence #Evidence
Affected_by Molecule   ?Molecule   #Evidence // ?Molecule model Karen Yook
+
                                Probable_hypomorph_via_phenotype #Evidence
                Temperature_sensitive   Heat_sensitive Text           #Evidence
+
                                Wild_allele #Evidence
                                        Cold_sensitive Text           #Evidence
+
                Affected_by Molecule ?Molecule #Evidence // ?Molecule model Karen Yook
                Maternal       UNIQUE Strictly_maternal       #Evidence
+
                EQ_annotations Anatomy_term ?Anatomy_term ?PATO_term #Evidence
                                        With_maternal_effect   #Evidence
+
                              Life_stage ?Life_stage ?PATO_term #Evidence
                Paternal       #Evidence
+
                              GO_term ?GO_term ?PATO_term #Evidence
Phenotype_assay Life_stage  ?Life_stage #Evidence
+
                              Molecule_affected  ?Molecule ?PATO_term #Evidence
Strain     ?Strain #Evidence
+
                Temperature_sensitive Heat_sensitive Text #Evidence
Treatment   ?Text #Evidence
+
                                      Cold_sensitive Text #Evidence
Temperature ?Text #Evidence
+
                Maternal UNIQUE Strictly_maternal #Evidence
Genotype   ?Text #Evidence
+
                                With_maternal_effect #Evidence
Ease_of_scoring UNIQUE ES0_Impossible_to_score #Evidence
+
                Paternal #Evidence
      ES1_Very_hard_to_score #Evidence
+
                Phenotype_assay Strain ?Strain #Evidence
      ES2_Difficult_to_score #Evidence
+
                                Treatment ?Text #Evidence
      ES3_Easy_to_score #Evidence
+
                                Temperature ?Text #Evidence
 +
                                Genotype ?Text #Evidence
 +
                Ease_of_scoring UNIQUE ES0_Impossible_to_score #Evidence
 +
                                      ES1_Very_hard_to_score #Evidence
 +
                                      ES2_Difficult_to_score #Evidence
 +
                                      ES3_Easy_to_score #Evidence
  
 
</pre>
 
</pre>
  
Here is the WS231 #Evidence hash model:
+
Here is the WS251 #Evidence hash model:
  
 
<pre>
 
<pre>
Line 126: Line 127:
 
////////////////////////////////////////////////////////////////////////////////
 
////////////////////////////////////////////////////////////////////////////////
  
#Evidence Paper_evidence ?Paper                                       // Data from a Paper
+
#Evidence Paper_evidence ?Paper                           // Data from a Paper
           Published_as ?Text                                           //  .. track other names for the same data
+
           Published_as ?Text                               //  .. track other names for the same data
           Person_evidence ?Person                                     // Data from a Person
+
           Person_evidence ?Person                         // Data from a Person
           Author_evidence ?Author UNIQUE Text                         // Data from an Author
+
           Author_evidence ?Author UNIQUE Text             // Data from an Author
           Accession_evidence ?Database ?Accession_number             // Data from a database (NDB/UNIPROT etc)
+
           Accession_evidence ?Database ?Accession_number   // Data from a database (NDB/UNIPROT etc)
           Protein_id_evidence ?Text                                   // Reference a protein_ID
+
           Protein_id_evidence ?Text                       // Reference a protein_ID
           GO_term_evidence ?GO_term                                   // Reference a GO_term
+
           GO_term_evidence ?GO_term                       // Reference a GO_term
           Expr_pattern_evidence ?Expr_pattern                         // Reference an Expression pattern
+
           Expr_pattern_evidence ?Expr_pattern             // Reference an Expression pattern
           Microarray_results_evidence ?Microarray_results             // Reference a Microarray result
+
           Microarray_results_evidence ?Microarray_results // Reference a Microarray result
           RNAi_evidence ?RNAi                                         // Reference a RNAi knockdown
+
           RNAi_evidence ?RNAi                             // Reference a RNAi knockdown
          Gene_regulation_evidence ?Gene_regulation                  // Reference a Gene_regulation interaction
+
           CGC_data_submission                             // bless the data as coming from CGC
           CGC_data_submission                                         // bless the data as coming from CGC
+
  Curator_confirmed ?Person                       // bless the data manually  
  Curator_confirmed ?Person                                   // bless the data manually  
+
  Inferred_automatically Text                     // bless the data via a script
  Inferred_automatically Text                                 // bless the data via a script
+
  Date_last_updated UNIQUE DateType               // Stores last update timestamp
  Date_last_updated UNIQUE DateType                           // Stores last update timestamp
+
  Feature_evidence ?Feature   // Reference a Feature - eg for creation of isoform based on TEC-RED SL2
  Feature_evidence ?Feature       // Reference a Feature - eg for creation of isoform based on TEC-RED SL2
+
  Laboratory_evidence ?Laboratory                 // Reference a Lab
  Laboratory_evidence ?Laboratory                             // Reference a Lab
+
  From_analysis ?Analysis   // Reference an analysis
  From_analysis ?Analysis       // Reference an analysis
+
  Variation_evidence ?Variation   // Explicitly record variation from which IMP manual GO annotations are made
  Variation_evidence ?Variation       // Explicitly record variation from which IMP manual GO annotations are made
 
 
  Mass_spec_evidence ?Mass_spec_peptide
 
  Mass_spec_evidence ?Mass_spec_peptide
  Sequence_evidence ?Sequence       // for sequence data that hasn't been submitted to a public resource
+
  Sequence_evidence ?Sequence           // for sequence data that hasn't been submitted to a public resource
 +
  Remark ?Text
 +
</pre>
 +
 
 +
 
 +
== RNAi annotations in CitaceMinus ==
 +
 
 +
These papers have >2000 RNAi objects and, therefore, did not get parsed into postgres tables for RNAi (a total of 59,656 RNAi objects stored in CitaceMinus) :
 +
* WBPaper00004402 (2287 RNAi objects)
 +
* WBPaper00004403 (2584 RNAi objects)
 +
* WBPaper00004651 (2479 RNAi objects)
 +
* WBPaper00005654 (14253 RNAi objects)
 +
* WBPaper00006395 (3230 RNAi objects)
 +
* WBPaper00024497 (10951 RNAi objects)
 +
* WBPaper00025054 (20709 RNAi objects)
 +
* WBPaper00029258 (3163 RNAi objects)
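
The per-paper counts in the list above can be checked against the stated total. The following is a minimal sketch that only re-adds the numbers quoted on this page; the variable names are illustrative and nothing here reads from CitaceMinus or postgres.

```python
# Counts copied from the list above (paper ID -> number of RNAi objects).
rnai_counts = {
    "WBPaper00004402": 2287,
    "WBPaper00004403": 2584,
    "WBPaper00004651": 2479,
    "WBPaper00005654": 14253,
    "WBPaper00006395": 3230,
    "WBPaper00024497": 10951,
    "WBPaper00025054": 20709,
    "WBPaper00029258": 3163,
}

# Papers above this size were left out of the postgres RNAi tables.
THRESHOLD = 2000

total = sum(rnai_counts.values())
all_above_threshold = all(n > THRESHOLD for n in rnai_counts.values())

print(f"{total:,} RNAi objects across {len(rnai_counts)} papers")
```

The sum matches the 59,656 objects quoted above, and every listed paper exceeds the 2000-object cutoff.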
  
</pre>
 
  
 
== RNAi Curation Standard Operating Procedure (SOP) ==
 
== RNAi Curation Standard Operating Procedure (SOP) ==
Line 226: Line 240:
 
|rna_date            ||rna_lifestagequality            ||rna_quantfromto_hst
 
|rna_date            ||rna_lifestagequality            ||rna_quantfromto_hst
 
|-
 
|-
|rna_date_hst            ||rna_molaffectedquality             ||rna_remark
+
|rna_date_hst            ||rna_molaffected             ||rna_remark
 
|-
 
|-
|rna_deliverymethod            ||rna_molecule             ||rna_remark_hst
+
|rna_deliverymethod            ||rna_molaffectedquality             ||rna_remark_hst
 
|-
 
|-
|rna_deliverymethod_hst            ||rna_molecule_hst             ||rna_sequence
+
|rna_deliverymethod_hst            ||rna_molecule             ||rna_sequence
 
|-
 
|-
|rna_dnatext            ||rna_moleculeaffected             ||rna_sequence_hst
+
|rna_dnatext            ||rna_molecule_hst             ||rna_sequence_hst
 
|-
 
|-
 
|rna_dnatext_hst            ||rna_movie            ||rna_species
 
|rna_dnatext_hst            ||rna_movie            ||rna_species
Line 265: Line 279:
 
|}
 
|}
  
 +
== RNAi OA ==
  
== RNAi OA ==
+
CG updated 9-1-2015
  
 
NOTE: RNAi objects may span multiple lines in the OA because of the nested phenotype data (Penetrance info, etc.)
 
NOTE: RNAi objects may span multiple lines in the OA because of the nested phenotype data (Penetrance info, etc.)
 +
 +
'''Dependency Notice''': The Community Phenotype Form and the RNAi OA each have separate code to look up the most recent RNAi ID and generate new RNAi IDs based on that. If the code is changed for one, we should also check that the code is changed for the other.
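
One way to reduce the risk described in the notice above is to keep the ID logic in a single helper that both tools call. The sketch below is hypothetical: the function name is invented, and the eight-digit zero-padded format is inferred from IDs such as WBRNAi00085012 that appear elsewhere on this page, not taken from the actual form or OA code.

```python
import re

# Assumed ID shape: "WBRNAi" followed by eight digits (inferred, not confirmed).
RNAI_ID = re.compile(r"^WBRNAi(\d{8})$")


def next_rnai_id(existing_ids):
    """Return the ID that follows the highest well-formed ID in existing_ids."""
    numbers = []
    for rnai_id in existing_ids:
        match = RNAI_ID.match(rnai_id)
        if match:  # silently skip anything that is not a WBRNAi ID
            numbers.append(int(match.group(1)))
    highest = max(numbers, default=0)
    return f"WBRNAi{highest + 1:08d}"


print(next_rnai_id(["WBRNAi00000007", "WBRNAi00085012"]))
```

If the two code paths cannot share a helper, the same function could at least serve as a regression check that both produce identical IDs from the same input.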
 +
 +
 +
=== TAB 1 ===
  
 
[[File:RNAi_OA_TAB_1_9-1-2015.png|600px]]
 
[[File:RNAi_OA_TAB_1_9-1-2015.png|600px]]
  
=== TAB 1 ===
 
 
*pgid - the postgres ID - NOT DUMPED
 
*pgid - the postgres ID - NOT DUMPED
 
*Name - rna_name - RNAi : - WBRNAiID #Note: This is automatically assigned  
 
*Name - rna_name - RNAi : - WBRNAiID #Note: This is automatically assigned  
Line 282: Line 301:
 
*Genotype - rna_genotype - Genotype - Big text
 
*Genotype - rna_genotype - Genotype - Big text
 
*Treatment - rna_treatment - Treatment - Big text
 
*Treatment - rna_treatment - Treatment - Big text
*Life Stage - rna_lifestage - Life_stage - ?Life_stage (Ontology)
 
 
*Temperature - rna_temperature - Temperature - Text (Integer)
 
*Temperature - rna_temperature - Temperature - Text (Integer)
 
*Delivery Method - rna_deliverymethod - Delivered_by - (Multi-dropdown) four choices: "Bacterial_feeding", "Injection", "Soaking", "Transgene_expression"
 
*Delivery Method - rna_deliverymethod - Delivered_by - (Multi-dropdown) four choices: "Bacterial_feeding", "Injection", "Soaking", "Transgene_expression"
 
*Species - rna_species - Species - ?Species (Dropdown)
 
*Species - rna_species - Species - ?Species (Dropdown)
 
*Remark - rna_remark - Remark - Big text
 
*Remark - rna_remark - Remark - Big text
 +
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 +
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  
  
 +
=== TAB 2 ===
  
 
[[File:RNAi_OA_TAB_2_9-1-2015.png|600px]]
 
[[File:RNAi_OA_TAB_2_9-1-2015.png|600px]]
  
=== TAB 2 ===
 
 
*From Genereg - rna_fromgenereg - NOT DUMPED - Toggle #Note: We would like a toggle here for Xiaodong to create new RNAi objects while performing Gene_regulation curation and flag them for an RNAi curator to complete (just like the "From RNAi" toggle in the Gene_regulation OA) ''Named field "From Gene Reg" to make it smaller, I'd prefer "From Genereg" or "From GeneReg", if you'd really rather have "From Gene Regulation" let me know and I'll change it.  This is important before we start curation because the name of the field is the value that gets stored when the field is toggled on, so changing the field name later would mean changing the data in postgres as well -- J'' Sure, this sounds fine. We can keep it as small as you want; so, "From Genereg" is fine with me -- C
 
*NO DUMP - rna_nodump - NOT DUMPED - toggle - Prevents data from that row from being dumped. This will also indicate to the Curation Status Form (CSF) that the paper has not been curated and will come up as "oa_blank" in the CSF
 
 
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
*Penetrance From - To - rna_penfromto - (in-line with Phenotype) Range - Text (Integer-space-Integer)
+
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
 +
*Phenotype Suggestion - rna_suggested - not dumped - field to suggest new phenotype term. Will replace whatever term(s) are currently in the Phenotype field once it is approved
 +
*Suggested Definition - rna_suggested_definition - NOT DUMPED - Big text field with definition of new suggested phenotype term
 +
*Child Of - rna_child_of - NOT DUMPED - list of parent phenotype term(s) for suggested phenotype
 +
*Affected By Molecule - rna_molecule - (in-line with Phenotype) Molecule - ?Molecule (Multi-ontology)
 +
*Penetrance From To - rna_penfromto - (in-line with Phenotype) Range - Text (Integer-space-Integer)
 
*Penetrance - rna_penetrance - (in-line with Phenotype) Penetrance - dropdown like in phenotype OA
 
*Penetrance - rna_penetrance - (in-line with Phenotype) Penetrance - dropdown like in phenotype OA
 +
*Quantity From To - rna_quantfromto - (in-line with Phenotype) Quantity - Text (Integer-space-Integer)
 +
*Quantity Description - rna_quantdesc - (in-line with Phenotype) Quantity_description - Big text
 
*Heat Sensitive - rna_heatsens - (in-line with Phenotype) Temperature_sensitive - Toggle
 
*Heat Sensitive - rna_heatsens - (in-line with Phenotype) Temperature_sensitive - Toggle
 
*Cold Sensitive - rna_coldsens - (in-line with Phenotype) Temperature_sensitive - Toggle
 
*Cold Sensitive - rna_coldsens - (in-line with Phenotype) Temperature_sensitive - Toggle
*Quantity From - To - rna_quantfromto - (in-line with Phenotype) Quantity - Text (Integer-space-Integer)
 
*Quantity Description - rna_quantdesc - (in-line with Phenotype) Quantity_description - Big text
 
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - Big Text -- Populate from Phenotype lines that have a Remark tag in #Phenotype_info following in .ace file.
 
*Molecule - rna_molecule - (in-line with Phenotype) Molecule - ?Molecule (Multi-ontology) ''This should be moved up inside the Phenotype subtags ? -- J'' Yes, that should do it -- C
 
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
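
The Penetrance and Quantity "From To" fields above are described as Text (Integer-space-Integer). A minimal sketch of validating such a value follows; the function name is hypothetical and the OA's real validation, if any, may differ.

```python
def parse_from_to(text):
    """Split an 'Integer-space-Integer' value into a (from, to) tuple of ints.

    Raises ValueError if the text does not contain exactly two integers.
    """
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"expected two integers separated by a space, got {text!r}")
    low, high = (int(part) for part in parts)  # int() raises ValueError on non-integers
    return low, high


print(parse_from_to("10 25"))
```

The pair maps onto the two Int values of the Range and Quantity tags in the #Phenotype_info model.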
 
  
  
 +
=== TAB 3 ===
  
 
[[File:RNAi_OA_TAB_3_9-1-2015.png|600px]]
 
[[File:RNAi_OA_TAB_3_9-1-2015.png|600px]]
  
=== TAB 3 ===
+
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 +
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
 +
*Anatomy - rna_anatomy (Multi-ontology)
 +
*Anatomy Quality - rna_anatomyquality (Multi-ontology)
 +
*Life Stage - rna_lifestage - Life_stage - ?Life_stage (Multi-ontology)
 +
*Life Stage Quality - rna_lifestagequality (Multi-ontology)
 +
*Molecule Affected - rna_molaffected (Multi-ontology)
 +
*Mol Aff Quality - rna_molaffectedquality (Multi-ontology)
 +
*GO Process - rna_goprocess (Multi-ontology)
 +
*GO P Quality - rna_goprocessquality (Multi-ontology)
 +
*GO Function - rna_gofunction (Multi-ontology)
 +
*GO F Quality - rna_gofunctionquality (Multi-ontology)
 +
*GO Component - rna_gocomponent (Multi-ontology)
 +
*GO C Quality - rna_gocomponentquality (Multi-ontology)
 +
 
 +
=== TAB 4 ===
 +
 
 +
[[File:RNAi_OA_TAB_4_9-1-2015.png|600px]]
  
 +
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 +
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
 +
*NO DUMP - rna_nodump - NOT DUMPED - toggle - Prevents data from that row from being dumped. This will also indicate to the Curation Status Form (CSF) that the paper has not been curated and will come up as "oa_blank" in the CSF
 +
*From Genereg - rna_fromgenereg - NOT DUMPED - Toggle
 +
*Flag Gene Reg - rna_flaggenereg - NOT DUMPED - Toggle
 +
*Flag Genetic Intxn - rna_flaggeneticintxn - NOT DUMPED - Toggle
 
*Person Evidence - rna_person - Evidence Person_evidence - multiontology - for ?RNAi object's #Evidence
 
*Person Evidence - rna_person - Evidence Person_evidence - multiontology - for ?RNAi object's #Evidence
 
*History Name - rna_historyname - History_name - Text      #Note: this field is to accommodate older RNAi objects
 
*History Name - rna_historyname - History_name - Text      #Note: this field is to accommodate older RNAi objects
Line 319: Line 366:
 
**Phenobank2 Gene&RNAID GeneID=507328 | Phenobank3 Gene&RNAID GeneID=123456
 
**Phenobank2 Gene&RNAID GeneID=507328 | Phenobank3 Gene&RNAID GeneID=123456
 
*Expression Profile - rna_exprprofile - Expr_profile - Text  #Note: this field is to accommodate older RNAi objects
 
*Expression Profile - rna_exprprofile - Expr_profile - Text  #Note: this field is to accommodate older RNAi objects
*Phenotype Suggestion - rna_suggested - not dumped - field to suggest new phenotype term. Will replace whatever term(s) are currently in the Phenotype field once it is approved
 
*Suggested Definition - rna_suggested_definition - not dumped - Big text field with definition of new suggested phenotype term
 
*Child Of - rna_child_of - not dumped - list of parent phenotype term(s) for suggested phenotype
 
 
 
 
[[File:RNAi_OA_TAB_4_9-1-2015.png|600px]]
 
 
=== TAB 4 ===
 
 
*
 
*
 
  
  
  
 
'''NOTES ON SUGGESTING NEW PHENOTYPE TERMS THROUGH THE RNAI OA'''
 
'''NOTES ON SUGGESTING NEW PHENOTYPE TERMS THROUGH THE RNAI OA'''
* When suggesting a new phenotype term in TAB 3 of the RNAi OA, curators should make sure to create a PGID/Row with a '''single''' placeholder phenotype that is intended to be replaced by the new phenotype term once the new term has been approved
+
* When suggesting a new phenotype term in TAB 2 of the RNAi OA, curators should make sure to create a PGID/Row with a '''single''' placeholder phenotype that is intended to be replaced by the new phenotype term once the new term has been approved
 
* If the data is dumped from the OA for upload before the new term has been approved, the object will dump with the placeholder phenotype term for that upload
 
* If the data is dumped from the OA for upload before the new term has been approved, the object will dump with the placeholder phenotype term for that upload
 
* This behavior is similar to that of the Phenotype OA
 
* This behavior is similar to that of the Phenotype OA
Line 456: Line 491:
 
12345  flagonly  no curator  has no curator
 
12345  flagonly  no curator  has no curator
 
</pre>
 
</pre>
 
== Parsing WS RNAi into OA ==
 
 
These papers have >2000 RNAi objects and, therefore, are not getting parsed into postgres tables for RNAi (a total of 59,656 RNAi objects stored in CitaceMinus) : WBPaper00004402 (2287 RNAi objects), WBPaper00004403 (2584 RNAi objects), WBPaper00004651 (2479 RNAi objects), WBPaper00005654 (14253 RNAi objects), WBPaper00006395 (3230 RNAi objects), WBPaper00024497 (10951 RNAi objects), WBPaper00025054 (20709 RNAi objects), WBPaper00029258 (3163 RNAi objects)
 
 
/home/postgres/work/pgpopulation/rna_rnai/makeRnaiCuratorMappings.pl  generates a mapping of RNAi objects to WBPerson curators based on WS231RNAiwithTimeStamp.ace
 
 
''Parsing script is /home/postgres/work/pgpopulation/rna_rnai/populate_rnai.pl : populates postgres and generates /home/postgres/work/pgpopulation/rna_rnai/deletion.ace to remove all lines read into postgres + method lines which are not read into postgres but will get generated by dumper -- J'' OK, I just read Wen's RNAi objects from WS231 into an empty ACEDB (they read in fine), then I deleted their contents with the deletion.ace file (which read in fine), and then repopulated those RNAi objects with data dumped from the Sandbox RNAi OA (which read in fine). So, everything seems to be in working order. -- C 4/2/2012
 
 
Ignore all #Evidence in #Phenotype_info and in Remark tag of ?RNAi object.
 
 
When generating the .ace deletion file, always delete the Method line, and also delete every line that gets parsed in.
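
As a rough illustration of that rule, the sketch below builds one deletion block per RNAi object using the .ace <code>-D</code> prefix. It is not populate_rnai.pl: the function name is hypothetical, and deleting at the level of whole tags (rather than individual tag-value lines) is an assumption made to keep the example short.

```python
def deletion_block(rnai_id, parsed_tags):
    """Build an .ace paragraph that deletes Method plus every parsed-in tag."""
    # Method is always deleted, even though it is never read into postgres.
    tags = ["Method"] + [tag for tag in parsed_tags if tag != "Method"]
    lines = [f'RNAi : "{rnai_id}"']
    lines += [f"-D {tag}" for tag in tags]
    # .ace paragraphs are separated by a blank line, so end with a newline.
    return "\n".join(lines) + "\n"


print(deletion_block("WBRNAi00000001", ["Phenotype", "Remark"]))
```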
 
 
''These tags are not getting parsed into the OA : Author Gene Gene_regulation Homol_homol Interaction Predicted_gene Uniquely_mapped'' Correct -- C
 
 
''These Phenotype subtags are not getting parsed into the OA : Curator_confirmed Molecule Paper_evidence .  To see the Molecules not populated look on the sandbox at /home/postgres/work/pgpopulation/rna_rnai/phen_subtags_not_populated'' We do want these molecules in the OA. I guess I forgot/didn't realize that the molecules were coming from the "Affected_by" tag inside the #Phenotype_info hash. Will it be a problem to populate these molecules into the Molecule field in the OA? -- C ''It will probably be fine, since Molecule wasn't in the sub-section for Phenotype I thought it might have been a different type of molecule (like Laboratory is different from Laboratory_evidence, or something like that)  You should move the field up into the Phenotype subtags in the wiki if I'm understanding that right.  Just move it up to where you want it and I'll move it in the OA.  I don't see the #Phenotype_info model in this wiki, and in the models.wrm I'm looking at there's no Affected_by tag, so I don't know how to parse it.  The other #Phenotype_info was relatively straightforward because it was UNIQUE, but Molecule is multiontology so I'd need to make the parser more complicated to deal with that.  Thing is that I don't see any data with multiple molecules.  Do you know if there's any ?  Does the field need to be a multiontology or is the data single value (so we can parse it that way) and we're making it multiontology in case it changes later ? -- J''  I've moved "Molecule" up into the Phenotype sub-tags and I've added the current (WS231) Phenotype_info model above, which includes the "Affected_by" and "Molecule" tags. I don't know if there are any annotations with multiple molecules, but even if there aren't any now, there's no reason there couldn't be one soon; therefore, I'd like to keep this multiontology, unless that is too much of a headache and we need to find another way around it. -- C ''There are many entries with multiple molecules, but they're parsed in.  
I missed this last night, but there's one molecule that isn't valid WBMol:00005097 /home/postgres/work/pgpopulation/rna_rnai/bad_molecule -- J''  OK, that bad molecule should be under the identifier D005467 instead of WBMol:00005097. Can we just replace each instance of WBMol:00005097 with D005467? -- C '''done -- Turns out this is the only object that has two pgids in the molecule OA right now, so I'm associating it to the lower of the two pgids : 3778  Molecule fields store in postgres the pgid of the corresponding molecule OA entry instead of the molecule name, because when the molecule is created sometimes it won't have a name. -- J'''
 
 
''Is there only one Not phenotype in all the data ? (WBRNAi00085012)  If so, it'd be a lot easier to ignore this, and after it's live add the phenotype manually through the OA.  If not, let me know some example of other objects, because I can't find any to test for a parsing pattern.  Is this the same as Phenotype_not_observed ?'' Hmmm, this is strange. We've tried to get rid of any "NOT" tags that remained in the RNAi objects. Wen had done this programmatically, but perhaps she missed this one. This is identical to Phenotype_not_observed -- C ''I'll wait until we resolve the NOT below, but probably just add it manually after everything else is transferred -- J resolved -- J''
 
 
''We're not keeping remark nor any other #Phenotype_info for the field Phenotype_not_observed right ?  There's no place to put it'' Hmm, right; we would actually like to have remarks for Phenotype_not_observed. Can we add a "Phenotype_not_observed Remark" field? Given our earlier discussion, curators wouldn't put a "Phenotype" and a "Phenotype_not_observed" into the same row. -- C ''Ah, I didn't think it would work that way.  I thought that positive phenotypes and negative phenotypes were completely different things so that you could add them on the same line the way you'd add a Curator and a Paper as unrelated data.  If they're actually the same thing and should never be on the same line I'll parse it that way (it's easier), but then it would be easier to have a NOT toggle the way the phenotype OA works.  If we do it this way the NOTs would be on separate lines depending on all the #Phenotype_info and we can keep the same Remark OA field for the dumper.  It would just act like everything else but when the NOT toggle is on, it would use the Phenotype_not_observed in the .ace dump.  We should talk before I parse this since it makes a difference -- J'' OK, that makes sense; let's handle Phenotype_not_observed like any other Phenotype and just add the NOT toggle. Then we can dump those out in the *.ACE file as "Phenotype_not_observed" plus any additional information captured in the #Phenotype_info hash. -- C '''done -- J'''
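
The agreed behaviour (one row per phenotype, with the NOT toggle choosing the tag at dump time) could look roughly like this. The function name, the example phenotype ID, and the simple double-quoting are assumptions for illustration; the real dumper may quote and escape values differently.

```python
def phenotype_ace_line(phenotype_id, is_not, remark=None):
    """Return the .ace line for one OA row, honouring the NOT toggle."""
    # The NOT toggle only changes the tag; the #Phenotype_info data is dumped the same way.
    tag = "Phenotype_not_observed" if is_not else "Phenotype"
    line = f'{tag} "{phenotype_id}"'
    if remark:
        line += f' Remark "{remark}"'
    return line


# Hypothetical phenotype ID, used only to show the two outputs side by side.
print(phenotype_ace_line("WBPhenotype:0000050", is_not=False))
print(phenotype_ace_line("WBPhenotype:0000050", is_not=True, remark="no defect seen"))
```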
 
 
''DNA_text is a double text field.  I'm assuming that the first part is always going to be a string of atcg so will never have a space.  In that case you could store the data in this field like qq(<atcg><space><next_thing>) but if you think you'd ever want spaces the better divider would be qq(<atcg>" "<next_thing>)  I didn't see anything else in the model that stores two values in a row besides the From-To fields, if there are any, let me know and we'll talk about them.  I need to figure this out to know how to parse stuff.  We talked about Database which I'm treating as a triple text field (instead of ontology values) and I think you don't have spaces in those values either, but if you do we should talk about it.'' I'm unclear about this; I thought that the "DNA_text" field would be a bigtext field that could accommodate multiple sequences <atcg> separated by pipes. ''The DNA_text field has two parts, the DNA and the probe. So you would separate different DNA_text lines with pipes, but need a different separator to separate DNA from probe -- J'' OK, I'm not sure what you mean by probe; unfortunately this term ("probe") gets thrown around a lot with different meanings. Can you give me an example? -- C '''Just got the word from the model above, but from what we talked about, let me know if you decide we don't want the probe part parsed in at all -- J'''  As for the Database field, I thought we would enter it like this: Phenobank2 Gene&RNAID GeneID=507328, as discussed above. Is that not right? -- C ''Yes, the question is whether all those 5 things DNA, probe, database, database_field, and accession_number all have no spaces in their values.  If they have spaces they need doublequotes around them, but if they'll never have spaces we can just have spaces in the OA when entering the text data -- J'' I see, I misunderstood. No, these values should never have spaces within themselves. -- C '''Cool, the current data doesn't have any spaces, so it's good. -- J'''
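
The separator convention discussed above (pipes between DNA_text entries, a space between the DNA and the optional probe name, and no spaces inside either value) can be sketched in Python as follows. This is only an illustration of the convention, not the parser actually used; the function name and the sample values are made up.

```python
def parse_dna_text(field):
    """Split an OA DNA_text value into (dna, probe) pairs."""
    pairs = []
    for entry in field.split("|"):      # pipes separate DNA_text entries
        parts = entry.split()           # whitespace separates DNA from probe
        if not parts:
            continue                    # skip empty segments, e.g. a trailing pipe
        dna = parts[0]
        probe = parts[1] if len(parts) > 1 else None
        pairs.append((dna, probe))
    return pairs


# Made-up values: one entry with a probe name, one without.
print(parse_dna_text("atcgatcg probe_1 | ggccaatt"))
```

Because the values themselves never contain spaces, no quoting is needed; if that ever changed, the space separator would have to be replaced or the values quoted.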
 
 
''The OAs that have lifestage store the WBls:####### ID instead of the name, but the .ace file has names, so either there was some kind of translation, or more likely the data in there is wrong (someone should have switched it when the ?Life_stage class changed from text to IDs).  I'm converting from names to IDs to store in postgres, but some names don't match the list of valid values in postgres in obo_name_lifestage.  The list of RNAi objects and their Life_stage text is at /home/postgres/work/pgpopulation/rna_rnai/bad_lifestage'' OK, is it possible to correct these if I give you the proper names? "oocyte" should be "adult hermaphrodite" WBls:0000057, "Dauer" should be "dauer larva" WBls:0000032, "L3 larvae" should be "L3 larva" WBls:0000035, "young adult" should be "newly molted young adult hermaphrodite" WBls:0000063. I will have to look into "Mixed stages" and "L4-young adult". -- C ''I need the WBls IDs, I looked them up, added them earlier in the paragraph, and used them in the script. if you can find them for those last two that'd be good -- J'' OK for "Mixed stages" let's use "all stages" WBls:0000002 and for "L4-young adult" can we do two values? If we can I would use "L4 larva" WBls:0000038 and "newly molted young adult hermaphrodite" WBls:0000063. If only one value is allowed, just use "L4 larva" WBls:0000038. Thanks. -- C ''I think we could, but the Life_stage field is not multiontology do you want to switch it to multiontology or just go with WBls:0000038 (data currently parsed in with 38)'' OK, let's just leave it a single Ontology field (the model designates this as UNIQUE now anyway)-- C ''Also, why are we dumping Life_stage objects as names instead of WBls objects ? (probably talk to Wen about that) -- J'' Yeah, I have no idea about that. We'll have to ask Wen. -- C '''k, I'll leave as is for now -- J'''
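
For reference, the name corrections agreed in this thread can be written as a small lookup table. The Python sketch below is illustrative only (the real conversion was done in the population script); the function name is hypothetical and the `valid_names` argument stands in for the contents of obo_name_lifestage.

```python
# Corrections quoted in the discussion above (old .ace text -> WBls ID).
LIFESTAGE_FIXES = {
    "oocyte": "WBls:0000057",          # adult hermaphrodite
    "Dauer": "WBls:0000032",           # dauer larva
    "L3 larvae": "WBls:0000035",       # L3 larva
    "young adult": "WBls:0000063",     # newly molted young adult hermaphrodite
    "Mixed stages": "WBls:0000002",    # all stages
    "L4-young adult": "WBls:0000038",  # L4 larva (field is single-valued)
}


def lifestage_to_id(name, valid_names):
    """Return the WBls ID for a life stage name, or None if it is unknown.

    valid_names maps accepted names to IDs (stand-in for obo_name_lifestage).
    """
    if name in valid_names:
        return valid_names[name]
    # Fall back to the hand-curated corrections; never guess beyond them.
    return LIFESTAGE_FIXES.get(name)
```

Names that return None would still need to be listed for a curator to resolve, as was done with the bad_lifestage file.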
 
 
''Parsing the sequence data all 777 entries don't have a valid sequence in gin_sequence at /home/postgres/work/pgpopulation/rna_rnai/bad_sequence this might be because they all don't have WBGenes or something else.  Is there a set that we should use like pcr_product ? Do you know why none of them match, is it the wrong tag or datatype in there ? The gin_sequence data is populated from genes that have a Corresponding_CDS or Corresponding_Transcript tag -- J'' Do the sequence objects need to be linked to a gene at the outset? That should be taken care of at the mapping stage. I've put the complete list (WS230) of sequence objects on tazendra at /home/acedb/chris/WS230_Sequence_objects.txt. -- C ''cool, I compared all the existing sequence entries, and they're all there, but that's not surprising since the RNAi sequences would create sequence objects in acedb by XREF, so it doesn't necessarily prove much.  Because there's so many possible sequence objects (2 million) it'd probably be best to leave it as pipe-separated text the way that it's parsed in now, but we should talk about it -- J'' Yes, as we discussed, these will remain pipe-separated text objects -- C
 
 
''There are a lot of pcr_products at /home/postgres/work/pgpopulation/rna_rnai/bad_pcrproduct but we might be able to get a lot more matches if we make it case insensitive.  Does case matter ?'' No, case doesn't matter and I think you're right that that is probably the problem. -- C ''allowing upper and lower case in letters after the period, and they all match now -- J'' Great! -- C
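
A minimal Python sketch of the matching rule described above, where only the part of the name after the period is compared case-insensitively. The function names and the sample PCR product names are hypothetical; the actual script may implement the comparison differently.

```python
def pcr_key(name):
    """Lower-case everything after the first period; leave the rest untouched."""
    head, dot, tail = name.partition(".")
    return head + dot + tail.lower()


def match_pcr_products(names, valid_names):
    """Map each name to its valid spelling; collect the ones with no match."""
    index = {pcr_key(valid): valid for valid in valid_names}
    matched, unmatched = {}, []
    for name in names:
        hit = index.get(pcr_key(name))
        if hit is None:
            unmatched.append(name)
        else:
            matched[name] = hit
    return matched, unmatched


# Made-up names: the first differs from the valid entry only after the period.
print(match_pcr_products(["sjj_X1.2A", "sjj_x1.2a"], ["sjj_X1.2a"]))
```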
 
 
''There are a couple of bad papers at /home/postgres/work/pgpopulation/rna_rnai/bad_papers  One paper is invalid linked to a lot of RNAi, the other has a typo lowercase p'' OK, it looks like WBPaper00013501 changed its name to WBPaper00024307, and, as you pointed out, the other is just a case typo. Can we just make the appropriate changes to these, or do we need to be more formal about it? -- C ''Hardcoded both exceptions, they're okay now -- J'' Excellent! Thanks! -- C
 
 
''For all these problem objects if you want to fix them, we can edit the source .ace file, but if we ever switch the dataset to populate, we'd have to make all the changes in the new .ace file again.  Probably easiest would be to tell me which objects to ignore and I can hardcode in not giving an error for those, not populating postgres with them, and keep track to populate through the OA when live.  (if you want to keep track and change in the .ace file, that's easier for me, but in the past we've had to make dumps of .ace several times and used various sets so if it's the same with this datatype, it will be more of a pain for you)'' I'd prefer, if possible, to make the changes directly to the .ace file(s) and thereby fix them permanently, unless there's a good reason not to do this. -- C ''You can, sure.  You shouldn't need to since I hardcoded those exceptions in the parser, so they'll go into postgres correctly -- J'' Great! Thanks! -- C
 
 
''The .ace file has the species as "Caenorhabditis elegans", the value you wanted me to add to the OA (fits the other values' format) is "Caenorhabditis_elegans" so I'm converting from space in .ace to underscore for postgres.'' OK, I think leaving them both with a space is fine. I don't think there's any need for an underscore anywhere. -- C ''Well, all the other species have underscores, which is why I parsed it from space to underscore -- J'' OK, whatever is easiest. -- C
 
 
== To sync to tazendra ==
 
 
* Create OA tables /home/postgres/work/pgpopulation/rna_rnai/create_rnai_tables.pl
 
 
* Create obo tables for pcr product data /home/postgres/work/pgpopulation/obo_oa_ontologies/20120322_pcrproduct/create_obo_tables.pl
 
* Populate obo tables for pcr product based on WS229_PCR_products.txt with /home/postgres/work/pgpopulation/obo_oa_ontologies/20120322_pcrproduct/populate_obo_pcrproduct.pl (takes 30 minutes, lots of lines)
 

Latest revision as of 20:29, 30 July 2019

Archived RNAi documentation may be found here

RNAi Curation Mission Summary

The term "RNAi" stands for "RNA-interference" and refers to the targeted silencing of expression of a "target gene" via introduction of double-stranded RNA (dsRNA) with a high degree of sequence identity to the "target gene". In C. elegans, dsRNA may be introduced into the worm by a variety of methods, including direct micro-injection, soaking of worms in a solution containing dsRNA, feeding worms bacteria that express dsRNA from a plasmid, and transgenic expression (within the worm) of dsRNA. For efficient and specific knockdown of gene expression, the dsRNA must have a minimum sequence identity with the target gene sequence. Any phenotype(s) resulting from the RNAi-mediated knockdown of a particular gene is thought to directly reflect the phenotype of a loss-of-function mutation in that gene, hence providing evidence as to the gene's biological function.

The goal of RNAi curation is to associate RNAi-mediated phenotypes with the target genes knocked down in RNAi experiments, as found in the literature pertaining to C. elegans and related species. There are some important things to consider when curating RNAi experiments. Some RNAi experiments are simple to curate: the strain is N2 (wild type genotype), only a single gene is targeted for RNAi, the phenotype is clearly stated, and the source and identity of the dsRNA are clearly stated by the authors. In many cases, however, the curation task is not so simple: there may be complex genotypes with multiple mutations in the strain receiving the dsRNA, multiple RNAi gene targets, complex genetic interactions, and/or missing descriptions from the authors as to controls, dsRNA delivery method, dsRNA identity, and/or phenotype directly resulting from the RNAi in question. Hopefully these issues will all be addressed below.

RNAi Data Model

This is the ?RNAi data model as of WormBase Release WS251:

//////////////////////////////////////////////////////////////////
//
// ?RNAi class
//
//////////////////////////////////////////////////////////////////

?RNAi   Evidence #Evidence
        History_name UNIQUE ?Text
        Homol Homol_homol ?Homol_data XREF RNAi_homol ?Method Float Int UNIQUE Int Int UNIQUE Int #Homol_info
        Sequence_info   DNA_text Text UNIQUE Text //stores actual probe sequence for automated mapping
                                                  // 1st Text is DNA, 2nd is probe name
                        Sequence ?Sequence XREF RNAi  //links to a real Sequence object used in the experiment
                                                      // such as yk clone; not UNIQUE anymore
                        Clone ?Clone XREF Used_in_RNAi       // Chris WS244
                        PCR_product ?PCR_product XREF RNAi // links to a PCR_product object used in
                                                           // the experiment; not UNIQUE anymore
        Uniquely_mapped  //boolean; if present, signifies that ?RNAi object has a unique sequence
                         // which maps to a single place in the genome
        Experiment      Laboratory ?Laboratory
                        Date UNIQUE DateType
                        Strain UNIQUE ?Strain
                        Genotype UNIQUE ?Text   //used when no Strain object exists
                        Treatment UNIQUE ?Text
                        Life_stage UNIQUE ?Life_stage
                        Temperature UNIQUE Int
                        Delivered_by UNIQUE Bacterial_feeding      //RL [010327]
                                            Injection              //RL [010327]
                                            Soaking                //RL [010327]
                                            Transgene_expression   //RL [010327]
        Inhibits        Predicted_gene ?CDS XREF RNAi_result #Evidence // "gene" parent (unreliable)
                        Gene ?Gene   XREF RNAi_result #Evidence           //RL [010327]
                        Transcript ?Transcript XREF RNAi_result #Evidence // [021126 krb]
                        Pseudogene ?Pseudogene XREF RNAi_result #Evidence // [030801 krb]
        Supporting_data Movie ?Movie XREF RNAi    // Lincoln, krb [010807]
        DB_info         Database ?Database ?Database_field ?Accession_number
        Species         UNIQUE ?Species
        Interaction     ?Interaction
        Reference       UNIQUE ?Paper XREF RNAi //[070215 ar2] made reference unique so Paper sort of
                                                // equates to a Study class for Will S
        Phenotype       ?Phenotype XREF RNAi #Phenotype_info
        Phenotype_not_observed ?Phenotype XREF Not_in_RNAi #Phenotype_info
        Expr_profile    ?Expr_profile XREF RNAi_result // connection added during build [030106 krb]
        Remark          ?Text #Evidence
        Method UNIQUE   ?Method


Here is the WS251 ?Phenotype_info model:

////////////////////////////////////////////
//
// ?Phenotype_info Class
//
////////////////////////////////////////////

#Phenotype_info Paper_evidence ?Paper
                Person_evidence ?Person
                Curator_confirmed ?Person
                Remark ?Text #Evidence // specific remarks about the phenotype
                Quantity_description ?Text #Evidence //Remark to describe what quantity describes, below
                Quantity UNIQUE Int UNIQUE Int #Evidence
                Not #Evidence //This is being phased out but is needed for the next phase [06/08/10].
                Penetrance Incomplete Text #Evidence
                           Low Text #Evidence
                           High Text #Evidence
                           Complete Text #Evidence
                           Range UNIQUE Int UNIQUE Int #Evidence // Range of penetrance
                Recessive #Evidence
                Semi_dominant #Evidence
                Dominant #Evidence
                Haplo_insufficient #Evidence
                Caused_by_gene ?Gene #Evidence
                Caused_by_other ?Text #Evidence
                Rescued_by_transgene ?Transgene
                Variation_effect Gain_of_function_undetermined_type #Evidence
                                 Antimorph_gain_of_function #Evidence
                                 Dominant_negative_gain_of_function #Evidence
                                 Hypermorph_gain_of_function #Evidence
                                 Neomorph_gain_of_function #Evidence
                                 Loss_of_function_undetermined_extent #Evidence    
                                 Null #Evidence
                                 Predicted_null_via_sequence #Evidence
                                 Probable_null_via_phenotype #Evidence
                                 Hypomorph_reduction_of_function #Evidence
                                 Predicted_hypomorph_via_sequence #Evidence
                                 Probable_hypomorph_via_phenotype #Evidence
                                 Wild_allele #Evidence
                Affected_by Molecule ?Molecule #Evidence // ?Molecule model Karen Yook
                EQ_annotations Anatomy_term ?Anatomy_term ?PATO_term #Evidence
                               Life_stage ?Life_stage ?PATO_term #Evidence
                               GO_term ?GO_term ?PATO_term #Evidence
                               Molecule_affected  ?Molecule ?PATO_term #Evidence
                Temperature_sensitive Heat_sensitive Text #Evidence
                                      Cold_sensitive Text #Evidence
                Maternal UNIQUE Strictly_maternal #Evidence
                                With_maternal_effect #Evidence
                Paternal #Evidence
                Phenotype_assay Strain ?Strain #Evidence
                                Treatment ?Text #Evidence
                                Temperature ?Text #Evidence
                                Genotype ?Text #Evidence
                Ease_of_scoring UNIQUE ES0_Impossible_to_score #Evidence
                                       ES1_Very_hard_to_score #Evidence
                                       ES2_Difficult_to_score #Evidence
                                       ES3_Easy_to_score #Evidence

Here is the WS251 #Evidence hash model:

////////////////////////////////////////////////////////////////////////////////
//			      Evidence hash
////////////////////////////////////////////////////////////////////////////////

#Evidence Paper_evidence ?Paper                            // Data from a Paper
          Published_as ?Text                               //  .. track other names for the same data
          Person_evidence ?Person                          // Data from a Person
          Author_evidence ?Author UNIQUE Text              // Data from an Author
          Accession_evidence ?Database ?Accession_number   // Data from a database (NDB/UNIPROT etc)
          Protein_id_evidence ?Text                        // Reference a protein_ID
          GO_term_evidence ?GO_term                        // Reference a GO_term
          Expr_pattern_evidence ?Expr_pattern              // Reference a Expression pattern  
          Microarray_results_evidence ?Microarray_results  // Reference a Microarray result
          RNAi_evidence ?RNAi                              // Reference a RNAi knockdown
          CGC_data_submission                              // bless the data as coming from CGC
          Curator_confirmed ?Person                        // bless the data manually
          Inferred_automatically Text                      // bless the data via a script
          Date_last_updated UNIQUE DateType                // Stores last update timestamp
          Feature_evidence ?Feature                        // Reference a Feature - eg for creation of isoform based on TEC-RED SL2
          Laboratory_evidence ?Laboratory                  // Reference a Lab
          From_analysis ?Analysis                          // Reference an analysis
          Variation_evidence ?Variation                    // Explicitly record variation from which IMP manual GO annotations are made
          Mass_spec_evidence ?Mass_spec_peptide
          Sequence_evidence ?Sequence                      // for sequence data that hasn't been submitted to a public resource
          Remark ?Text


RNAi annotations in CitaceMinus

These papers each have more than 2000 RNAi objects and were therefore not parsed into the postgres tables for RNAi (a total of 59,656 RNAi objects stored in CitaceMinus):

  • WBPaper00004402 (2287 RNAi objects)
  • WBPaper00004403 (2584 RNAi objects)
  • WBPaper00004651 (2479 RNAi objects)
  • WBPaper00005654 (14253 RNAi objects)
  • WBPaper00006395 (3230 RNAi objects)
  • WBPaper00024497 (10951 RNAi objects)
  • WBPaper00025054 (20709 RNAi objects)
  • WBPaper00029258 (3163 RNAi objects)


RNAi Curation Standard Operating Procedure (SOP)

In order to ensure consistency of RNAi curation across curators and WormBase releases, a set of standard procedures for RNAi curation is outlined below. The descriptions of these procedures include use of the two main methods for generating *.ace files for release submission: (1) the web-based CGI form for one-at-a-time RNAi object generation and (2) the batch form submission method.

Minimum Requirements for an RNAi Object

Regardless of which curation method you (the curator) choose, there is a set of minimum requirements for generating a complete RNAi experiment object:

1) A reference (e.g. WBPaperID)

2) A curator name

3) The sequence of the dsRNA used in the experiment to knockdown gene expression

4) A dsRNA delivery method (e.g. Injection) [Note: some authors omit this information. This can be curated arbitrarily and removed from the *.ace file once it has been generated]

5) A phenotype (observed or not_observed)

If any one of these five basic pieces of information is missing, the scripts that generate the *.ace file will fail, returning an error.
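
The five requirements above can be checked before submission with something like the following Python sketch. It is a hedged illustration only: the field names and the dictionary layout are hypothetical and do not correspond to the actual form or script internals.

```python
# Hypothetical field names for the five minimum requirements listed above.
REQUIRED_FIELDS = (
    "reference",        # 1) e.g. a WBPaper ID
    "curator",          # 2) curator name
    "dsrna_sequence",   # 3) sequence of the dsRNA used
    "delivery_method",  # 4) e.g. Injection
    "phenotype",        # 5) observed or not_observed
)


def missing_requirements(record):
    """Return the names of required fields that are absent or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example record lacking a delivery method.
example = {
    "reference": "WBPaper00038308",
    "curator": "A. Curator",
    "dsrna_sequence": "atcgatcg",
    "phenotype": "not_observed",
}
print(missing_requirements(example))
```

An empty list means the record meets the minimum; any names returned are the fields to fill in before generating the *.ace file.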


RNAi Paper Flagging and Processing of SVM "Low" Results and Tracking

We have been using SVM to flag papers that may contain RNAi data. These papers are given a probability score (High, Medium or Low) based on the SVM training set to reflect the possibility that they contain RNAi data to be curated. The High and Medium scoring papers are automatically added to the list of RNAi papers that are to be curated (on the Paper Editor) and the Lows are stored separately. The Low scoring papers must be manually checked by a curator and if the paper has "curatable" RNAi data, the curator must manually add that paper to the list. The Low scoring paper list is being checked approximately every 3 months (it is added to automatically every Monday at 2am based on SVM results). The following is the SOP for checking these papers and adding them to the list of RNAi papers to curate.

SVM Low's - SOP for checking papers

The file containing the papers with a low priority score (named "low") is stored on tazendra in the directory /home/postgres/work/pgpopulation/svm/gary_rnai. New papers are appended to this list. The curator checking the recently added papers must go back in the list to the most recent paper that has a "commented out" mark "//". This mark means that the list has been checked up to that point; all papers after the mark need to be manually checked. When the curator is finished, they should "comment out" the last paper on the list so the next curator knows where to begin checking next time.
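
Finding where to resume can be sketched in Python as below. This assumes, purely for illustration, that the "low" file holds one paper per line and that a checked-up-to marker is a line beginning with "//"; the real file layout may differ, and the paper IDs shown are made up.

```python
def papers_to_check(lines):
    """Return the papers listed after the most recent '//' mark."""
    last_mark = -1
    for index, line in enumerate(lines):
        if line.strip().startswith("//"):
            last_mark = index           # remember the latest commented-out paper
    # Everything after the mark is unchecked; blank lines are ignored.
    return [line.strip() for line in lines[last_mark + 1:] if line.strip()]


# Made-up list: the second paper carries the "checked up to here" mark.
sample = [
    "WBPaper00000001",
    "// WBPaper00000002",
    "WBPaper00000003",
    "",
    "WBPaper00000004",
]
print(papers_to_check(sample))
```

If no mark is present, the whole list is returned, which matches the case where nothing has been checked yet.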

SVM Low's - SOP for adding papers to curation pipeline

After checking the "Low" papers, the ones that do contain data must be added to the curation list. To do so, go to curator_first_pass.cgi on the web, which is listed under "curation Forms" on the site map for tazendra. Enter the numeric part of the WBPaper ID, for example just "00038308" without the WBPaper prefix, and click on the query button. Under "rnai" in the Gene Function section, add the comment "SVM -LOW" in the curator box. When you hit "Flag" at the bottom of the page, the paper will be added to the RNAi curation list and noted as an SVM paper with a low probability score.

SVM Low's - SOP for adding papers to SVM Tracking

For tracking of SVM results (False Positive, False Negative, etc.) a tool was created: http://XXXXXXXXX.caltech.edu/~postgres/cgi-bin/svm_results.cgi This web-based tool automatically tracks papers that are positive based on our RNAi curation pipeline/checkout form and the OA. That is, if a paper is on the checkout form and is curated, the data will be in the OA and the paper will be marked as a True Positive. If a paper on the checkout list is a False Positive, the curator must note this on the paper editor (Flag False Positive) and the results will be forwarded to the SVM tracking form. The paper editor is at http://tazendra.caltech.edu/~postgres/cgi-bin/paper_editor.cgi The SVM lows that do contain RNAi data are added to the checkout form by a curator (see above) and are dealt with like any other paper on the list. The SVM lows that do not contain RNAi data must be entered manually into the SVM tracking form as False Positives. To do this, go to http://XXXXXXXXX.caltech.edu/~postgres/cgi-bin/svm_results.cgi Under "Enter Curator Results": Select data type / select curator negative / add the list of negatives to / select comment "SVM positive - Curator negative"


RNAi OA Postgres Tables (rna_*)

CG updated 9-1-2015

postgres tables
rna_anatomy rna_goprocess rna_person_hst
rna_anatomyquality rna_goprocessquality rna_phenotype
rna_child_of rna_heatsens rna_phenotype_hst
rna_child_of_hst rna_heatsens_hst rna_phenotypenot
rna_coldsens rna_historyname rna_phenotypenot_hst
rna_coldsens_hst rna_historyname_hst rna_phenremark
rna_curator rna_laboratory rna_phenremark_hst
rna_curator_hst rna_laboratory_hst rna_quantdesc
rna_database rna_lifestage rna_quantdesc_hst
rna_database_hst rna_lifestage_hst rna_quantfromto
rna_date rna_lifestagequality rna_quantfromto_hst
rna_date_hst rna_molaffected rna_remark
rna_deliverymethod rna_molaffectedquality rna_remark_hst
rna_deliverymethod_hst rna_molecule rna_sequence
rna_dnatext rna_molecule_hst rna_sequence_hst
rna_dnatext_hst rna_movie rna_species
rna_exprprofile rna_movie_hst rna_species_hst
rna_exprprofile_hst rna_name rna_strain
rna_flaggenereg rna_name_hst rna_strain_hst
rna_flaggenereg_hst rna_nodump rna_suggested
rna_flaggeneticintxn rna_nodump_hst rna_suggested_definition
rna_flaggeneticintxn_hst rna_paper rna_suggested_definition_hst
rna_fromgenereg rna_paper_hst rna_suggested_hst
rna_fromgenereg_hst rna_pcrproduct rna_temperature
rna_genotype rna_pcrproduct_hst rna_temperature_hst
rna_genotype_hst rna_penetrance rna_treatment
rna_gocomponent rna_penetrance_hst rna_treatment_hst
rna_gocomponentquality rna_penfromto
rna_gofunction rna_penfromto_hst
rna_gofunctionquality rna_person

RNAi OA

CG updated 9-1-2015

NOTE: RNAi objects may have multiple lines in the OA due to the nested phenotype data (Penetrance info, etc.)

Dependency Notice: The Community Phenotype Form and the RNAi OA each have separate code to look up the most recent RNAi ID and generate new RNAi IDs based on that. If the code is changed for one, we should also check that the code is changed for the other.


TAB 1

[Screenshot: RNAi OA TAB 1, 9-1-2015]

  • pgid - the postgres ID - NOT DUMPED
  • Name - rna_name - RNAi : - WBRNAiID #Note: This is automatically assigned
  • Paper - rna_paper - Reference - ?Paper (Ontology)
  • Curator - rna_curator - NOT DUMPED - Curator (Dropdown)
  • PCR Product - rna_pcrproduct - PCR_product - ?PCR_product (Multi-ontology) #Note: EBI/Hinxton will map these
  • DNA Text - rna_dnatext - DNA_text - Big text #Note: multiple Sequences should be separated by pipe "|" (caution: pressing <ENTER> may cause problems)
  • Strain - rna_strain - Strain - ?Strain (Ontology)
  • Genotype - rna_genotype - Genotype - Big text
  • Treatment - rna_treatment - Treatment - Big text
  • Temperature - rna_temperature - Temperature - Text (Integer)
  • Delivery Method - rna_deliverymethod - Delivered_by - (Multi-dropdown) four choices: "Bacterial_feeding", "Injection", "Soaking", "Transgene_expression"
  • Species - rna_species - Species - ?Species (Dropdown)
  • Remark - rna_remark - Remark - Big text
  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)


TAB 2

[Screenshot: RNAi OA TAB 2, 9-1-2015]

  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  • Phenotype Suggestion - rna_suggested - NOT DUMPED - field to suggest a new phenotype term. The suggested term will replace whatever term(s) are currently in the Phenotype field once it is approved
  • Suggested Definition - rna_suggested_definition - NOT DUMPED - Big text field with definition of new suggested phenotype term
  • Child Of - rna_child_of - NOT DUMPED - list of parent phenotype term(s) for suggested phenotype
  • Affected By Molecule - rna_molecule - (in-line with Phenotype) Molecule - ?Molecule (Multi-ontology)
  • Penetrance From To - rna_penfromto - (in-line with Phenotype) Range - Text (Integer-space-Integer)
  • Penetrance - rna_penetrance - (in-line with Phenotype) Penetrance - dropdown like in phenotype OA
  • Quantity From To - rna_quantfromto - (in-line with Phenotype) Quantity - Text (Integer-space-Integer)
  • Quantity Description - rna_quantdesc - (in-line with Phenotype) Quantity_description - Big text
  • Heat Sensitive - rna_heatsens - (in-line with Phenotype) Temperature_sensitive - Toggle
  • Cold Sensitive - rna_coldsens - (in-line with Phenotype) Temperature_sensitive - Toggle
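
The "Integer-space-Integer" format used by the Penetrance From To and Quantity From To fields above can be read with a small helper such as this Python sketch. The function name is hypothetical and this is not the OA's own validation code.

```python
def parse_from_to(text):
    """Parse an 'Integer-space-Integer' value into a (from, to) tuple."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError(f"expected two integers separated by a space, got {text!r}")
    # int() raises ValueError itself if either part is not an integer.
    return int(parts[0]), int(parts[1])


print(parse_from_to("10 25"))
```

The two values correspond to the paired Int entries of the Range and Quantity tags in the #Phenotype_info model.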


TAB 3

[Screenshot: RNAi OA TAB 3, 9-1-2015]

  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  • Anatomy - rna_anatomy (Multi-ontology)
  • Anatomy Quality - rna_anatomyquality (Multi-ontology)
  • Life Stage - rna_lifestage - Life_stage - ?Life_stage (Multi-ontology)
  • Life Stage Quality - rna_lifestagequality (Multi-ontology)
  • Molecule Affected - rna_molaffected (Multi-ontology)
  • Mol Aff Quality - rna_molaffectedquality (Multi-ontology)
  • GO Process - rna_goprocess (Multi-ontology)
  • GO P Quality - rna_goprocessquality (Multi-ontology)
  • GO Function - rna_gofunction (Multi-ontology)
  • GO F Quality - rna_gofunctionquality (Multi-ontology)
  • GO Component - rna_gocomponent (Multi-ontology)
  • GO C Quality - rna_gocomponentquality (Multi-ontology)

TAB 4

[Screenshot: RNAi OA TAB 4, 9-1-2015]

  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  • NO DUMP - rna_nodump - NOT DUMPED - toggle - Prevents data from that row from being dumped. This will also indicate to the Curation Status Form (CSF) that the paper has not been curated and will come up as "oa_blank" in the CSF
  • From Genereg - rna_fromgenereg - NOT DUMPED - Toggle
  • Flag Gene Reg - rna_flaggenereg - NOT DUMPED - Toggle
  • Flag Genetic Intxn - rna_flaggeneticintxn - NOT DUMPED - Toggle
  • Person Evidence - rna_person - Evidence Person_evidence - multiontology - for ?RNAi object's #Evidence
  • History Name - rna_historyname - History_name - Text #Note: this field is to accommodate older RNAi objects
  • Movie - rna_movie - Movie - bigtext #Note: this field is to accommodate older RNAi objects; Note: Separate multiple movie entries with bars (|)
  • Database - rna_database - Database - Text #Note: this field is to accommodate older RNAi objects; Enter data in this format, split multiple database lines with pipes if there are any :
    • Phenobank2 Gene&RNAID GeneID=507328 | Phenobank3 Gene&RNAID GeneID=123456
  • Expression Profile - rna_exprprofile - Expr_profile - Text #Note: this field is to accommodate older RNAi objects
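
The Database entry format listed above (three space-separated values per entry, with multiple entries split by pipes) could be parsed along these lines. This Python sketch is illustrative only; the function name is hypothetical and the real dumper is a Perl module whose parsing may differ.

```python
def parse_database_field(field):
    """Split an OA Database value into (database, database_field, accession) tuples."""
    entries = []
    for chunk in field.split("|"):      # pipes separate database lines
        parts = chunk.split()           # values never contain spaces themselves
        if not parts:
            continue                    # ignore empty segments
        if len(parts) != 3:
            raise ValueError(f"expected 3 values, got {len(parts)}: {chunk.strip()!r}")
        entries.append(tuple(parts))
    return entries


# The example entry format given in the field description above.
value = "Phenobank2 Gene&RNAID GeneID=507328 | Phenobank3 Gene&RNAID GeneID=123456"
for database, db_field, accession in parse_database_field(value):
    print(database, db_field, accession)
```

Each tuple lines up with the ?Database, ?Database_field and ?Accession_number columns of the DB_info tag in the ?RNAi model.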


NOTES ON SUGGESTING NEW PHENOTYPE TERMS THROUGH THE RNAI OA

  • When suggesting a new phenotype term in TAB 2 of the RNAi OA, curators should make sure to create a PGID/Row with a single placeholder phenotype that is intended to be replaced by the new phenotype term once the new term has been approved
  • If the data is dumped from the OA for upload before the new term has been approved, the object will dump with the placeholder phenotype term for that upload
  • This behavior is similar to that of the Phenotype OA
  • The data entered into the three fields "Phenotype Suggestion", "Suggested Definition", and/or "Child Of" will immediately be populated into the corresponding fields in the New Objects CGI
  • Cron job:
0 3 * * sun /home/acedb/gary/phn_suggested/phn_suggested_oa.pl

Runs every Sunday at 3 am, checking for entries in rna_suggested (and app_suggested) within the last 7 days and sends Gary S. an e-mail if there are any new entries. If there aren't any, there is no email.

NOT USING

  • Evidence - postgres table and field not created, don't know what to parse into this field - Text #Note: this field is to accommodate older RNAi objects So there are only 8 RNAi objects that use the Evidence field, and they do so with 'Person_evidence'. In the OA, I think we should include: Person_evidence, Author_evidence, Curator_confirmed, Laboratory_evidence -- C We're only keeping Person_evidence for the #Evidence hash associated with the whole ?RNAi object -- C+J
  • Penetrance Incomplete - rna_penincomplete - Toggle I think we talked about this, but in case we didn't phenotype OA uses a dropdown for the 4 penetrance types instead of separate toggle fields -- J OK, this sounds like a good idea. Let's just make it a dropdown list here as well -- C
  • Penetrance Low - rna_penlow - Toggle
  • Penetrance High - rna_penhigh - Toggle
  • Penetrance Complete - rna_pencomplete - Toggle
  • Method - rna_method - Method - Text

Notes

Fields for the tags "Predicted_gene", "Gene", "Transcript", "Pseudogene", "Homol_homol" and "Uniquely_mapped" will be omitted from the OA and populated in ACEDB after probe mapping to the genome at the EBI during the build process.

*.ACE Dumper Documentation

Each RNAi object will need to get dumped with "RNAi" in the "Method" tag

On tazendra, the Perl module is /home/postgres/work/citace_upload/rnai/get_rnai_ace.pm

Run it with the script /home/postgres/work/citace_upload/rnai/use_package.pl

Generates rnai.ace.<date> and err.out.<date> (to the day, so it will overwrite previous dumps from the same date)

please check dumper and .ace file dumped -- J

The dumper seems to be working OK since the dumped *.ACE file you generated is reading in OK, except for the "Evidence" line, as I comment about above in the "Tab3" section. The line currently dumps out as "Person_evidence ..." and it needs to dump out as "Evidence Person_evidence ...", otherwise the file throws errors when reading into ACEDB -- C

Fixed, I thought only the innermost tag mattered, but I guess because there's no data in the RNAi part, just the tag, it needs to be there before the #Evidence part -- J

I believe so, yes. -- C

use_package.pl line comments:

my $outfile = 'rnai.ace.' . $date;
my $errfile = 'err.out.' . $date;

  • These lines (above) specify the output file names; change as necessary.

my ($all_entry, $long_text, $err_text) = &getRnai('all');
# my ($all_entry, $long_text, $err_text) = &getRnai('WBRNAi00008227');

  • The lines above specify what to dump out of the OA. 'all' dumps everything. To dump a specific object, comment out the first line (add a '#'), uncomment the second line (remove the '#'), and replace 'WBRNAi00008227' with the name of the object to dump. Because of permissions, the script must be copied to a directory where you have permission to run it, modified there, and run from that directory.

The Molecule field now converts the pgid stored in postgres to the molecule name in mop_name for that pgid -- J


Error Checks During Dump Process

The following is a list of checks that the .ACE dumper script will perform on all RNAi objects being dumped out of the OA to make sure that the data is consistent and doesn't have any nonsensical information. Any errors that are found will be printed to an "err" file. Every line of the file will list an error message of the general format:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

where "Dump_status" could be "nodump" (for objects that have a 'fatal' error and will not be dumped) or "flagonly" (for objects that have a 'non-fatal' error and will be dumped).
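As an illustration of the error-file format, the following Python sketch splits one line into its four fields. It is not part of the dumper; the names `DumpError` and `parse_error_line` are hypothetical, and the sketch assumes the fields are separated by literal tab characters as described above.

```python
from typing import NamedTuple

class DumpError(NamedTuple):
    pgid: str
    dump_status: str   # "nodump" (fatal) or "flagonly" (non-fatal)
    curator: str
    explanation: str

def parse_error_line(line):
    """Split one tab-delimited error line into its four fields."""
    # Split on at most three tabs so the explanation is kept whole.
    fields = [f.strip() for f in line.rstrip("\n").split("\t", 3)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    if fields[1] not in ("nodump", "flagonly"):
        raise ValueError(f"unknown dump status: {fields[1]!r}")
    return DumpError(*fields)
```

A curator could use something like this to list only the "nodump" lines, which are the objects that need fixing before the next upload.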


Fatal Errors (RNAi objects will not get dumped)

1) If there is no identifying RNAi probe information (i.e. the sequence or clone used to knock down the target gene) in an RNAi object, the dumper script will generate an error message that is printed to the ERROR output file and the object will not get dumped. This is determined by checking that:

a) There is at least one "PCR Product" entry OR

b) There is at least one "DNA Text" entry OR

c) There is at least one "Sequence" entry

If none of these conditions hold true, then an error message will be printed in tab-delimited format like this:

12345   nodump    WBPerson1234   There is no sequence, neither pcrproduct nor dnatext nor sequence


2) If there is no reference (Paper or Person) then the object will not get dumped and an error message is printed:

12345   nodump    WBPerson1234   There is no reference, neither paper nor person


3) If there is no "Phenotype" specified by the curator, the object will not get dumped and an error message will print to the ERROR output file like this:

12345   nodump    WBPerson1234   There is no phenotype


4) If there is no RNAi ID, the object will not get dumped and an error message will print to the ERROR output file like this:

12345   nodump    WBPerson1234   There is no RNAi ID
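The four fatal checks above can be summarized in a short sketch. This is a Python illustration under stated assumptions, not the Perl code in get_rnai_ace.pm: the record is modeled as a dict, and the keys (pcrproduct, dnatext, sequence, paper, person, phenotype, name) are hypothetical stand-ins for the OA fields. The messages are the ones listed above.

```python
def fatal_errors(record):
    """Return the fatal error messages for one RNAi record (a dict)."""
    errors = []
    # 1) At least one probe: PCR Product OR DNA Text OR Sequence.
    if not (record.get("pcrproduct") or record.get("dnatext") or record.get("sequence")):
        errors.append("There is no sequence, neither pcrproduct nor dnatext nor sequence")
    # 2) At least one reference: Paper OR Person.
    if not (record.get("paper") or record.get("person")):
        errors.append("There is no reference, neither paper nor person")
    # 3) A phenotype must be specified.
    if not record.get("phenotype"):
        errors.append("There is no phenotype")
    # 4) The object must have an RNAi ID.
    if not record.get("name"):
        errors.append("There is no RNAi ID")
    return errors

def nodump_lines(pgid, curator, record):
    """Format each fatal error as a tab-delimited 'nodump' line."""
    return ["\t".join([pgid, "nodump", curator, msg]) for msg in fatal_errors(record)]
```

In this sketch an object would be dumped only when `fatal_errors` returns an empty list.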


Non-Fatal Errors (RNAi objects will get dumped, but error message will get printed)

1) If there is no dsRNA "Delivery Method" that has been specified by the curator, the RNAi object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:

12345   flagonly    WBPerson1234   There is no deliverymethod


2) If there is no "Species" specified by the curator, the RNAi object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:

12345   flagonly   WBPerson1234   There is no species


3) If no curator is listed for the RNAi object, the RNAi object will get dumped, but an error message will print to the ERROR output file like this:

12345   flagonly   no curator   has no curator
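The non-fatal checks follow the same pattern, except that the object is still dumped. A minimal Python sketch, again with a dict record and hypothetical keys (deliverymethod, species, curator). Writing "no curator" in the curator column whenever the curator is missing is an assumption generalized from the single example line above.

```python
def flagonly_lines(pgid, record):
    """Return tab-delimited 'flagonly' lines for one RNAi record (a dict)."""
    # Assumption: the curator column reads "no curator" when none is set.
    curator = record.get("curator") or "no curator"
    messages = []
    if not record.get("deliverymethod"):
        messages.append("There is no deliverymethod")
    if not record.get("species"):
        messages.append("There is no species")
    if not record.get("curator"):
        messages.append("has no curator")
    return ["\t".join([pgid, "flagonly", curator, msg]) for msg in messages]
```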