Difference between revisions of "RNAi"

From WormBaseWiki
Archived RNAi documentation may be found [[RNAi/archive | here]]

== RNAi Curation Mission Summary ==

== RNAi Data Model ==

This is the ?RNAi data model as of WormBase Release WS251:
  
 
  //////////////////////////////////////////////////////////////////
  [...]
                         Sequence ?Sequence XREF RNAi  //links to a real Sequence object used in the experiment
                                                       // such as yk clone; not UNIQUE anymore
                         Clone ?Clone XREF Used_in_RNAi      // Chris WS244
                         PCR_product ?PCR_product XREF RNAi // links to a PCR_product object used in
                                                            // the experiment; not UNIQUE anymore
  [...]
                         // which maps to a single place in the genome
         Experiment      Laboratory ?Laboratory
                         Date UNIQUE DateType
                         Strain UNIQUE ?Strain
  [...]
                         Pseudogene ?Pseudogene XREF RNAi_result #Evidence // [030801 krb]
         Supporting_data Movie ?Movie XREF RNAi    // Lincoln, krb [010807]
         DB_info         Database ?Database ?Database_field ?Accession_number
         Species         UNIQUE ?Species
         Interaction     ?Interaction
         Reference       UNIQUE ?Paper XREF RNAi //[070215 ar2] made reference unique so Paper sort of
                                                 // equates to a Study class for Will S
         Phenotype       ?Phenotype XREF RNAi #Phenotype_info
         Phenotype_not_observed ?Phenotype XREF Not_in_RNAi #Phenotype_info
         Expr_profile    ?Expr_profile XREF RNAi_result // connection added during build [030106 krb]
         Remark          ?Text #Evidence
         Method UNIQUE   ?Method
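
To illustrate how the tags in the model map onto an object in an *.ace file, here is a minimal Python sketch. It is not one of the WormBase curation scripts: the function name <code>render_rnai_ace</code> and all example values (the WBRNAi and WBPaper IDs in particular) are hypothetical placeholders, and quoting every value is a simplification. It only assumes the usual *.ace layout of a <code>Class : "name"</code> header line, one tag per line, and a blank line closing the object.

```python
def render_rnai_ace(name, tags):
    """Render one RNAi object as .ace text.

    `tags` is a list of (tag, value) pairs; a value of None emits a bare tag.
    Tag names are taken from the ?RNAi model above; values are illustrative.
    """
    lines = [f'RNAi : "{name}"']
    for tag, value in tags:
        if value is None:
            lines.append(tag)
        else:
            lines.append(f'{tag} "{value}"')
    # A blank line terminates the object in an .ace file.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Placeholder IDs, not real database objects.
    print(render_rnai_ace(
        "WBRNAi00000001",
        [
            ("Laboratory", "XX"),
            ("Species", "Caenorhabditis elegans"),
            ("Reference", "WBPaper00000001"),
            ("Phenotype", "WBPhenotype:0000050"),
            ("Remark", "Illustrative record only"),
        ],
    ))
```

Keeping the tags as an ordered list of pairs, rather than a dictionary, allows a tag such as <code>Phenotype</code> to appear more than once in the same object.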
  
Here is the WS251 ?Phenotype_info model:

<pre>
////////////////////////////////////////////
//
// ?Phenotype_info Class
//
////////////////////////////////////////////

#Phenotype_info Paper_evidence ?Paper
                Person_evidence ?Person
                Curator_confirmed ?Person
                Remark ?Text #Evidence // specific remarks about the phenotype
                Quantity_description ?Text #Evidence //Remark to describe what quantity describes, below
                Quantity UNIQUE Int UNIQUE Int #Evidence
                Not #Evidence //This is being phased out but is needed for the next phase [06/08/10].
                Penetrance Incomplete Text #Evidence
                           Low Text #Evidence
                           High Text #Evidence
                           Complete Text #Evidence
                           Range UNIQUE Int UNIQUE Int #Evidence // Range of penetrance
                Recessive #Evidence
                Semi_dominant #Evidence
                Dominant #Evidence
                Haplo_insufficient #Evidence
                Caused_by_gene ?Gene #Evidence
                Caused_by_other ?Text #Evidence
                Rescued_by_transgene ?Transgene
                Variation_effect Gain_of_function_undetermined_type #Evidence
                                 Antimorph_gain_of_function #Evidence
                                 Dominant_negative_gain_of_function #Evidence
                                 Hypermorph_gain_of_function #Evidence
                                 Neomorph_gain_of_function #Evidence
                                 Loss_of_function_undetermined_extent #Evidence
                                 Null #Evidence
                                 Predicted_null_via_sequence #Evidence
                                 Probable_null_via_phenotype #Evidence
                                 Hypomorph_reduction_of_function #Evidence
                                 Predicted_hypomorph_via_sequence #Evidence
                                 Probable_hypomorph_via_phenotype #Evidence
                                 Wild_allele #Evidence
                Affected_by Molecule ?Molecule #Evidence // ?Molecule model Karen Yook
                EQ_annotations Anatomy_term ?Anatomy_term ?PATO_term #Evidence
                               Life_stage ?Life_stage ?PATO_term #Evidence
                               GO_term ?GO_term ?PATO_term #Evidence
                               Molecule_affected  ?Molecule ?PATO_term #Evidence
                Temperature_sensitive Heat_sensitive Text #Evidence
                                      Cold_sensitive Text #Evidence
                Maternal UNIQUE Strictly_maternal #Evidence
                                With_maternal_effect #Evidence
                Paternal #Evidence
                Phenotype_assay Strain ?Strain #Evidence
                                Treatment ?Text #Evidence
                                Temperature ?Text #Evidence
                                Genotype ?Text #Evidence
                Ease_of_scoring UNIQUE ES0_Impossible_to_score #Evidence
                                       ES1_Very_hard_to_score #Evidence
                                       ES2_Difficult_to_score #Evidence
                                       ES3_Easy_to_score #Evidence
</pre>

Here is the WS251 #Evidence hash model:

<pre>
////////////////////////////////////////////////////////////////////////////////
//       Evidence hash
////////////////////////////////////////////////////////////////////////////////

#Evidence Paper_evidence ?Paper                            // Data from a Paper
          Published_as ?Text                               //  .. track other names for the same data
          Person_evidence ?Person                          // Data from a Person
          Author_evidence ?Author UNIQUE Text              // Data from an Author
          Accession_evidence ?Database ?Accession_number   // Data from a database (NDB/UNIPROT etc)
          Protein_id_evidence ?Text                        // Reference a protein_ID
          GO_term_evidence ?GO_term                        // Reference a GO_term
          Expr_pattern_evidence ?Expr_pattern              // Reference a Expression pattern
          Microarray_results_evidence ?Microarray_results  // Reference a Microarray result
          RNAi_evidence ?RNAi                              // Reference a RNAi knockdown
          CGC_data_submission                              // bless the data as coming from CGC
          Curator_confirmed ?Person                        // bless the data manually
          Inferred_automatically Text                      // bless the data via a script
          Date_last_updated UNIQUE DateType                // Stores last update timestamp
          Feature_evidence ?Feature   // Reference a Feature - eg for creation of isoform based on TEC-RED SL2
          Laboratory_evidence ?Laboratory                  // Reference a Lab
          From_analysis ?Analysis   // Reference an analysis
          Variation_evidence ?Variation   // Explicitly record variation from which IMP manual GO annotations are made
          Mass_spec_evidence ?Mass_spec_peptide
          Sequence_evidence ?Sequence           // for sequence data that hasn't been submitted to a public resource
          Remark ?Text
</pre>

== RNAi annotations in CitaceMinus ==

These papers each have more than 2000 RNAi objects and, therefore, did not get parsed into postgres tables for RNAi (a total of 59,656 RNAi objects stored in CitaceMinus):
* WBPaper00004402 (2287 RNAi objects)
* WBPaper00004403 (2584 RNAi objects)
* WBPaper00004651 (2479 RNAi objects)
* WBPaper00005654 (14253 RNAi objects)
* WBPaper00006395 (3230 RNAi objects)
* WBPaper00024497 (10951 RNAi objects)
* WBPaper00025054 (20709 RNAi objects)
* WBPaper00029258 (3163 RNAi objects)

== RNAi Paper Flagging and Processing of SVM "Low" Results and Tracking ==

We have been using SVM to flag papers that may contain RNAi data. These papers are given a probability score (High, Medium or Low) based on the SVM training set to reflect the possibility that they contain RNAi data to be curated. The High and Medium scoring papers are automatically added to the list of RNAi papers that are to be curated (on the Paper Editor) and the Lows are stored separately. The Low scoring papers must be manually checked by a curator and, if a paper has "curatable" RNAi data, the curator must manually add that paper to the list. The Low scoring paper list is checked approximately every 3 months (it is added to automatically every Monday at 2am based on SVM results). The following is the SOP for checking these papers and adding them to the list of RNAi papers to curate.

=== SVM Low's - SOP for checking papers ===

The file containing the papers with a low priority score (named "low") is stored on tazendra in the directory:

/home/postgres/work/pgpopulation/svm/gary_rnai

New papers are concatenated to this list. The curator checking the recently added papers must go back in the list to the most recent paper that has a "commented out" mark "//". This mark means that the list has been checked up to this point. All papers after this mark need to be manually checked. When the curator is finished, they should "comment out" the last paper on the list so the next curator knows where to begin checking next time.

=== SVM Low's - SOP for adding papers to curation pipeline ===

After checking the "Low" papers, the ones that do contain data must be added to the curation list. To do so, go to the curator_first_pass.cgi on the web, which is listed under "curation Forms" on the site map for tazendra. Enter the numeric part of the WBPaper ID, for example just "00038308" without the WBPaper prefix, and click on the query button. Under "rnai" in the Gene Function section, add the following comment in the curator box: "SVM -LOW". When you hit "Flag" at the bottom of the page, this paper will be added to the RNAi curation list and noted as an SVM paper with a low probability score.

== RNAi Curation Standard Operating Procedure (SOP) ==

In order to ensure consistency of RNAi curation across curators and WormBase releases, a set of standard procedures for RNAi curation is outlined below. The descriptions of these procedures include use of the two main methods for generating *.ace files for release submission: (1) the web-based CGI form for one-at-a-time RNAi object generation and (2) the batch form submission method.

=== Minimum Requirements for an RNAi Object ===

Regardless of which curation method you (the curator) choose, there is a set of minimum requirements for generating a complete RNAi experiment object:

1) A reference (e.g. WBPaperID)

2) A curator name

3) The sequence of the dsRNA used in the experiment to knock down gene expression

4) A dsRNA delivery method (e.g. Injection) [Note: some authors omit this information. This can be curated arbitrarily and removed from the *.ace file once it has been generated]

5) A phenotype (observed or not_observed)

If any one of these five basic pieces of information is missing, the scripts that generate the *.ace file will fail, returning an error.

=== Two Methods of Curation: Web CGI & Batch Form ===

There are two basic methods of RNAi curation: (1) a web-based CGI form and (2) a batch form involving a Perl script (living on the elbrus@caltech machine) to convert a spreadsheet (tab-delimited file) into an *.ace file.

==== (1) Web-based CGI ====

The web-based RNAi curation form can be found at:

http://elbrus.caltech.edu/cgi-bin/igor/rnaitools/rnai_curation

===== Web Form Page One =====

The first page of the web form requests information about the publication and looks like this:

----

[[File:RNAi_CGI_1.png]]

----

This page requests information about the paper (e.g. WBPaperID), laboratory two-letter code (usually senior/corresponding author; for a list of all labs' two-letter codes look here: http://www.wormbase.org/db/misc/laboratory?name=*;class=laboratory), date of publication (accepted date), and a curator name. The laboratory and publication acceptance date are optional.

Once you've input the relevant information, click on the "Submit" button. Note that this may take a few moments to load. Once loaded you will be at the second page of the web form.

===== Web Form Page Two =====

The second page of the web form looks like this:

----

[[File:RNAi_CGI_2.png]]

----
 
 
This second page of the web form requests information about the RNAi probe(s) (i.e. the dsRNA sequence used to knock down expression of the target gene). This information may be submitted in the form of a PCR product object (e.g. sjj_ZK617.1 or mv_CAA33463), PCR primers (it's usually good practice to check the primer sequences using e-PCR to confirm that they work properly), genomic coordinates, cDNA/EST/OST clone (note that many of these clones may be problematic; in such a case it is best to determine the sequence to the best of your ability and submit that instead), or actual sequence (e.g. "ACGT") of the probe.
 
 
 
An important point here is that RNAi target genes for an RNAi experiment are not submitted directly using the name of the target gene as indicated in the publication. The reason for this is that gene names can be continuously remapped to different regions of the genome over time, making persistence of the data impossible if we were to use only gene names to identify the target gene. By providing the actual sequence or clone used in the experiment, we can ensure that the appropriate gene is identified as the RNAi target gene regardless of which release of the database you are currently working in.
 
 
 
You may wish to submit more than one RNAi probe (e.g. when two or more genes are targeted by RNAi simultaneously). If this is the case, be sure to select "Yes" after the "Add another probe?" question at the bottom of the page, and then click on the "Submit" button.
 
 
 
Once all of your probes have been submitted, make sure that "No" is selected after the "Add another probe?" question, and click on the "Submit" button.
 
 
 
If all of the probes check out OK, you will be brought to the third page of the web form.
 
 
 
===== Web Form Page Three =====
 
 
 
The third page of the web form looks like this:
 
 
 
----
 
 
 
[[File:RNAi_CGI_3.1.png]]
 
 
 
[[File:RNAi_CGI_3.2.png]]
 
 
 
----
 
 
 
This page requests all of the experimental details about the RNAi experiment in question. This information includes Strain/Genotype, treatment conditions, life stage, dsRNA delivery method (e.g. bacterial feeding), Phenotype observed (or not_observed), Gene Regulation objects, Genetic Interaction objects, species, and general remarks about the experiment.
 
 
 
'''Strain/Genotype'''
 
 
 
Whereas the "Genotype" information for a strain used in an RNAi experiment may be written in as free-text, the "Strain" field will only accept strains that are officially recognized by WormBase. Be sure to check the database for the existence of your strain name before submitting (otherwise you may receive an error upon submission). If your strain name does not currently exist in the database, you may write in the genotype in the "Genotype" field, or request the addition of the strain into the database.
 
 
 
'''Treatment/Temperature'''
 
 
 
The "Treatment" field is a free-text field in which the curator may specify any particular experimental conditions that apply to this RNAi experiment. This may include details about growth conditions, dsRNA delivery, or specimen manipulation. The "Temperature" field must be filled in with an integer value indicating the temperature in degrees Celsius (e.g. "25" for 25 degrees Celsius).
 
 
 
'''Life Stage'''
 
 
 
The "Life Stage" field must be filled in with an official life stage name, as stored in ACeDB/WormBase. For an official list of Life Stage names in WormBase, see:
 
 
 
[[UserGuide:Life_Stage_System_in_WormBase|Life_Stage_System_in_WormBase]]
 
 
 
or
 
 
 
[[UserGuide:Definitions_for_Life_Stages|Definitions_for_Life_Stages]]
 
 
 
'''Delivered by'''
 
 
 
The "Delivered by" field provides a drop-down menu of four dsRNA delivery methods: Injection, Bacterial feeding, Soaking, and Transgene expression.
 
 
 
'''WormBase phenotype ID'''
 
 
 
This field needs to have at least one phenotype in the form of a WormBase phenotype ID, for example: WBPhenotype:0000050 (for embryonic lethality). Without at least one phenotype, the *.ace-generating script will throw an error.
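
The field rules described so far on this page (an integer Temperature, one of the four "Delivered by" methods, and at least one WormBase phenotype ID) can be pre-checked before submission. The following Python sketch is only an illustration and is not part of the CGI form: the function name is hypothetical, and the seven-digit <code>WBPhenotype:</code> pattern is an assumption inferred from the example ID above.

```python
import re

# The four delivery methods offered by the "Delivered by" drop-down menu.
DELIVERY_METHODS = {"Injection", "Bacterial feeding", "Soaking", "Transgene expression"}

# Assumed ID shape, inferred from the example WBPhenotype:0000050.
PHENOTYPE_ID = re.compile(r"WBPhenotype:\d{7}")


def check_experiment_fields(temperature, delivered_by, phenotype_ids):
    """Return a list of problems found in the form values (empty if none)."""
    problems = []
    # Temperature is optional here, but must be a whole number when given.
    if temperature and not re.fullmatch(r"\d+", temperature.strip()):
        problems.append(f"Temperature must be an integer, got {temperature!r}")
    if delivered_by not in DELIVERY_METHODS:
        problems.append(f"Unknown delivery method: {delivered_by!r}")
    if not phenotype_ids:
        problems.append("At least one phenotype ID is required")
    for pid in phenotype_ids:
        if not PHENOTYPE_ID.fullmatch(pid):
            problems.append(f"Malformed phenotype ID: {pid!r}")
    return problems


if __name__ == "__main__":
    print(check_experiment_fields("25", "Bacterial feeding", ["WBPhenotype:0000050"]))
    print(check_experiment_fields("25C", "Feeding", []))
```

Returning a list of problems, rather than raising on the first one, lets a curator fix every field in one pass.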
 
 
 
If you need to add other information pertaining to the phenotype, such as penetrance, quantification, or a remark, click on the "Phenotype Info" button to the right. This will open up a separate window that looks like this:
 
 
 
[[File:RNAI_Phenotype_Info.png]]
 
 
 
This form provides fields for the curator to enter information regarding the penetrance, temperature sensitivity, quantification, molecule interactions, phenotypes-not-observed, and a general, free-text remark field.
 
 
 
The '''Penetrance''' section records the range of penetrance of the phenotype in question (indicated to the left). The curator can provide a lower limit ("from") and/or an upper limit ("to"); if there is only one penetrance value indicated in the publication, just fill it in the "from" field. This value is an integer value representing percent penetrance. Below the "Range" fields, the curator can toggle the check boxes next to "Incomplete", "Low", "High", or "Complete" indicating the general quantity of penetrance. Each field also provides a free-text "Description" field to provide a description of the penetrance.
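
As a small illustration of the "from"/"to" convention, the Python sketch below normalizes a penetrance entry into a pair of integers. It is not taken from the form's code: the function name is hypothetical, and repeating a single "from" value as the upper limit is an assumption made here so that the pair fits the two-integer <code>Range</code> tag of the ?Phenotype_info model.

```python
def penetrance_range(from_value, to_value=None):
    """Return (low, high) percent penetrance as integers.

    If only one value is given, it is used for both limits
    (an assumption of this sketch, not documented form behaviour).
    """
    low = int(from_value)
    high = low if to_value is None else int(to_value)
    if not (0 <= low <= high <= 100):
        raise ValueError(f"Invalid penetrance range: {low}-{high}")
    return low, high


if __name__ == "__main__":
    print(penetrance_range(80))       # single value reported in the paper
    print(penetrance_range(20, 45))   # lower and upper limits
```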
 
 
 
The '''Temperature sensitive''' fields allow the curator to indicate whether or not the phenotype is dependent on temperature, and provide a free-text "Description" field for details.
 
 
 
The '''Quantity''' fields record the lower and/or upper limits of the phenotype quantification as integer values on a scale that can be described in the "Description" free-text field. An example of a phenotype quantity might be "average brood size" or "body bends per minute".
 
 
 
The '''Affected by''' field captures molecules/compounds/drugs that affect the phenotype as indicated by this RNAi experiment. The molecule must be entered as the official WormBase molecule ID, usually the MeSH ID (e.g. D003609).
 
 
 
The '''NOT''' field provides a toggle check box for indicating if the phenotype in question was NOT OBSERVED in the experiment. This captures negative results of RNAi experiments.
 
 
 
Finally, the '''Details''' section provides a free-text field for descriptions of the phenotype beyond what could be captured in the other fields.
 
 
 
 
 
 
 
'''Additional phenotype information...'''
 
 
 
OBSOLETE;  If you cannot find an appropriate phenotype for your RNAi experiment in the Worm Phenotype Ontology (WPO; worm_phenotype.obo can be downloaded from http://www.obofoundry.org/), please visit (http://tazendra.caltech.edu/~postgres/cgi-bin/new_objects.cgi) and enter the phenotype term you want to request by clicking the “Update Phenotype!” button and filling out the fields under “Suggest data through this CGI”.
 
 
 
'''New Gene Regulation'''
 
 
 
If the RNAi experimental result suggests that the RNAi target gene is somehow functionally involved in the regulation of a gene (at the transcriptional, translational, or post-translational level) and no Gene Regulation object exists for this relationship in the database, check the box here to request that a Gene Regulation object ID be assigned to this RNAi experiment. If the curator checks the box, a separate browser window will open with a simple text field to enter in the details of the Gene Regulation event:
 
 
 
[[File:RNAi_Gene_Regulation_Info.png]]
 
 
 
As the note suggests, don't forget to send this info to Xiaodong so she can generate a new Gene Regulation Object.
 
 
 
'''Gene Regulation Object'''
 
 
 
If a Gene Regulation Object ID has already been assigned to this experiment from a different curation pipeline, enter the Gene Regulation Object ID here (e.g. WBPaper00006370_lin-3, cgc6303_lag-2).
 
 
 
'''New Interaction'''
 
 
 
If the RNAi experiment indicates a genetic (or any other type) interaction and an Interaction Object ID has not yet been assigned to this interaction, select the number of genes involved in the interaction from the provided pull down menu. This will automatically open another browser window which will provide a form for entering details specific to that particular interaction (see image below):
 
 
 
[[File:RNAi_Interaction_Info.png]]
 
 
 
This form provides the curator with fields to enter information for each interacting gene as well as general information about the interaction. The gene-specific information includes the gene's WBGeneID (e.g. WBGene00002299), "Variation" information (e.g. e1370, mgDf47) if this gene was perturbed by an allele or other variation, "Transgene" for a transgene harboring the gene (or variation of the gene) that contributes to the interaction, "This RNAi" toggle box to indicate that the perturbation of this gene was via RNAi (as opposed to an allele), and "Direction" drop-down menu to indicate if the gene is the "Effector" or "Effected" gene (for directional interactions) or "Non-directional" (for interactions in which directionality is irrelevant).
 
 
 
Below the gene-specific fields is a drop-down menu for indicating the "Interaction Type" (e.g. Genetic, Suppression, Enhancement), an "Interaction-relevant Phenotype" field to indicate the phenotype on which the interaction is based, and a free-text "Remark" field to describe any details of the interaction not sufficiently explained by the other fields.
 
 
 
'''Interaction Object'''
 
 
 
If an Interaction Object ID has already been assigned to this experiment from a different curation pipeline, enter the Interaction Object ID here (e.g. WBInteraction0001295).
 
 
 
'''Picture'''
 
 
 
OBSOLETE; The current RNAi data model no longer supports Picture objects
 
 
 
'''Species'''
 
 
 
Enter the species in which the RNAi experiment is taking place. ''Caenorhabditis elegans'' is set by default.
 
 
 
'''Remark'''
 
 
 
The "Remark" field is a free-text field to enter in any remaining relevant information that is pertinent to the RNAi experiment, but has not yet been captured by any other field of the curation form. This is often where standard remarks regarding the ambiguity of the dsRNA identity (see below) are entered. Information relevant to a genetic interaction, gene regulation event, or complex phenotype may also be put here.
 
 
 
If you are done curating this particular RNAi experiment and have more from the current paper, select "Yes" next to the "Add another experiment from this paper?" question at the bottom of the page and then click "Submit". If you are done curating RNAi experiments from this paper (or are done for the current session) and would like to generate an *.ace file, select "No" next to "Add another experiment from this paper?". Finally, indicate whether or not you would like to receive an e-mail of the *.ace file (as text in the body of the e-mail) next to "Would you like to receive a copy of the ace file via e-mail?". Once you are set, click the "Submit" button to generate the ace file (to be displayed in the web browser). To generate the *.ace file, you may simply copy and paste the text from the screen into a plain text file and save it with the ".ace" extension.
 
 
 
===== Post-processing your *.ace file =====
 
 
 
The .ace files from the CGI form need to be concatenated and processed using the script rnai_ace_modify.pl. Talk to a current RNAi curator for a copy of this script. The script converts the “NOT” tag to a “Phenotype_not_observed” tag to comply with the current model, and it splits the molecule data and interaction data into separate files. The .ace file from the CGI form is therefore split into 3 parts:
 
 
 
(1) FILENAME.ace.new -- for RNAi data
 
 
 
(2) FILENAME.ace.mol -- for molecule data
 
 
 
(3) FILENAME.ace.interaction -- for interaction data
 
 
 
To run this script, type ./rnai_ace_modify.pl
 
 
 
It will prompt you for the FILENAME of the .ace file that you want to convert into these 3 files.
 
 
 
The following files get uploaded to CitaceMinus:
 
 
 
(1) FILENAME.ace.new -- for RNAi data
 
 
 
(2) FILENAME.ace.mol -- for molecule data
 
 
 
The interaction file “FILENAME.ace.interaction” needs to be parsed into the interaction OA (see the [[Gene_Interaction#upload_Gary_and_Chris_RNAi-based_interaction_objects_into_OA|interaction Wiki page]] for instructions).
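
To make the “NOT” conversion concrete, here is a minimal Python sketch of that one step. It is not rnai_ace_modify.pl and does not reproduce its behaviour; the line layout it expects (<code>Phenotype "ID" Not</code>) and the function name are assumptions made for illustration, and the splitting of molecule and interaction data is left out.

```python
import re

# Matches lines such as: Phenotype "WBPhenotype:0000050" Not
PHENO_LINE = re.compile(r'^Phenotype\s+"([^"]+)"\s*(.*)$')


def convert_not_tags(ace_text):
    """Rewrite phenotypes flagged with Not as Phenotype_not_observed.

    Objects are assumed to be separated by blank lines. Within an object,
    every line for a negated phenotype is moved to the new tag. Any evidence
    following the Not tag itself is dropped in this sketch.
    """
    converted = []
    for block in re.split(r"\n\s*\n", ace_text.strip()):
        lines = block.splitlines()
        # First pass: collect the phenotype IDs that carry a Not tag.
        negated = set()
        for line in lines:
            m = PHENO_LINE.match(line)
            if m and m.group(2).split()[:1] == ["Not"]:
                negated.add(m.group(1))
        # Second pass: rewrite the lines belonging to those phenotypes.
        out = []
        for line in lines:
            m = PHENO_LINE.match(line)
            if m and m.group(1) in negated:
                rest = m.group(2)
                if rest.split()[:1] == ["Not"]:
                    out.append(f'Phenotype_not_observed "{m.group(1)}"')
                else:
                    out.append(f'Phenotype_not_observed "{m.group(1)}" {rest}')
            else:
                out.append(line)
        converted.append("\n".join(out))
    return "\n\n".join(converted) + "\n"
```

Working object by object keeps a Not tag on one RNAi experiment from affecting the same phenotype in another experiment.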
 
 
 
==== (2) Batch Form ====
 
 
 
The second basic method for generating RNAi *.ace files involves filling out all relevant RNAi curation information on a spreadsheet with each row (more or less; explanation below) as an RNAi experiment and each column as an RNAi object field. An Excel-based template file may be found here:
 
 
 
[[File:RNAi_Curation_Template.xls]]
 
 
 
Much of what applies to the RNAi Curation CGI Form described above applies similarly (or exactly) to the Batch Form. There are a few spreadsheet-specific changes that we will describe here.
 
 
 
First, there are a number of columns that are highlighted in yellow and have the comment "remove column before submission" in parentheses. These are temporary aids to the curator that must be removed before submitting the spreadsheet to the script, as they will cause an error.
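
Removing the helper columns by hand is error-prone, so the step can be scripted once the spreadsheet has been exported as a tab-delimited file. The Python sketch below is an illustration only and is not the batch-form Perl script. It assumes that the first row is the header row and that each helper column carries the phrase "remove column before submission" in its header. The column names in the usage example are hypothetical.

```python
import csv
import io

MARKER = "remove column before submission"


def drop_helper_columns(tsv_text):
    """Return the tab-delimited text without the flagged helper columns."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return ""
    header = rows[0]
    # Keep only the columns whose header does not carry the marker phrase.
    keep = [i for i, name in enumerate(header) if MARKER not in name.lower()]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()


if __name__ == "__main__":
    example = (
        "WBPaper ID\tGene name (remove column before submission)\tCurator Name\n"
        "WBPaper00000001\tunc-22\tA. Curator\n"
    )
    print(drop_helper_columns(example))
```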
 
 
 
 
 
===== Filling out the Batch Form =====
 
 
 
Shown below are screenshots from a fictional RNAi experiment entry in the batch form.
 
 
 
'''Paper Identification Information'''
 
 
 
[[File:RNAi_Batch_Form_1.png|Curation Template]]
 
  
The first 6 columns (A-F) of the spreadsheet record WBRNAi ID, WBPaper ID (WBPaper000#####), Laboratory (2-letter code), Publication Acceptance Date (YYYY-MM-DD), Curator Name, and Curator E-Mail. The WBRNAi ID column is only used once the *.ace file has already been made and is removed before submission to the *.ace-file-generating script. Whereas the Laboratory code and Publication Acceptance Date are optional, the WBPaper ID and the Curator Name are required.

'''RNAi probe (dsRNA source) Information'''

[[File:RNAi_Batch_Form_2.png]]

=== SVM Low's - SOP for adding papers SVM Tracking ===

A tool was created for tracking SVM results (False Positive, False Negative, etc.):

http://XXXXXXXXX.caltech.edu/~postgres/cgi-bin/svm_results.cgi

This web-based tool automatically tracks papers that are positive based on our RNAi curation pipeline/checkout form and the OA. That is, if a paper is on the checkout form and is curated, the data will be in the OA and marked as a True Positive. If a paper on the checkout list is a False Positive, the curator must note this on the paper editor (Flag False Positive) and the results will be forwarded to the SVM tracking form.

http://tazendra.caltech.edu/~postgres/cgi-bin/paper_editor.cgi

The SVM lows that do contain RNAi data are added to the checkout form by a curator (see above) and will be dealt with as any other paper on the list. The SVM lows that do not actually contain RNAi data must be entered manually onto the SVM tracking form as False Positives. To do this, go to

http://XXXXXXXXX.caltech.edu/~postgres/cgi-bin/svm_results.cgi

Under "Enter Curator Results"

Select data type/ select curator negative / add the list of negatives to / select comment "SVM positive - Curator negative"

== RNAi OA Postgres Tables (rna_*) ==

CG updated 9-1-2015
  
{|border="1"
|'''postgres tables'''||
|-
|rna_anatomy||rna_goprocess||rna_person_hst
|-
|rna_anatomyquality||rna_goprocessquality||rna_phenotype
|-
|rna_child_of||rna_heatsens||rna_phenotype_hst
|-
|rna_child_of_hst||rna_heatsens_hst||rna_phenotypenot
|-
|rna_coldsens||rna_historyname||rna_phenotypenot_hst
|-
|rna_coldsens_hst||rna_historyname_hst||rna_phenremark
|-
|rna_curator||rna_laboratory||rna_phenremark_hst
|-
|rna_curator_hst||rna_laboratory_hst||rna_quantdesc
|-
|rna_database||rna_lifestage||rna_quantdesc_hst
|-
|rna_database_hst||rna_lifestage_hst||rna_quantfromto
|-
|rna_date||rna_lifestagequality||rna_quantfromto_hst
|-
|rna_date_hst||rna_molaffected||rna_remark
|-
|rna_deliverymethod||rna_molaffectedquality||rna_remark_hst
|-
|rna_deliverymethod_hst||rna_molecule||rna_sequence
|-
|rna_dnatext||rna_molecule_hst||rna_sequence_hst
|-
|rna_dnatext_hst||rna_movie||rna_species
|-
|rna_exprprofile||rna_movie_hst||rna_species_hst
|-
|rna_exprprofile_hst||rna_name||rna_strain
|-
|rna_flaggenereg||rna_name_hst||rna_strain_hst
|-
|rna_flaggenereg_hst||rna_nodump||rna_suggested
|-
|rna_flaggeneticintxn||rna_nodump_hst||rna_suggested_definition
|-
|rna_flaggeneticintxn_hst||rna_paper||rna_suggested_definition_hst
|-
|rna_fromgenereg||rna_paper_hst||rna_suggested_hst
|-
|rna_fromgenereg_hst||rna_pcrproduct||rna_temperature
|-
|rna_genotype||rna_pcrproduct_hst||rna_temperature_hst
|-
|rna_genotype_hst||rna_penetrance||rna_treatment
|-
|rna_gocomponent||rna_penetrance_hst||rna_treatment_hst
|-
|rna_gocomponentquality||rna_penfromto||
|-
|rna_gofunction||rna_penfromto_hst||
|-
|rna_gofunctionquality||rna_person||
|}
  
The next 8 columns (G-N) record RNAi probe information (i.e. information about the dsRNA identity/sequence). The first column (G) is an exception. The "Same As Above" column either takes no entry or "YES" (all CAPS important), available from an Excel pull down menu; this column indicates whether the row/line of the form is to be included as part of the same RNAi experiment as the line(s) before it (the closest preceding line with paper information provided). This is necessary, for example, when submitting more than one RNAi probe, as each RNAi probe must go on its own line. It is important that information that is unique to an RNAi experiment (e.g. Strain/Genotype, Treatment, Temperature, etc.; indicated by RED text) is not also added to this line as it will overwrite any information provided in lines above it (belonging to the same RNAi experiment).
+
== RNAi OA ==
  
The "PCR Product", "Primer1", "Primer2", "Genomic Coordinates", "Clone", and "Sequence" columns record the dsRNA source as in the CGI Web Form (see above). The "Target ID" column (yellow, hence to be removed before submission) is free-text to indicate which gene the RNAi probe is targeting. This column is handy for keeping track of which RNAi experiments have already been curated, as there is no other easy way to see the gene's name in the form.
+
CG updated 9-1-2015
  
 +
NOTE: RNAi objects may have multiple lines in the OA because of the nested phenotype data (penetrance info, etc.).
  
'''Experimental Information'''
+
'''Dependency Notice''': The Community Phenotype Form and the RNAi OA each have separate code to look up the most recent RNAi ID and generate new RNAi IDs based on that. If the code is changed for one, we should also check that the code is changed for the other.
  
[[File:RNAi_Batch_Form_3.png]]
 
  
 +
=== TAB 1 ===
  
The next 8 columns (O-V) record experimental details including "Strain", "Genotype", "Treatment", "Life Stage", "Temperature", "Delivered by", "Species", and "Remark". The "Life Stage" and "Delivered by" columns provide Excel pull down menus to select terms from a standard list (syntax is important). Note that these fields are the ones that are unique to an RNAi experiment; if two or more rows contain information for one of these columns, only the bottom-most row will be read. The formats for the remaining columns are as indicated in the CGI Web Form section above (i.e. "Genotype", "Treatment", and "Remark" are free-text, "Strain" must be a recognized WormBase strain name, "Temperature" must be an integer value for degrees Celsius, and "Species" must be a recognized Latin species name of a nematode species in WormBase).
+
[[File:RNAi_OA_TAB_1_9-1-2015.png|600px]]
  
 +
*pgid - the postgres ID - NOT DUMPED
 +
*Name - rna_name - RNAi : - WBRNAiID #Note: This is automatically assigned
 +
*Paper - rna_paper - Reference - ?Paper (Ontology)
 +
*Curator - rna_curator - NOT DUMPED - Curator (Dropdown)
 +
*PCR Product - rna_pcrproduct - PCR_product - ?PCR_product (Multi-ontology) #Note: EBI/Hinxton will map these
 +
*DNA Text - rna_dnatext - DNA_text - Big text #Note: multiple Sequences should be separated by pipe "|" (caution: pressing <ENTER> may cause problems)
 +
*Strain - rna_strain - Strain - ?Strain (Ontology)
 +
*Genotype - rna_genotype - Genotype - Big text
 +
*Treatment - rna_treatment - Treatment - Big text
 +
*Temperature - rna_temperature - Temperature - Text (Integer)
 +
*Delivery Method - rna_deliverymethod - Delivered_by - (Multi-dropdown) four choices: "Bacterial_feeding", "Injection", "Soaking", "Transgene_expression"
 +
*Species - rna_species - Species - ?Species (Dropdown)
 +
*Remark - rna_remark - Remark - Big text
 +
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 +
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  
  
'''Phenotype Information 1'''
+
=== TAB 2 ===
  
[[File:RNAi_Batch_Form_4.png]]
+
[[File:RNAi_OA_TAB_2_9-1-2015.png|600px]]
  
 +
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 +
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
 +
*Phenotype Suggestion - rna_suggested - NOT DUMPED - field for suggesting a new phenotype term. Once approved, the new term replaces whatever term(s) are currently in the Phenotype field
 +
*Suggested Definition - rna_suggested_definition - NOT DUMPED - Big text field with definition of new suggested phenotype term
 +
*Child Of - rna_child_of - NOT DUMPED - list of parent phenotype term(s) for suggested phenotype
 +
*Affected By Molecule - rna_molecule - (in-line with Phenotype) Molecule - ?Molecule (Multi-ontology)
 +
*Penetrance From To - rna_penfromto - (in-line with Phenotype) Range - Text (Integer-space-Integer)
 +
*Penetrance - rna_penetrance - (in-line with Phenotype) Penetrance - dropdown like in phenotype OA
 +
*Quantity From To - rna_quantfromto - (in-line with Phenotype) Quantity - Text (Integer-space-Integer)
 +
*Quantity Description - rna_quantdesc - (in-line with Phenotype) Quantity_description - Big text
 +
*Heat Sensitive - rna_heatsens - (in-line with Phenotype) Temperature_sensitive - Toggle
 +
*Cold Sensitive - rna_coldsens - (in-line with Phenotype) Temperature_sensitive - Toggle
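The "Integer-space-Integer" format used by "Penetrance From To" and "Quantity From To" can be checked before entry with a small helper. The following is only a sketch: the function name <code>parse_from_to</code>, the <code>percent</code> option, and the rule that the first value may not exceed the second are assumptions made for illustration and are not part of the OA code.

```python
import re
from typing import Tuple

# Two non-negative integers separated by whitespace, e.g. "20 80".
_FROM_TO = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")


def parse_from_to(value: str, percent: bool = False) -> Tuple[int, int]:
    """Parse an 'Integer-space-Integer' string into a (low, high) pair.

    percent=True additionally requires both values to lie in 0..100,
    which is an assumed rule for penetrance ranges.
    """
    match = _FROM_TO.match(value)
    if match is None:
        raise ValueError(f"expected two integers separated by a space, got {value!r}")
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        # Assumed sanity rule: the 'from' value should not exceed the 'to' value.
        raise ValueError(f"'from' value {low} is greater than 'to' value {high}")
    if percent and high > 100:
        raise ValueError(f"percentage out of range: {high}")
    return low, high
```

For example, <code>parse_from_to("20 80", percent=True)</code> yields the pair (20, 80), while a hyphenated entry such as "20-80" is rejected.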
  
'''Phenotype Information 2'''
 
  
[[File:RNAi_Batch_Form_5.png]]
+
=== TAB 3 ===
  
 +
[[File:RNAi_OA_TAB_3_9-1-2015.png|600px]]
  
The next 17 columns (W-AM) record phenotype information. The "Gene Regulation ID" records a Gene Regulation Object ID, if one has already been generated for this experiment. The "Phenotype ID" must be given as the WormBase Phenotype ID as recorded in the Worm Phenotype Ontology (WPO; worm_phenotype.obo is available at http://www.obofoundry.org/), for example, WBPhenotype:0000081 for "L1 arrest". The yellow "Phenotype Name" column records the human-readable name of the phenotype for curatorial convenience.
+
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 +
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
 +
*Anatomy - rna_anatomy (Multi-ontology)
 +
*Anatomy Quality - rna_anatomyquality (Multi-ontology)
 +
*Life Stage - rna_lifestage - Life_stage - ?Life_stage (Multi-ontology)
 +
*Life Stage Quality - rna_lifestagequality (Multi-ontology)
 +
*Molecule Affected - rna_molaffected (Multi-ontology)
 +
*Mol Aff Quality - rna_molaffectedquality (Multi-ontology)
 +
*GO Process - rna_goprocess (Multi-ontology)
 +
*GO P Quality - rna_goprocessquality (Multi-ontology)
 +
*GO Function - rna_gofunction (Multi-ontology)
 +
*GO F Quality - rna_gofunctionquality (Multi-ontology)
 +
*GO Component - rna_gocomponent (Multi-ontology)
 +
*GO C Quality - rna_gocomponentquality (Multi-ontology)
  
The following columns (Z-AE) record the penetrance of the phenotype: "Penetrance From" records a lower limit of penetrance (integer percentage), "Penetrance To" records an upper limit of penetrance (integer percentage), and "Penetrancs(sic) Incomplete", "Penetrance Low", "Penetrance High", and "Penetrance Complete" record ("YES" or empty; pull down menu) whether or not the penetrance quantification falls under any of these categories. "Heat Sensitive" and "Cold Sensitive" indicate ("YES" or empty; pull down menu) whether or not the phenotype is heat or cold sensitive, respectively.  "Quantity From" and "Quantity To" capture the lower and/or upper limits of the phenotype quantification (integer values), if one is provided. "Quantity Description" is a free-text field for describing the nature of the phenotype quantification. The "NOT" field would be filled with a "YES" if the phenotype was not observed (negative result). "Phenotype Remark" is a free-text field for describing any aspects of the phenotype not adequately captured by the other fields. The yellow "Evidence" field (optional and must be removed before submission) provides the curator with a field to capture the source of the evidence within the paper (e.g. Figures, Tables, Text, etc.).
+
=== TAB 4 ===
  
 +
[[File:RNAi_OA_TAB_4_9-1-2015.png|600px]]
  
'''Molecule Information'''
+
*Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
 +
*NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
 +
*Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
 +
*NO DUMP - rna_nodump - NOT DUMPED - toggle - Prevents data from that row from being dumped. This will also indicate to the Curation Status Form (CSF) that the paper has not been curated and will come up as "oa_blank" in the CSF
 +
*From Genereg - rna_fromgenereg - NOT DUMPED - Toggle
 +
*Flag Gene Reg - rna_flaggenereg - NOT DUMPED - Toggle
 +
*Flag Genetic Intxn - rna_flaggeneticintxn - NOT DUMPED - Toggle
 +
*Person Evidence - rna_person - Evidence Person_evidence - multiontology - for ?RNAi object's #Evidence
 +
*History Name - rna_historyname - History_name - Text      #Note: this field is to accommodate older RNAi objects
 +
*Movie - rna_movie - Movie - bigtext      #Note: this field is to accommodate older RNAi objects; Note: Separate multiple movie entries with bars (|)
 +
*Database - rna_database - Database - Text            #Note: this field is to accommodate older RNAi objects; Enter data in this format, split multiple database lines with pipes if there are any :
 +
**Phenobank2 Gene&RNAID GeneID=507328 | Phenobank3 Gene&RNAID GeneID=123456
 +
*Expression Profile - rna_exprprofile - Expr_profile - Text  #Note: this field is to accommodate older RNAi objects
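The pipe-separated Database format shown above can be split into its three parts with a few lines of code. This is a minimal sketch, assuming that each entry consists of exactly three whitespace-separated tokens corresponding to the Database, Database_field, and Accession_number slots of the ?RNAi model; the names <code>DatabaseLink</code> and <code>parse_database_field</code> are hypothetical and do not come from the dumper.

```python
from typing import List, NamedTuple


class DatabaseLink(NamedTuple):
    database: str   # e.g. "Phenobank2"
    field: str      # e.g. "Gene&RNAID"
    accession: str  # e.g. "GeneID=507328"


def parse_database_field(value: str) -> List[DatabaseLink]:
    """Split a pipe-separated Database entry into (database, field, accession) triples."""
    links: List[DatabaseLink] = []
    for chunk in value.split("|"):
        parts = chunk.split()
        if not parts:
            continue  # ignore empty segments such as a trailing pipe
        if len(parts) != 3:
            # Assumed rule: every entry has exactly three tokens.
            raise ValueError(f"expected 3 tokens per database entry, got {chunk.strip()!r}")
        links.append(DatabaseLink(*parts))
    return links
```

Applied to the example line above, this returns one triple for Phenobank2 and one for Phenobank3.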
  
[[File:RNAi_Batch_Form_6.png]]
 
  
  
The next two columns (AN and AO) capture molecule information, if applicable. This would apply in circumstances in which a compound/drug was used in the experiment and potentially had biological activity or an effect on the phenotype being reported. The "Molecule" column must be filled with the WormBase recognized ID for a molecule, the MeSH ID by default. The yellow "Molecule Common Name" field should indicate (for curator convenience) the common, human-readable name of the compound.
+
'''NOTES ON SUGGESTING NEW PHENOTYPE TERMS THROUGH THE RNAI OA'''
 +
* When suggesting a new phenotype term in TAB 2 of the RNAi OA, curators should make sure to create a PGID/Row with a '''single''' placeholder phenotype that is intended to be replaced by the new phenotype term once the new term has been approved
 +
* If the data is dumped from the OA for upload before the new term has been approved, the object will dump with the placeholder phenotype term for that upload
 +
* This behavior is similar to that of the Phenotype OA
 +
* The data entered into the three fields "Phenotype Suggestion", "Suggested Definition", and/or "Child Of" will immediately be populated into the corresponding fields in the New Objects CGI
 +
*Cron job:
  
 +
0 3 * * sun /home/acedb/gary/phn_suggested/phn_suggested_oa.pl
  
'''Interaction Information 1'''
+
Runs every Sunday at 3 am, checks for entries in rna_suggested (and app_suggested) made within the last 7 days, and sends Gary S. an e-mail if there are any new entries.  If there are none, no e-mail is sent.
  
[[File:RNAi_Batch_Form_7.png]]
+
=== NOT USING ===
 +
* Evidence - postgres table and field not created, don't know what to parse into this field - Text            #Note: this field is to accommodate older RNAi objects ''So there are only 8 RNAi objects that use the Evidence field, and they do so with 'Person_evidence'. In the OA, I think we should include: Person_evidence, Author_evidence, Curator_confirmed, Laboratory_evidence -- C'' We're only keeping Person_evidence for the #Evidence hash associated with the whole ?RNAi object -- C+J
 +
*Penetrance Incomplete - rna_penincomplete - Toggle ''I think we talked about this, but in case we didn't phenotype OA uses a dropdown for the 4 penetrance types instead of separate toggle fields -- J''  OK, this sounds like a good idea. Let's just make it a dropdown list here as well -- C
 +
*Penetrance Low - rna_penlow - Toggle
 +
*Penetrance High - rna_penhigh - Toggle
 +
*Penetrance Complete - rna_pencomplete - Toggle
 +
*Method - rna_method - Method - Text
  
 +
Notes
  
'''Interaction Information 2'''
+
Fields for the tags "Predicted_gene", "Gene", "Transcript", "Pseudogene", "Homol_homol" and "Uniquely_mapped" will be omitted from the OA and populated in ACEDB after probe mapping to the genome at the EBI during the build process.
  
[[File:RNAi_Batch_Form_8.png]]
+
== *.ACE Dumper Documentation==
  
 +
Each RNAi object will need to get dumped with "RNAi" in the "Method" tag
  
The last 10 columns (AP-AY) capture genetic interaction information. If a relevant Interaction Object ID has already been generated for this RNAi experiment, it may be entered in the "Interaction object ID" field, obviating the need to enter any further interaction information. If not, the following fields must be populated. The "Interaction Type" field provides a drop-down menu to select from a list of standard interaction type names (e.g. "Suppression"). The "Interaction-relevant Phenotype(s)" field records the phenotype ID of the phenotype about which the interaction is based, usually the same phenotype as was given under "Phenotype ID" (Column X). "Interaction Remark" is a free-text field describing the details of the interaction, for example, which gene suppressed which other gene with respect to what phenotype and under what conditions.
+
On tazendra, perl module is /home/postgres/work/citace_upload/rnai/get_rnai_ace.pm
  
The "Gene" column reports the WormBase Gene ID (e.g. WBGene00000898) for the genes involved in the interaction. The yellow "Gene_name" column records the human-readable gene name of each gene involved in the interaction. The "Variation" column records the allele/variation name of the gene involved in the genetic interaction, if applicable. This must be a recognized WormBase allele/variation name. If a transgene is involved in the interaction, the name of the transgene (official WormBase name) must be entered here. The "RNAi" column captures ("YES" or empty) whether or not the gene for that row was perturbed by RNAi in the interaction. "Direction" refers to the directionality of the interaction and the role that the gene (for that row) had in the interaction. This field provides a drop down list of flags: "Non_directional", "Effector", and "Effected". A suppressor/enhancer gene/mutation would be an "Effector" and the suppressed/enhanced gene/mutation would be the "Effected", whereas if the interaction is non-directional (as in mutual enhancement, for example), each gene/mutation would be flagged as "Non_directional".
+
Run with script /home/postgres/work/citace_upload/rnai/use_package.pl
  
 +
Generates rnai.ace.<date> and err.out.<date> (to the day, so it will overwrite previous dumps from the same date)  ''please check dumper and .ace file dumped -- J'' The dumper seems to be working OK since the dumped *.ACE file you generated is reading in OK, except for the "Evidence" line, as I comment about above in the "Tab3" section. The line currently dumps out as "Person_evidence ..." and it needs to dump out as "Evidence  Person_evidence ...", otherwise the file throws errors when reading into ACEDB -- C ''Fixed, I thought only the innermost tag mattered, but I guess because there's no data in the RNAi part, just the tag, it needs to be there before the #Evidence part -- J'' I believe so, yes. -- C
  
===== Submitting the Batch Form to Generate an *.ACE File =====
+
use_package.pl line comments:
  
 +
'my $outfile = 'rnai.ace.' . $date;'
 +
'my $errfile = 'err.out.' . $date;'
  
Once you have finished filling out the batch form and you are ready to submit it for processing into a *.ace file, you first need to convert it to a *.csv tab-delimited file. One option for doing this is to open the Excel document in Open Office, delete the yellow columns, and then "Save As..." and choose the "Text CSV (.csv)" option:
+
*These lines (above) specify the output; change as necessary.
  
[[File:Open_Office_Save_As_CSV.png]]
+
'my ($all_entry, $long_text, $err_text) = &getRnai('all');'
  
Once you click "Save", the following window may come up:
+
'# my ($all_entry, $long_text, $err_text) = &getRnai('WBRNAi00008227');'
  
[[File:Open_Office_Keep_Current_Format.png]]
+
*The lines above specify what to dump out of the OA. 'all' dumps everything. If you want to dump specific objects, comment out the top line (add a '#') and uncomment the bottom line (remove '#') and specify the object name where 'WBRNAi00008227' is. Because of permissions, the script will need to be copied, modified, and then run from a permissible directory.
  
Click on "Keep Current Format". You will then be brought to the Open Office Export dialog box:
+
'''Molecule now converts the pgid stored in postgres with the molecule name in mop_name for that pgid -- J'''
  
[[File:Open_Office_Export_Options.png]]
 
  
'''Non-Macintosh Export'''
 
  
If you are not working on a Mac, click on the "Field delimiter" field, select "{Tab}", and then click "OK":
+
=== Error Checks During Dump Process ===
  
[[File:Open_Office_Export_Default_1.png]]
+
The following is a list of checks that the .ACE dumper script will perform on all RNAi objects being dumped out of the OA to make sure that the data is consistent and doesn't have any nonsensical information. Any errors that are found will be printed to an "err" file. Every line of the file will list an error message of the general format:
  
 +
<pre>
 +
PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation
 +
</pre>
  
[[File:Open_Office_Export_Default_2.png]]
+
where "Dump_status" could be "nodump" (for objects that have a 'fatal' error and will not be dumped) or "flagonly" (for objects that have a 'non-fatal' error and will be dumped).
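Because every line of the error file follows the same four-column, tab-delimited layout, it can be read back programmatically, for example to list only the objects that were not dumped. The sketch below assumes exactly four tab-separated columns per line; the names <code>DumpError</code> and <code>parse_error_lines</code> are hypothetical and do not exist in the dumper.

```python
from dataclasses import dataclass
from typing import Iterable, List

VALID_STATUSES = ("nodump", "flagonly")


@dataclass(frozen=True)
class DumpError:
    pgid: str
    dump_status: str
    curator: str
    explanation: str

    @property
    def is_fatal(self) -> bool:
        # "nodump" marks a fatal error: the object was not dumped.
        return self.dump_status == "nodump"


def parse_error_lines(lines: Iterable[str]) -> List[DumpError]:
    """Parse 'PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation' lines."""
    errors: List[DumpError] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 tab-separated columns, got {len(fields)}: {line!r}")
        if fields[1] not in VALID_STATUSES:
            raise ValueError(f"unknown Dump_status {fields[1]!r}")
        errors.append(DumpError(*fields))
    return errors
```

A curator could then filter the result with <code>[e for e in errors if e.is_fatal]</code> to see which PGIDs need fixing before the next dump.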
  
  
'''Macintosh Export'''
 
  
Alternatively, if you work on a Mac, select the "Character set" field in the dialog box and select "Western Europe (Apple Macintosh)". Next, select the "Field delimiter" field and select "{Tab}". Then click "OK"
+
'''Fatal Errors (RNAi objects will not get dumped)'''
  
[[File:Open_Office_Mac_Export_1.png]]
+
1) If there is no identifying RNAi probe information (i.e. the sequence or clone used to knock down the target gene) in an RNAi object, the dumper script will generate an error message that is printed to the ERROR output file and the object will not get dumped. This is determined by checking that:
  
 +
a) There is at least one "PCR Product" entry OR
  
[[File:Open_Office_Mac_Export_2.png]]
+
b) There is at least one "DNA Text" entry OR
  
 +
c) There is at least one "Sequence" entry
  
[[File:Open_Office_Mac_Export_3.png]]
+
If none of these conditions holds true, then an error message will be printed in tab-delimited format like this:
  
 +
<pre>
 +
12345  nodump    WBPerson1234  There is no sequence, neither pcrproduct nor dnatext nor sequence
 +
</pre>
  
This will save your *.csv file in the current directory (where the *.xls file is) or wherever you have specified.
 
  
 +
2) If there is no reference (Paper or Person) then the object will not get dumped and an error message is printed:
  
Next, you will have to secure-copy your *.csv file to curation@elbrus.caltech.edu. Open a terminal, cd to your directory, and at the prompt type:
+
<pre>
 +
12345  nodump    WBPerson1234  There is no reference, neither paper nor person
 +
</pre>
  
$ scp *.csv curation@elbrus.caltech.edu:~/RNAi/YourDirectory
 
  
(OR just "$ scp *.csv curation@elbrus:~/RNAi/YourDirectory" if you are on Caltech campus)
+
3) If there is no "Phenotype" specified by the curator, the object will not get dumped and an error message will print to the ERROR output file like this:
  
Enter the password.
+
<pre>
 +
12345  nodump    WBPerson1234  There is no phenotype
 +
</pre>
  
Note: you will need to establish "YourDirectory" under the "RNAi" directory and place the batch form processing script into that directory.
 
  
Next, acquire secure-shell access to curation@elbrus:
+
4) If there is no RNAi ID, the object will not get dumped and an error message will print to the ERROR output file like this:
  
$ ssh curation@elbrus
+
<pre>
 +
12345  nodump    WBPerson1234  There is no RNAi ID
 +
</pre>
  
Enter the password.
 
  
Move to your directory:
 
  
$ cd RNAi/YourDirectory
+
'''Non-Fatal Errors (RNAi objects will get dumped, but error message will get printed)'''
  
Now run the Perl script with the following syntax:
+
1) If there is no dsRNA "Delivery Method" that has been specified by the curator, the RNAi object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:
  
$./rnai_batch_mapping.pl -i *.csv -t -o YourOutputName_Test.ace -e YourErrorFile_Test.txt
+
<pre>
 +
12345  flagonly    WBPerson1234  There is no deliverymethod
 +
</pre>
  
The "-i" option indicates your input *.csv file, the "-t" option indicates that you are running a test process of the *.csv file that will use temporary WB RNAi IDs (in case there are errors you don't want to assign new WBRNAi IDs), the "-o" option indicates the output *.ace file, and the "-e" indicates the error text file that will capture and report any errors that occur during processing.
 
  
The script will need to run for a few minutes (maybe several minutes depending on the size of your *.csv file). Once the script is done, check the error.txt file for any errors that may have occurred. If errors did occur, you will need to track down the source of the errors, fix them, and re-run the test processing.
+
2) If there is no "Species" specified by the curator, the RNAi object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:
  
The script generates two *.ace files: YourOutputName_Test.ace and RNAi_interaction.ace
+
<pre>
 +
12345  flagonly  WBPerson1234  There is no species
 +
</pre>
  
Once your processing is free of errors, you may wish to test both *.ace files by reading them into ACEDB and ensuring that they read into the database with no errors.
 
  
Once your processing is free of errors, you will run the "real" processing as follows:
+
3) If no curator is listed for the RNAi object, the RNAi object will get dumped, but an error message will print to the ERROR output file like this:
  
$./rnai_batch_mapping.pl -i *.csv -r -o YourOutputName.ace -e YourErrorFile.txt
+
<pre>
 
+
12345  flagonly  no curator   has no curator
Note that the only change is from using the "-t" option to the "-r" option. The "-r" option assigns real WB RNAi IDs to the RNAi experiments in the *.ace file.
+
</pre>
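The fatal and non-fatal checks listed above can be summarized in a short sketch. This is not the code of get_rnai_ace.pm: the function names, the dictionary keys (taken from the rna_* table suffixes), and the use of "no curator" as the curator column when none is set are assumptions for illustration. Only the message texts are taken from the examples above.

```python
from typing import Dict, List, Tuple


def check_rnai_row(pgid: str, row: Dict[str, str]) -> List[str]:
    """Return tab-delimited error lines for one OA row (empty list if clean)."""

    def has(key: str) -> bool:
        return bool(row.get(key, "").strip())

    curator = row.get("curator", "").strip() or "no curator"
    problems: List[Tuple[str, str]] = []

    # Fatal errors: the object would not be dumped.
    if not (has("pcrproduct") or has("dnatext") or has("sequence")):
        problems.append(("nodump", "There is no sequence, neither pcrproduct nor dnatext nor sequence"))
    if not (has("paper") or has("person")):
        problems.append(("nodump", "There is no reference, neither paper nor person"))
    if not has("phenotype"):
        problems.append(("nodump", "There is no phenotype"))
    if not has("name"):
        problems.append(("nodump", "There is no RNAi ID"))

    # Non-fatal errors: the object is dumped, but the problem is reported.
    if not has("deliverymethod"):
        problems.append(("flagonly", "There is no deliverymethod"))
    if not has("species"):
        problems.append(("flagonly", "There is no species"))
    if not has("curator"):
        problems.append(("flagonly", "has no curator"))

    return ["\t".join((pgid, status, curator, text)) for status, text in problems]


def will_dump(error_lines: List[str]) -> bool:
    """An object is dumped unless at least one of its error lines is 'nodump'."""
    return all(line.split("\t")[1] != "nodump" for line in error_lines)
```

A row that has a probe, a reference, a phenotype, and an RNAi ID but no delivery method or species would produce two "flagonly" lines and still be dumped.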
 
 
Once the real processing runs smoothly and there are no errors, test the two *.ace files by reading them into ACEDB and making sure they read in without any errors. If both files read in without error, both files are ready to be submitted for upload to the database. Secure-copy the *.ace files to your local directory and archive them as needed.
 
 
 
 
 
== RNAi Paper Flagging and Processing of SVM "Low" Results ==
 
 
 
We have been using SVM to flag papers that may contain RNAi data.  These papers are given a probability score (High, Medium or Low) based on the SVM training set to reflect the possibility that they contain RNAi data to be curated.  The High and Medium scoring papers are automatically added to the list of RNAi papers that are to be curated (on the Paper Editor) and the Lows are stored separately.  The Low scoring papers must be manually checked by a curator and if the paper has "curatable" RNAi data, the curator must manually add that paper to the list.  The Low scoring paper list is being checked approximately every 3 months (it is added to automatically every Monday at 2am based on SVM results). The following is the SOP for checking these papers and adding them to the list of RNAi papers to curate.
 
 
 
=== SVM Low's - SOP for checking papers ===
 
 
 
The file containing the papers with a low priority score (named "low") is stored on tazendra in the directory:
 
/home/postgres/work/pgpopulation/svm/gary_rnai
 
New papers are appended to this list.  The curator checking the recently added papers must go back in the list to the most recent paper that carries the "commented out" mark "//".  This mark means that the list has been checked up to this point; all papers after this mark need to be checked manually.  When finished, the curator should "comment out" the last paper on the list so that the next curator knows where to begin checking.
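Finding the papers that still need checking can be sketched as follows. The exact layout of the "low" file is not described here, so this example assumes one paper per line with the "//" mark at the start of the line; the function name <code>unchecked_papers</code> is hypothetical.

```python
from typing import Iterable, List


def unchecked_papers(lines: Iterable[str]) -> List[str]:
    """Return the entries that follow the last line marked with '//'."""
    entries = [line.strip() for line in lines if line.strip()]
    last_mark = -1  # -1 means no mark was found, so every entry is unchecked
    for index, entry in enumerate(entries):
        if entry.startswith("//"):
            last_mark = index
    return entries[last_mark + 1:]
```

If the file contains no "//" mark at all, every entry is returned, which corresponds to a list that has never been checked.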
 
 
 
=== SVM Low's - SOP for adding papers to curation pipeline ===
 
 
 
After checking the "Low" papers, the ones that do contain data must be added to the curation list.  To do so, go to curator_first_pass.cgi, which is listed under "curation Forms" on the site map for tazendra.  Enter the numeric part of the WBPaper ID (for example, just "00038308" without the WBPaper prefix) and click the query button. Under "rnai" in the Gene Function section, add the comment "SVM -LOW" in the curator box. When you hit "Flag" at the bottom of the page, the paper will be added to the RNAi curation list and noted as an SVM paper with a low probability score.
 

Latest revision as of 20:29, 30 July 2019

Archived RNAi documentation may be found here

RNAi Curation Mission Summary

The term "RNAi" stands for "RNA-interference" and refers to the targeted silencing of gene expression of a "target gene" via introduction of double stranded RNA (dsRNA) containing high degrees of sequence identity to the "target gene". In C. elegans, molecules of dsRNA may be introduced into the worm by a variety of different methods including direct micro-injection, soaking of worms in solution containing dsRNA, feeding worms bacteria that express dsRNA from a plasmid, and transgenic expression (within the worm) of dsRNA. For efficient and specific knockdown of gene expression, the dsRNA must have a minimum sequence identity with the target gene sequence. Any phenotype(s) resulting from the RNAi-mediated knockdown of a particular gene is thought to directly reflect the phenotype of a loss-of-function mutation in that gene, hence providing evidence as to the gene's biological function.

The goal of RNAi curation is to associate RNAi-mediated phenotypes with the target genes knocked down in RNAi experiments, as found in the literature pertaining to C. elegans and related species. There are some important things to consider when curating RNAi experiments. Some RNAi experiments are simple to curate as the strain is N2 (wild type genotype), only a single gene is targeted for RNAi, the phenotype is clearly stated, and the source and identity of the dsRNA is clearly stated by the authors. In many cases, however, the curation task is not so simple: there may be complex genotypes with multiple mutations in the strain receiving the dsRNA, multiple RNAi gene targets, complex genetic interactions, and/or missing descriptions from the authors as to controls, dsRNA delivery method, dsRNA identity, and/or phenotype directly resulting from the RNAi in question. Hopefully these issues will all be addressed below.

RNAi Data Model

This is the ?RNAi data model as of WormBase Release WS251:

//////////////////////////////////////////////////////////////////
//
// ?RNAi class
//
//////////////////////////////////////////////////////////////////

?RNAi   Evidence #Evidence
        History_name UNIQUE ?Text
        Homol Homol_homol ?Homol_data XREF RNAi_homol ?Method Float Int UNIQUE Int Int UNIQUE Int #Homol_info
        Sequence_info   DNA_text Text UNIQUE Text //stores actual probe sequence for automated mapping
						  // 1st Text is DNA, 2nd is probe name
                       Sequence ?Sequence XREF RNAi  //links to a real Sequence object used in the experiment 
                                                     // such as yk clone; not UNIQUE anymore
                       Clone ?Clone XREF Used_in_RNAi       // Chris WS244
                       PCR_product ?PCR_product XREF RNAi // links to a PCR_product object used in 
                                                          // the experiment; not UNIQUE anymore
       Uniquely_mapped  //boolean; if present, signifies that ?RNAi object has a unique sequence 
                        // which maps to a single place in the genome
       Experiment      Laboratory ?Laboratory
                       Date UNIQUE DateType
                       Strain UNIQUE ?Strain
                       Genotype UNIQUE ?Text   //used when no Strain object exists
                       Treatment UNIQUE ?Text
                       Life_stage UNIQUE ?Life_stage
                       Temperature UNIQUE Int
                       Delivered_by UNIQUE Bacterial_feeding      //RL [010327]
                                           Injection              //RL [010327]
                                           Soaking                //RL [010327]
                                           Transgene_expression   //RL [010327]
       Inhibits        Predicted_gene ?CDS XREF RNAi_result #Evidence // "gene" parent (unreliable)
                       Gene ?Gene   XREF RNAi_result #Evidence           //RL [010327]
                       Transcript ?Transcript XREF RNAi_result #Evidence // [021126 krb]
                       Pseudogene ?Pseudogene XREF RNAi_result #Evidence // [030801 krb]
       Supporting_data Movie ?Movie XREF RNAi    // Lincoln, krb [010807]
       DB_info         Database ?Database ?Database_field ?Accession_number 
       Species         UNIQUE ?Species
       Interaction     ?Interaction
       Reference       UNIQUE ?Paper XREF RNAi //[070215 ar2] made reference unique so Paper sort of 
                                               // equates to a Study class for Will S
       Phenotype       ?Phenotype XREF RNAi #Phenotype_info
       Phenotype_not_observed ?Phenotype XREF Not_in_RNAi #Phenotype_info 
       Expr_profile    ?Expr_profile XREF RNAi_result // connection added during build [030106 krb]
       Remark          ?Text #Evidence
       Method UNIQUE   ?Method


Here is the WS251 ?Phenotype_info model:

////////////////////////////////////////////
//
// ?Phenotype_info Class
//
////////////////////////////////////////////

#Phenotype_info Paper_evidence ?Paper
                Person_evidence ?Person
                Curator_confirmed ?Person
                Remark ?Text #Evidence // specific remarks about the phenotype
                Quantity_description ?Text #Evidence //Remark to describe what quantity describes, below
                Quantity UNIQUE Int UNIQUE Int #Evidence
                Not #Evidence //This is being phased out but is needed for the next phase [06/08/10].
                Penetrance Incomplete Text #Evidence
                           Low Text #Evidence
                           High Text #Evidence
                           Complete Text #Evidence
                           Range UNIQUE Int UNIQUE Int #Evidence // Range of penetrance
                Recessive #Evidence
                Semi_dominant #Evidence
                Dominant #Evidence
                Haplo_insufficient #Evidence
                Caused_by_gene ?Gene #Evidence
                Caused_by_other ?Text #Evidence
                Rescued_by_transgene ?Transgene
                Variation_effect Gain_of_function_undetermined_type #Evidence
                                 Antimorph_gain_of_function #Evidence
                                 Dominant_negative_gain_of_function #Evidence
                                 Hypermorph_gain_of_function #Evidence
                                 Neomorph_gain_of_function #Evidence
                                 Loss_of_function_undetermined_extent #Evidence    
                                 Null #Evidence
                                 Predicted_null_via_sequence #Evidence
                                 Probable_null_via_phenotype #Evidence
                                 Hypomorph_reduction_of_function #Evidence
                                 Predicted_hypomorph_via_sequence #Evidence
                                 Probable_hypomorph_via_phenotype #Evidence
                                 Wild_allele #Evidence
                Affected_by Molecule ?Molecule #Evidence // ?Molecule model Karen Yook
                EQ_annotations Anatomy_term ?Anatomy_term ?PATO_term #Evidence
                               Life_stage ?Life_stage ?PATO_term #Evidence
                               GO_term ?GO_term ?PATO_term #Evidence
                               Molecule_affected  ?Molecule ?PATO_term #Evidence
                Temperature_sensitive Heat_sensitive Text #Evidence
                                      Cold_sensitive Text #Evidence
                Maternal UNIQUE Strictly_maternal #Evidence
                                With_maternal_effect #Evidence
                Paternal #Evidence
                Phenotype_assay Strain ?Strain #Evidence
                                Treatment ?Text #Evidence
                                Temperature ?Text #Evidence
                                Genotype ?Text #Evidence
                Ease_of_scoring UNIQUE ES0_Impossible_to_score #Evidence
                                       ES1_Very_hard_to_score #Evidence
                                       ES2_Difficult_to_score #Evidence
                                       ES3_Easy_to_score #Evidence

Here is the WS251 #Evidence hash model:

////////////////////////////////////////////////////////////////////////////////
//			      Evidence hash
////////////////////////////////////////////////////////////////////////////////

#Evidence Paper_evidence ?Paper                            // Data from a Paper
          Published_as ?Text                               //  .. track other names for the same data
          Person_evidence ?Person                          // Data from a Person
          Author_evidence ?Author UNIQUE Text              // Data from an Author
          Accession_evidence ?Database ?Accession_number   // Data from a database (NDB/UNIPROT etc)
          Protein_id_evidence ?Text                        // Reference a protein_ID
          GO_term_evidence ?GO_term                        // Reference a GO_term
          Expr_pattern_evidence ?Expr_pattern              // Reference a Expression pattern  
          Microarray_results_evidence ?Microarray_results  // Reference a Microarray result
          RNAi_evidence ?RNAi                              // Reference a RNAi knockdown
          CGC_data_submission                              // bless the data as coming from CGC
	  Curator_confirmed ?Person                        // bless the data manually 
	  Inferred_automatically Text                      // bless the data via a script
	  Date_last_updated UNIQUE DateType                // Stores last update timestamp
	  Feature_evidence ?Feature			   // Reference a Feature - eg for creation of isoform based on TEC-RED SL2
	  Laboratory_evidence ?Laboratory                  // Reference a Lab
	  From_analysis ?Analysis			   // Reference an analysis
	  Variation_evidence ?Variation			   // Explicitly record variation from which IMP manual GO annotations are made
	  Mass_spec_evidence ?Mass_spec_peptide
	  Sequence_evidence ?Sequence		           // for sequence data that hasn't been submitted to a public resource
	  Remark ?Text


RNAi annotations in CitaceMinus

These papers each have more than 2000 RNAi objects and were therefore not parsed into the postgres tables for RNAi (a total of 59,656 RNAi objects are stored in CitaceMinus):

  • WBPaper00004402 (2287 RNAi objects)
  • WBPaper00004403 (2584 RNAi objects)
  • WBPaper00004651 (2479 RNAi objects)
  • WBPaper00005654 (14253 RNAi objects)
  • WBPaper00006395 (3230 RNAi objects)
  • WBPaper00024497 (10951 RNAi objects)
  • WBPaper00025054 (20709 RNAi objects)
  • WBPaper00029258 (3163 RNAi objects)
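
As a quick arithmetic check, the per-paper counts listed above can be summed to confirm the stated total. The Python snippet below is only an illustration; the dictionary simply restates the numbers from the list.

```python
# RNAi object counts per paper, copied from the list above.
counts = {
    "WBPaper00004402": 2287,
    "WBPaper00004403": 2584,
    "WBPaper00004651": 2479,
    "WBPaper00005654": 14253,
    "WBPaper00006395": 3230,
    "WBPaper00024497": 10951,
    "WBPaper00025054": 20709,
    "WBPaper00029258": 3163,
}

# Every paper in the list exceeds the 2000-object threshold.
all_over_threshold = all(n > 2000 for n in counts.values())

# Sum of all RNAi objects held in CitaceMinus for these papers.
total = sum(counts.values())
print(total, all_over_threshold)
```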


RNAi Curation Standard Operating Procedure (SOP)

In order to ensure consistency of RNAi curation across curators and WormBase releases, a set of standard procedures for RNAi curation is outlined below. The descriptions of these procedures include use of the two main methods for generating *.ace files for release submission: (1) the web-based CGI form for one-at-a-time RNAi object generation and (2) the batch form submission method.

Minimum Requirements for an RNAi Object

Regardless of which curation method you (the curator) choose, a set of minimum requirements must be met in order to generate a complete RNAi experiment object:

1) A reference (e.g. WBPaperID)

2) A curator name

3) The sequence of the dsRNA used in the experiment to knock down gene expression

4) A dsRNA delivery method (e.g. Injection) [Note: some authors omit this information. This can be curated arbitrarily and removed from the *.ace file once it has been generated]

5) A phenotype (observed or not_observed)

If any one of these five basic pieces of information is missing, the scripts that generate the *.ace file will fail, returning an error.
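
The five minimum requirements above can be pictured as a simple pre-submission check. The following Python sketch is illustrative only: the field names (`paper`, `curator`, `dsrna_sequence`, `delivery_method`, `phenotype`) and the function name are hypothetical and do not come from the curation scripts.

```python
# Hypothetical field names standing in for the five minimum requirements.
REQUIRED_FIELDS = (
    "paper",            # 1) a reference, e.g. a WBPaper ID
    "curator",          # 2) a curator name
    "dsrna_sequence",   # 3) the dsRNA sequence used in the experiment
    "delivery_method",  # 4) e.g. Injection
    "phenotype",        # 5) a phenotype (observed or not_observed)
)

def missing_requirements(record):
    """Return the required field names that are absent or blank in record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

An empty list means the record carries all five pieces of information; anything else names what still has to be filled in before generating the *.ace file.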


RNAi Paper Flagging and Processing of SVM "Low" Results and Tracking

We have been using SVM to flag papers that may contain RNAi data. These papers are given a probability score (High, Medium or Low) based on the SVM training set to reflect the possibility that they contain RNAi data to be curated. The High and Medium scoring papers are automatically added to the list of RNAi papers that are to be curated (on the Paper Editor) and the Lows are stored separately. The Low scoring papers must be manually checked by a curator and if the paper has "curatable" RNAi data, the curator must manually add that paper to the list. The Low scoring paper list is being checked approximately every 3 months (it is added to automatically every Monday at 2am based on SVM results). The following is the SOP for checking these papers and adding them to the list of RNAi papers to curate.

SVM Lows - SOP for checking papers

The file containing the papers with a low priority score (named "low") is stored on tazendra in the directory /home/postgres/work/pgpopulation/svm/gary_rnai. New papers are concatenated to this list. The curator checking the recently added papers must go back in the list to the most recent paper that has a "commented out" mark ("//"). This mark means that the list has been checked up to that point; all papers after the mark need to be manually checked. When finished, the curator should "comment out" the last paper on the list so the next curator knows where to begin checking next time.
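
The "resume after the last commented-out paper" rule can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that are not confirmed by the text: that the "low" file holds one paper per line, and that a paper is commented out by starting its line with "//". The function name is hypothetical.

```python
def papers_to_check(lines):
    """Return the entries that come after the last line starting with '//'."""
    last_mark = -1
    for index, line in enumerate(lines):
        if line.strip().startswith("//"):
            last_mark = index
    # Everything after the last mark is unchecked; skip blank lines.
    return [line.strip() for line in lines[last_mark + 1:] if line.strip()]
```

If no line carries the mark, the whole list is returned, which matches the case where the file has never been checked.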

SVM Lows - SOP for adding papers to curation pipeline

After checking the "Low" papers, the ones that do contain data must be added to the curation list. To do so, go to curator_first_pass.cgi on the web, which is listed under "curation Forms" on the site map for tazendra. Enter the numeric part of the WBPaper ID (for example, just "00038308", without the WBPaper prefix) and click the query button. Under "rnai" in the Gene Function section, add the comment "SVM -LOW" in the curator box. When you hit "Flag" at the bottom of the page, the paper will be added to the RNAi curation list and noted as an SVM paper with a low probability score.
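
Because the form expects the paper number without its prefix, a small helper can strip "WBPaper" before pasting. The Python sketch below is illustrative; the function name is hypothetical, and the 8-digit length is an assumption based on the example "00038308".

```python
import re

def paper_number(paper_id):
    """Return the 8-digit part of a WBPaper ID, with or without the prefix."""
    match = re.fullmatch(r"(?:WBPaper)?(\d{8})", paper_id.strip())
    if match is None:
        raise ValueError("not a WBPaper ID: %r" % paper_id)
    return match.group(1)
```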

SVM Lows - SOP for adding papers to SVM Tracking

For tracking of SVM results (False Positive, False Negative, etc.), a tool was created: http://XXXXXXXXX.caltech.edu/~postgres/cgi-bin/svm_results.cgi

This web-based tool will automatically track papers that are positive based on our RNAi curation pipeline/checkout form and the OA. That is, if a paper is on the checkout form and is curated, the data will be in the OA and the paper will be marked as a True Positive. If a paper on the checkout list is a False Positive, the curator must note this on the paper editor (Flag False Positive; http://tazendra.caltech.edu/~postgres/cgi-bin/paper_editor.cgi) and the results will be forwarded to the SVM tracking form.

The SVM Lows that do contain RNAi data are added to the checkout form by a curator (see above) and will be dealt with as any other paper on the list. The SVM Lows that do not actually contain RNAi data must be entered manually onto the SVM tracking form as False Positives. To do this, go to http://XXXXXXXXX.caltech.edu/~postgres/cgi-bin/svm_results.cgi and, under "Enter Curator Results": select data type / select curator negative / add the list of negatives to / select comment "SVM positive - Curator negative"


RNAi OA Postgres Tables (rna_*)

CG updated 9-1-2015

postgres tables
rna_anatomy rna_goprocess rna_person_hst
rna_anatomyquality rna_goprocessquality rna_phenotype
rna_child_of rna_heatsens rna_phenotype_hst
rna_child_of_hst rna_heatsens_hst rna_phenotypenot
rna_coldsens rna_historyname rna_phenotypenot_hst
rna_coldsens_hst rna_historyname_hst rna_phenremark
rna_curator rna_laboratory rna_phenremark_hst
rna_curator_hst rna_laboratory_hst rna_quantdesc
rna_database rna_lifestage rna_quantdesc_hst
rna_database_hst rna_lifestage_hst rna_quantfromto
rna_date rna_lifestagequality rna_quantfromto_hst
rna_date_hst rna_molaffected rna_remark
rna_deliverymethod rna_molaffectedquality rna_remark_hst
rna_deliverymethod_hst rna_molecule rna_sequence
rna_dnatext rna_molecule_hst rna_sequence_hst
rna_dnatext_hst rna_movie rna_species
rna_exprprofile rna_movie_hst rna_species_hst
rna_exprprofile_hst rna_name rna_strain
rna_flaggenereg rna_name_hst rna_strain_hst
rna_flaggenereg_hst rna_nodump rna_suggested
rna_flaggeneticintxn rna_nodump_hst rna_suggested_definition
rna_flaggeneticintxn_hst rna_paper rna_suggested_definition_hst
rna_fromgenereg rna_paper_hst rna_suggested_hst
rna_fromgenereg_hst rna_pcrproduct rna_temperature
rna_genotype rna_pcrproduct_hst rna_temperature_hst
rna_genotype_hst rna_penetrance rna_treatment
rna_gocomponent rna_penetrance_hst rna_treatment_hst
rna_gocomponentquality rna_penfromto
rna_gofunction rna_penfromto_hst
rna_gofunctionquality rna_person

RNAi OA

CG updated 9-1-2015

NOTE: RNAi objects may have multiple lines in the OA due to the nested phenotype data (Penetrance info, etc.)

Dependency Notice: The Community Phenotype Form and the RNAi OA each have separate code to look up the most recent RNAi ID and generate new RNAi IDs based on that. If the code is changed for one, we should also check that the code is changed for the other.
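
One way to picture the shared logic is a single helper that derives the next ID from the most recent one. The Python sketch below is only an illustration of the idea; it assumes IDs have the form "WBRNAi" followed by 8 zero-padded digits (as in WBRNAi00008227) and are assigned sequentially. Neither the function name nor this behaviour is taken from the actual form or OA code.

```python
import re

def next_rnai_id(latest_id):
    """Return the ID that follows latest_id, keeping the 8-digit zero padding."""
    match = re.fullmatch(r"WBRNAi(\d{8})", latest_id)
    if match is None:
        raise ValueError("unexpected RNAi ID format: %r" % latest_id)
    return "WBRNAi%08d" % (int(match.group(1)) + 1)
```

If both tools relied on one such routine, a change to the ID scheme would only need to be made in one place.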


TAB 1

RNAi OA TAB 1 9-1-2015.png

  • pgid - the postgres ID - NOT DUMPED
  • Name - rna_name - RNAi : - WBRNAiID #Note: This is automatically assigned
  • Paper - rna_paper - Reference - ?Paper (Ontology)
  • Curator - rna_curator - NOT DUMPED - Curator (Dropdown)
  • PCR Product - rna_pcrproduct - PCR_product - ?PCR_product (Multi-ontology) #Note: EBI/Hinxton will map these
  • DNA Text - rna_dnatext - DNA_text - Big text #Note: multiple Sequences should be separated by pipe "|" (caution: pressing <ENTER> may cause problems)
  • Strain - rna_strain - Strain - ?Strain (Ontology)
  • Genotype - rna_genotype - Genotype - Big text
  • Treatment - rna_treatment - Treatment - Big text
  • Temperature - rna_temperature - Temperature - Text (Integer)
  • Delivery Method - rna_deliverymethod - Delivered_by - (Multi-dropdown) four choices: "Bacterial_feeding", "Injection", "Soaking", "Transgene_expression"
  • Species - rna_species - Species - ?Species (Dropdown)
  • Remark - rna_remark - Remark - Big text
  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
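
The pipe-separated convention for the DNA Text field can be illustrated with a short Python sketch. This is not the OA's own parsing code; the function name is hypothetical, and stripping all whitespace (including stray newlines from pressing <ENTER>) is an assumption about what a tolerant reader of the field might do.

```python
def split_dna_text(value):
    """Split a DNA Text value on '|' and drop whitespace inside each sequence."""
    sequences = []
    for part in value.split("|"):
        cleaned = "".join(part.split())  # removes spaces, tabs and newlines
        if cleaned:
            sequences.append(cleaned)
    return sequences
```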


TAB 2

RNAi OA TAB 2 9-1-2015.png

  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  • Phenotype Suggestion - rna_suggested - NOT DUMPED - field to suggest a new phenotype term. Once approved, the new term will replace whatever term(s) are currently in the Phenotype field
  • Suggested Definition - rna_suggested_definition - NOT DUMPED - Big text field with definition of new suggested phenotype term
  • Child Of - rna_child_of - NOT DUMPED - list of parent phenotype term(s) for suggested phenotype
  • Affected By Molecule - rna_molecule - (in-line with Phenotype) Molecule - ?Molecule (Multi-ontology)
  • Penetrance From To - rna_penfromto - (in-line with Phenotype) Range - Text (Integer-space-Integer)
  • Penetrance - rna_penetrance - (in-line with Phenotype) Penetrance - dropdown like in phenotype OA
  • Quantity From To - rna_quantfromto - (in-line with Phenotype) Quantity - Text (Integer-space-Integer)
  • Quantity Description - rna_quantdesc - (in-line with Phenotype) Quantity_description - Big text
  • Heat Sensitive - rna_heatsens - (in-line with Phenotype) Temperature_sensitive - Toggle
  • Cold Sensitive - rna_coldsens - (in-line with Phenotype) Temperature_sensitive - Toggle
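
The "Integer-space-Integer" format used by the Penetrance From To and Quantity From To fields can be read as a pair of integers. The Python sketch below is illustrative only; the function name is hypothetical and the OA may validate these fields differently.

```python
def parse_from_to(value):
    """Parse an 'Integer-space-Integer' string such as '40 60' into two ints."""
    parts = value.split()
    if len(parts) != 2:
        raise ValueError("expected two integers separated by a space: %r" % value)
    # int() raises ValueError itself if either part is not an integer.
    return int(parts[0]), int(parts[1])
```

The two values correspond to the pair of Int fields in the model (for example, Range UNIQUE Int UNIQUE Int under Penetrance).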


TAB 3

RNAi OA TAB 3 9-1-2015.png

  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  • Anatomy - rna_anatomy (Multi-ontology)
  • Anatomy Quality - rna_anatomyquality (Multi-ontology)
  • Life Stage - rna_lifestage - Life_stage - ?Life_stage (Multi-ontology)
  • Life Stage Quality - rna_lifestagequality (Multi-ontology)
  • Molecule Affected - rna_molaffected (Multi-ontology)
  • Mol Aff Quality - rna_molaffectedquality (Multi-ontology)
  • GO Process - rna_goprocess (Multi-ontology)
  • GO P Quality - rna_goprocessquality (Multi-ontology)
  • GO Function - rna_gofunction (Multi-ontology)
  • GO F Quality - rna_gofunctionquality (Multi-ontology)
  • GO Component - rna_gocomponent (Multi-ontology)
  • GO C Quality - rna_gocomponentquality (Multi-ontology)

TAB 4

RNAi OA TAB 4 9-1-2015.png

  • Phenotype Observed - rna_phenotype - Phenotype - ?Phenotype ID & name (Multi-ontology)
  • NOT - rna_phenotypenot - Phenotype_not_observed - NOT Toggle
  • Phenotype Remark - rna_phenremark - (in-line with Phenotype) Remark - (Big Text)
  • NO DUMP - rna_nodump - NOT DUMPED - toggle - Prevents data from that row from being dumped. This will also indicate to the Curation Status Form (CSF) that the paper has not been curated and will come up as "oa_blank" in the CSF
  • From Genereg - rna_fromgenereg - NOT DUMPED - Toggle
  • Flag Gene Reg - rna_flaggenereg - NOT DUMPED - Toggle
  • Flag Genetic Intxn - rna_flaggeneticintxn - NOT DUMPED - Toggle
  • Person Evidence - rna_person - Evidence Person_evidence - (Multi-ontology) - for ?RNAi object's #Evidence
  • History Name - rna_historyname - History_name - Text #Note: this field is to accommodate older RNAi objects
  • Movie - rna_movie - Movie - bigtext #Note: this field is to accommodate older RNAi objects; Note: Separate multiple movie entries with bars (|)
  • Database - rna_database - Database - Text #Note: this field is to accommodate older RNAi objects; Enter data in the format below, separating multiple database lines with pipes (|):
    • Phenobank2 Gene&RNAID GeneID=507328 | Phenobank3 Gene&RNAID GeneID=123456
  • Expression Profile - rna_exprprofile - Expr_profile - Text #Note: this field is to accommodate older RNAi objects
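
The Database field format shown above (for example "Phenobank2 Gene&RNAID GeneID=507328 | Phenobank3 Gene&RNAID GeneID=123456") can be split into one triple per database line. The Python sketch below is illustrative; it assumes each entry consists of exactly three whitespace-separated tokens corresponding to the model's ?Database, ?Database_field and ?Accession_number, and the function name is hypothetical.

```python
def parse_database_field(value):
    """Split a pipe-separated Database value into (database, field, accession) triples."""
    entries = []
    for chunk in value.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        parts = chunk.split()
        if len(parts) != 3:
            raise ValueError("expected 'Database Field Accession', got %r" % chunk)
        entries.append(tuple(parts))
    return entries
```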


NOTES ON SUGGESTING NEW PHENOTYPE TERMS THROUGH THE RNAI OA

  • When suggesting a new phenotype term in TAB 2 of the RNAi OA, curators should make sure to create a PGID/Row with a single placeholder phenotype that is intended to be replaced by the new phenotype term once the new term has been approved
  • If the data is dumped from the OA for upload before the new term has been approved, the object will dump with the placeholder phenotype term for that upload
  • This behavior is similar to that of the Phenotype OA
  • The data entered into the three fields "Phenotype Suggestion", "Suggested Definition", and/or "Child Of" will immediately be populated into the corresponding fields in the New Objects CGI
  • Cron job:
0 3 * * sun /home/acedb/gary/phn_suggested/phn_suggested_oa.pl

The cron job runs every Sunday at 3 am, checks for entries made in rna_suggested (and app_suggested) within the last 7 days, and sends Gary S. an e-mail if there are any new entries. If there aren't any, no e-mail is sent.
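
The weekly check described above amounts to a 7-day window filter followed by a "send only if non-empty" decision. The Python sketch below illustrates that logic only; it is not the phn_suggested_oa.pl script, and the function names and the (table name, timestamp) entry shape are assumptions made for the example.

```python
from datetime import datetime, timedelta

def recent_suggestions(entries, now, days=7):
    """Keep (table_name, timestamp) entries made within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [entry for entry in entries if entry[1] >= cutoff]

def should_send_email(entries, now):
    """An e-mail is only warranted when at least one recent entry exists."""
    return len(recent_suggestions(entries, now)) > 0
```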

NOT USING

  • Evidence - postgres table and field not created, don't know what to parse into this field - Text #Note: this field is to accommodate older RNAi objects
    • So there are only 8 RNAi objects that use the Evidence field, and they do so with 'Person_evidence'. In the OA, I think we should include: Person_evidence, Author_evidence, Curator_confirmed, Laboratory_evidence -- C
    • We're only keeping Person_evidence for the #Evidence hash associated with the whole ?RNAi object -- C+J
  • Penetrance Incomplete - rna_penincomplete - Toggle
    • I think we talked about this, but in case we didn't: phenotype OA uses a dropdown for the 4 penetrance types instead of separate toggle fields -- J
    • OK, this sounds like a good idea. Let's just make it a dropdown list here as well -- C
  • Penetrance Low - rna_penlow - Toggle
  • Penetrance High - rna_penhigh - Toggle
  • Penetrance Complete - rna_pencomplete - Toggle
  • Method - rna_method - Method - Text

Notes

Fields for the tags "Predicted_gene", "Gene", "Transcript", "Pseudogene", "Homol_homol" and "Uniquely_mapped" will be omitted from the OA and populated in ACEDB after probe mapping to the genome at the EBI during the build process.

*.ACE Dumper Documentation

Each RNAi object will need to get dumped with "RNAi" in the "Method" tag

On tazendra, the Perl module is /home/postgres/work/citace_upload/rnai/get_rnai_ace.pm

Run with script /home/postgres/work/citace_upload/rnai/use_package.pl

Generates rnai.ace.<date> and err.out.<date> (dated to the day, so a new dump will overwrite previous dumps from the same date).

please check dumper and .ace file dumped -- J

The dumper seems to be working OK since the dumped *.ACE file you generated is reading in OK, except for the "Evidence" line, as I comment about above in the "Tab3" section. The line currently dumps out as "Person_evidence ..." and it needs to dump out as "Evidence Person_evidence ...", otherwise the file throws errors when reading into ACEDB -- C

Fixed, I thought only the innermost tag mattered, but I guess because there's no data in the RNAi part, just the tag, it needs to be there before the #Evidence part -- J

I believe so, yes. -- C
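
To make the "Evidence Person_evidence ..." point concrete, the Python sketch below assembles a minimal record with the full tag path and with "RNAi" in the Method tag. It is an illustration only, not the get_rnai_ace.pm dumper; the function name is hypothetical, values are assumed to contain no double quotes, and the exact .ace layout should be checked against a real dump.

```python
def rnai_ace_record(rnai_id, person_ids=()):
    """Build a minimal .ace-style record for one RNAi object (illustrative)."""
    lines = ['RNAi : "%s"' % rnai_id]
    for person in person_ids:
        # The enclosing Evidence tag must precede the #Evidence sub-tag.
        lines.append('Evidence Person_evidence "%s"' % person)
    # Every dumped RNAi object carries "RNAi" in its Method tag.
    lines.append('Method "RNAi"')
    return "\n".join(lines) + "\n"

print(rnai_ace_record("WBRNAi00008227", ["WBPerson1234"]))
```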

use_package.pl line comments:

my $outfile = 'rnai.ace.' . $date;
my $errfile = 'err.out.' . $date;

  • These lines (above) specify the output; change as necessary.

my ($all_entry, $long_text, $err_text) = &getRnai('all');
# my ($all_entry, $long_text, $err_text) = &getRnai('WBRNAi00008227');

  • The lines above specify what to dump out of the OA. 'all' dumps everything. If you want to dump specific objects, comment out the top line (add a '#') and uncomment the bottom line (remove '#') and specify the object name where 'WBRNAi00008227' is. Because of permissions, the script will need to be copied, modified, and then run from a permissible directory.

For Molecule, the dumper now converts the pgid stored in postgres to the molecule name stored in mop_name for that pgid -- J


Error Checks During Dump Process

The following is a list of checks that the .ACE dumper script will perform on all RNAi objects being dumped out of the OA to make sure that the data is consistent and doesn't have any nonsensical information. Any errors that are found will be printed to an "err" file. Every line of the file will list an error message of the general format:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

where "Dump_status" could be "nodump" (for objects that have a 'fatal' error and will not be dumped) or "flagonly" (for objects that have a 'non-fatal' error and will be dumped).
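
A line in this tab-delimited format can be split back into its four fields, which is handy for sorting fatal from non-fatal problems after a dump. The Python sketch below is illustrative; the names `DumpError` and `parse_error_line` are hypothetical and not part of the dumper.

```python
from collections import namedtuple

# One parsed line of the error file: PGID, Dump_status, Curator ID, Explanation.
DumpError = namedtuple("DumpError", "pgid status curator explanation")

def parse_error_line(line):
    """Split one tab-delimited error line into its four fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != 4:
        raise ValueError("expected 4 tab-separated fields, got %d" % len(fields))
    pgid, status, curator, explanation = fields
    if status not in ("nodump", "flagonly"):
        raise ValueError("unknown dump status: %r" % status)
    return DumpError(pgid, status, curator, explanation)
```

Filtering the parsed lines on `status == "nodump"` then gives the objects that were held back from the .ACE file.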


Fatal Errors (RNAi objects will not get dumped)

1) If there is no identifying RNAi probe information (i.e. the sequence or clone used to knockdown the target gene) in an RNAi object, the dumper script will generate an error message that is printed to the ERROR output file and the object will not get dumped. This is determined by checking that:

a) There is at least one "PCR Product" entry OR

b) There is at least one "DNA Text" entry OR

c) There is at least one "Sequence" entry

If none of these conditions hold true, then an error message will be printed in tab-delimited format like this:

12345   nodump    WBPerson1234   There is no sequence, neither pcrproduct nor dnatext nor sequence


2) If there is no reference (Paper or Person) then the object will not get dumped and an error message is printed:

12345   nodump    WBPerson1234   There is no reference, neither paper nor person


3) If there is no "Phenotype" specified by the curator, the object will not get dumped and an error message will print to the ERROR output file like this:

12345   nodump    WBPerson1234   There is no phenotype


4) If there is no RNAi ID, the object will not get dumped and an error message will print to the ERROR output file like this:

12345   nodump    WBPerson1234   There is no RNAi ID


Non-Fatal Errors (RNAi objects will get dumped, but error message will get printed)

1) If there is no dsRNA "Delivery Method" that has been specified by the curator, the RNAi object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:

12345   flagonly    WBPerson1234   There is no deliverymethod


2) If there is no "Species" specified by the curator, the RNAi object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:

12345   flagonly   WBPerson1234   There is no species


3) If no curator is listed for the RNAi object, the RNAi object will get dumped, but an error message will print to the ERROR output file like this:

12345   flagonly   no curator   has no curator
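
Taken together, the fatal and non-fatal checks above can be sketched as a single function that returns (Dump_status, Explanation) pairs for one OA row. This Python sketch only mirrors the rules as described on this page; it is not the Perl dumper, and the dictionary keys (modelled loosely on the rna_* table names) and function names are assumptions.

```python
def check_row(row):
    """Return a list of (dump_status, explanation) pairs for one OA row."""
    errors = []
    # Fatal: the object is held back from the dump.
    if not any(row.get(key) for key in ("pcrproduct", "dnatext", "sequence")):
        errors.append(("nodump", "There is no sequence, neither pcrproduct nor dnatext nor sequence"))
    if not (row.get("paper") or row.get("person")):
        errors.append(("nodump", "There is no reference, neither paper nor person"))
    if not row.get("phenotype"):
        errors.append(("nodump", "There is no phenotype"))
    if not row.get("name"):
        errors.append(("nodump", "There is no RNAi ID"))
    # Non-fatal: the object is dumped, but the problem is reported.
    if not row.get("deliverymethod"):
        errors.append(("flagonly", "There is no deliverymethod"))
    if not row.get("species"):
        errors.append(("flagonly", "There is no species"))
    if not row.get("curator"):
        errors.append(("flagonly", "has no curator"))
    return errors

def will_dump(errors):
    """A row is dumped unless at least one of its errors is fatal."""
    return all(status != "nodump" for status, _ in errors)
```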