Gene Interaction
Links to relevant pages
Caltech documentation
Archived Interaction documentation may be found here
Contents
- 1 Interaction Curation
- 2 Pipeline
- 3 Interaction Models
- 4 Gene_gene Interaction OA
- 5 Some Notes for gene_gene_interaction
- 6 The new Interaction OA, March 2012
- 7 To go live on tazendra
- 8 Instructions on How to Use the New OA
- 9 Nightly Cron Job to Assign New Interaction IDs
- 10 Interaction OA .ACE Dumper
Interaction Curation
Pipeline
dump .ace file from OA for upload
- on tazendra: /home/acedb/xiaodong/oa_interactions_dumper
- run script by calling: ./use_package.pl
- An error file is written to the same directory after every run. Ask curators to check it for errors.
- Old dumper outputs (err* and interaction* files) will now be archived in the following directory:
- /home/acedb/xiaodong/oa_interactions_dumper/Interaction_OA_Dumper_Output_Archive -- CG 10-31-2012
Interaction Models
Current Models
The current ?Interaction model now consolidates ?Gene_regulation objects, ?YH objects, and ?Interaction objects into a single class, ?Interaction. The proposed ?Physical_interaction model has also been consolidated into this larger model. Note that #Interaction_info and #Interaction_type have been deprecated.
Updated for WS249 - CG 3-18-2015
?Interaction Interaction_type Physical
                              Predicted
                              Regulatory Change_of_localization
                                         Change_of_expression_level
                              Genetic Genetic_interaction
                                      Neutral_genetic
                                      Synthetic
                                      Enhancement
                                      Unilateral_enhancement
                                      Mutual_enhancement
                                      Positive_genetic
                                      Suppression
                                      Complete_suppression
                                      Partial_suppression
                                      Unilateral_suppression
                                      Complete_unilateral_suppression
                                      Partial_unilateral_suppression
                                      Mutual_suppression
                                      Complete_mutual_suppression
                                      Partial_mutual_suppression
                                      Suppression_enhancement
                                      Asynthetic
                                      Negative_genetic
                                      Oversuppression
                                      Unilateral_oversuppression
                                      Mutual_oversuppression
                                      Oversuppression_enhancement
                                      Phenotype_bias
                                      No_interaction // Negative data; no interaction was observed after testing
                                      Epistasis
                                      Positive_epistasis
                                      Maximal_epistasis
                                      Minimal_epistasis
                                      Neutral_epistasis
                                      Qualitative_epistasis
                                      Opposing_epistasis
                                      Quantitative_epistasis
             Interactor PCR_interactor ?PCR_product XREF Interaction #Interactor_info // PCR_product of the interacting gene or protein, e.g. Yeast Two Hybrid experiments
                        Sequence_interactor ?Sequence XREF Interaction #Interactor_info // Sequence of the interacting gene or protein
                        Feature_interactor ?Feature XREF Associated_with_Interaction #Interactor_info
                        Interactor_overlapping_CDS ?CDS XREF Interaction #Interactor_info // CDS of the interacting gene or protein (or related sequence)
                        Interactor_overlapping_gene ?Gene XREF Interaction #Interactor_info // Gene (or portion of gene) involved in the interaction
                        Interactor_overlapping_protein ?Protein XREF Interaction #Interactor_info // Protein (or portion of protein) involved in the interaction
                        Molecule_interactor ?Molecule XREF Interaction #Interactor_info // Molecule that interacts with a gene or protein (ported from Gene_regulation Class)
                        Other_regulator ?Text #Interactor_info // Free text describing a regulator entity or condition that does not fall into a standard WormBase category
                        Other_regulated ?Text #Interactor_info // Free text describing a regulated entity or condition that does not fall into a standard WormBase category
                        Rearrangement ?Rearrangement XREF Interactor #Interactor_info
                        Variation_interactor ?Variation XREF Interactor #Interactor_info // Added WS248; allele involved in genetic interaction
             Interaction_summary ?Text #Evidence
             Detection_method Affinity_capture_luminescence // A physical interaction detection technique
                              Affinity_capture_MS // A physical interaction detection technique
                              Affinity_capture_RNA // A physical interaction detection technique
                              Affinity_capture_Western // A physical interaction detection technique
                              Chromatin_immunoprecipitation // A physical interaction detection technique
                              Cofractionation // A physical interaction detection technique
                              Colocalization // A physical interaction detection technique
                              Copurification // A physical interaction detection technique
                              DNase_I_footprinting // A physical interaction detection technique
                              Fluorescence_resonance_energy_transfer // A physical interaction detection technique
                              Protein_fragment_complementation_assay // A physical interaction detection technique
                              Yeast_two_hybrid // A physical interaction detection technique (Protein-protein)
                              Biochemical_activity // A physical interaction detection technique
                              Cocrystal_structure // A physical interaction detection technique
                              Far_western // A physical interaction detection technique
                              Protein_peptide // A physical interaction detection technique
                              Protein_RNA // A physical interaction detection technique
                              Reconstituted_complex // A physical interaction detection technique
                              Electrophoretic_mobility_shift_assay // A physical interaction detection technique
                              Yeast_one_hybrid // A physical interaction detection technique (Protein-DNA)
                              Directed_yeast_one_hybrid // A physical interaction detection technique (Protein-DNA)
                              Antibody // A regulatory interaction detection technique; Antibody name and details captured in Interactor_info hash
                              Reporter_gene ?Text // A regulatory interaction detection technique
                              Transgene // A regulatory interaction detection technique; Transgene name and details captured in Interactor_info hash
                              In_situ Text // A regulatory interaction detection technique
                              Northern Text // A regulatory interaction detection technique
                              Western Text // A regulatory interaction detection technique
                              RT_PCR Text // A regulatory interaction detection technique
                              Other_method ?Text // A regulatory interaction detection technique
             Library_info Library_screened Text INT // In the context of Y2H or YH screens, for example, the library may have been a cDNA library or a pool of clones
                          Origin From_laboratory UNIQUE ?Laboratory // A library generated at an academic laboratory
                                 From_company UNIQUE ?Text // A library generated at a company
             Regulation_level Transcriptional // Regulation occurs at the transcriptional level
                              Post_transcriptional // Regulation occurs at the post-transcriptional level
                              Post_translational // Regulation occurs at the post-translational level
             Regulation_result Positive_regulate #GR_condition
                               Negative_regulate #GR_condition
                               Does_not_regulate #GR_condition // added to capture negative data [040220 krb]
             Confidence Description Text // Free text description of the confidence, e.g. "Core" vs "Noncore" (Vidal Interactome terms)
                        P_value UNIQUE Float // P-value confidence of interaction, if given
                        Log_likelihood_score UNIQUE Float // Only used for Predicted interactions
             Throughput UNIQUE High_throughput // See BioGRID curation criteria for discussion
                               Low_throughput
             Interaction_RNAi ?RNAi XREF Interaction // RNAi experiment associated with the interaction
             Interaction_phenotype ?Phenotype XREF Interaction // Phenotype associated with a genetic interaction
             Unaffiliated_variation ?Variation
             Unaffiliated_transgene ?Transgene
             Unaffiliated_antibody ?Antibody
             Unaffiliated_expr_pattern ?Expr_pattern
             WBProcess ?WBProcess XREF Interaction // WormBase biological process associated with the interaction
             DB_info Database ?Database ?Database_field ?Text // Any database reference to the interaction outside of WormBase, e.g. BioGRID, Interactome
             Paper ?Paper XREF Interaction
             Antibody_remark ?Text
             Historical_gene ?Gene Text
             Remark ?Text #Evidence
#Interactor_info Interactor_type Non_directional // An interactor that has no inherent directionality
                                 Bait // The interactor of interest or focus; the focus/starting point of an interaction screen
                                 Target // The discovered interactor; interactors found as a result of an interaction screen
                                 Effector // In a genetic interaction, the perturbation that affects the phenotype of the other perturbation
                                 Affected // In a genetic interaction, the perturbation whose phenotype is affected by the other perturbation
                                 Trans_regulator // A trans-acting regulator, e.g. a transcription factor
                                 Cis_regulator // A cis-acting regulator, e.g. an enhancer element
                                 Trans_regulated // A gene regulated in trans, e.g. by a transcription factor
                                 Cis_regulated // A gene regulated in cis, e.g. by an enhancer element
                 Expr_pattern ?Expr_pattern // An expression pattern altered to indicate
                 Transgene ?Transgene XREF Interactor // Transgene XREF Interactor that carries an interacting gene //removed XREF from proposal
                 Construct ?Construct XREF Interactor
                 Antibody ?Antibody XREF Interactor // Free text description of the antibody used to detect a regulation event //removed XREF from proposal
                 Inferred_automatically Text // a script has updated this connection
In adopting this new ?Interaction model in WS231, we consolidated the ?YH and ?Gene_regulation classes into the ?Interaction class. As of WS231, there were 3,831 ?Gene_regulation objects, which were converted to ?Interaction objects with IDs WBInteraction000501384 through WBInteraction000505214. There were also 11,993 ?YH objects, which were converted to ?Interaction objects with IDs WBInteraction000505215 through WBInteraction000517207.
Gene_gene Interaction OA
Postgres Table Names
'Interaction ID' = int_name
'Curator' = int_curator
'Process' = int_process
'Database Field Accession Number' = int_database
'Paper' = int_paper
'Interaction Type' = int_type
'Interaction Summary' = int_summary
'Remark' = int_remark
'Physical interaction detection method' = int_detectionmethod
'Library screened and times found' = int_library
'From Laboratory' = int_laboratory
'From Company' = int_company
'PCR Bait' = int_pcrbait
'PCR Target(s)' = int_pcrtarget
'Non-directional PCR(s)' = int_pcrnondir
'Sequence Bait' = int_sequencebait
'Sequence Target(s)' = int_sequencetarget
'Non-directional Sequence(s)' = int_sequencenondir
'Feature Bait' = int_featurebait
'Feature Target' = int_featuretarget
'Bait overlapping CDS' = int_cdsbait
'Target overlapping CDS(s)' = int_cdstarget
'Non-directional overlapping CDS(s)' = int_cdsnondir
'Bait overlapping protein' = int_proteinbait
'Target overlapping protein' = int_proteintarget
'Non-directional overlapping protein' = int_proteinnondir
'Bait overlapping gene' = int_genebait
'Target overlapping gene' = int_genetarget
'Antibody' = int_antibody
'Antibody remark' = int_antibodyremark
'Non-directional Gene(s)' = int_genenondir
'Effector Gene(s)' = int_geneone
'Affected Gene(s)' = int_genetwo
'Non-directional Rearrangement(s)' = int_rearrnondir
'Effector Rearrangement(s)' = int_rearrone
'Affected Rearrangement(s)' = int_rearrtwo
'Effector Other Type' = int_otheronetype
'Effector Other' = int_otherone
'Affected Other Type' = int_othertwotype
'Affected Other' = int_othertwo
'Large scale RNAi' = int_lsrnai
'RNAi' = int_rnai
'Interaction Phenotype(s)' = int_phenotype
'Expression pattern(s)' = int_exprpattern
'Non-directional Variation(s)' = int_variationnondir
'Effector Variation(s)' = int_variationone
'Affected Variation(s)' = int_variationtwo
'Non-directional Molecule(s)' = int_moleculenondir
'Effector Molecule(s)' = int_moleculeone
'Affected Molecule(s)' = int_moleculetwo
'Transgene(s)' = int_transgene
'Person' = int_person
'Confidence description' = int_confidence
'P-value' = int_pvalue
'Log-likelihood score' = int_loglikelihood
'High_throughput' = int_throughput
'Sentence ID' = int_sentid
'False Positive' = int_falsepositive
Some Notes for gene_gene_interaction
Two Large Scale Interaction Data Sets
- Files and scripts for these large scale datasets have been moved into a new directory:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions
-- CG 10-31-2012
- WBPaper00027155 (Weiwei's Science paper) has 23,128 objects, starting from WBInteraction000008637 and ending at WBInteraction000050578 (blank IDs from WBInteraction000050579 to WBInteraction000100000)
- Original .ACE file (in old .ACE interaction format) on Tazendra here:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions/Original_Files/27155_interaction.ace
- WBPaper00031465 (Lee's Nature Genetics paper) has 375,491 objects, starting from WBInteraction000100001, ending at WBInteraction000475491
- Original .ACE file (in old .ACE interaction format) on Tazendra here:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions/Original_Files/31465_interaction.ace
- The two large scale data sets have now been updated to the new interaction format (as of May 2013) and consolidated into a single file on Tazendra here:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions/Original_Large_Scale_Interactions_new_format.ace
The above file needs to be checked for dead genes before every upload by running the "historicGeneReplacementLSInteraction.pl" script in the same directory. Before running the script, make sure that the output file is appropriately named. The script prints all dead genes to the screen, so when running the script you may want to redirect the script output into a file so you can read the results later.
- assigning_interaction_ids
- textpresso_ggi
- interaction_ace_parsing
- oa_interactions_dumper
The new Interaction OA, March 2012
TAB1
- PGID Dumps as: N/A
- Interaction ID (Ontology) int_name Dumps as: Interaction: <Interaction ID>
- Curator - (Dropdown) int_curator Dumps as: N/A
- Process - ?WBProcess (MultiOntology) int_process Dumps as: WBProcess <WBProcess>
- Database, Field, & Accession Number - ?Database, Field, Accession_number (Free Text) int_database Dumps as: Database <Database> <Database_field> <Accession_number>
- For single entries, surround the Database, Field, and Accession number entries with double quotes and separate them with spaces like so: "Database" "Database Field" "Accession Number"
- If there are multiple entries, data to be entered like this: "Database 1" "Field 1" "Accession number 1" | "Database 2" "Field 2" "Accession number 2"
- Paper - ?Paper (Ontology) int_paper Dumps as: Paper <Paper>
- Interaction Type - Text (Multiple-Dropdown) int_type
- The options for Interaction Type will include:
- Physical Dumps as: Physical
- Predicted Dumps as: Predicted
- Genetic - Genetic interaction Dumps as: Genetic_interaction
- Genetic - Negative genetic Dumps as: Negative_genetic
- Genetic - Synthetic Dumps as: Synthetic
- Genetic - Enhancement Dumps as: Enhancement
- Genetic - Unilateral enhancement Dumps as: Unilateral_enhancement
- Genetic - Mutual enhancement Dumps as: Mutual_enhancement
- Genetic - Positive genetic Dumps as: Positive_genetic
- Genetic - Suppression Dumps as: Suppression
- Genetic - Complete suppression Dumps as: Complete_suppression
- Genetic - Partial suppression Dumps as: Partial_suppression
- Genetic - Unilateral suppression Dumps as: Unilateral_suppression
- Genetic - Complete unilateral suppression Dumps as: Complete_unilateral_suppression
- Genetic - Partial unilateral suppression Dumps as: Partial_unilateral_suppression
- Genetic - Mutual suppression Dumps as: Mutual_suppression
- Genetic - Complete mutual suppression Dumps as: Complete_mutual_suppression
- Genetic - Partial mutual suppression Dumps as: Partial_mutual_suppression
- Genetic - Asynthetic Dumps as: Asynthetic
- Genetic - Suppression/Enhancement Dumps as: Suppression_enhancement
- Genetic - Epistasis Dumps as: Epistasis
- Genetic - Positive epistasis Dumps as: Positive_epistasis
- Genetic - Maximal epistasis Dumps as: Maximal_epistasis
- Genetic - Minimal epistasis Dumps as: Minimal_epistasis
- Genetic - Neutral epistasis Dumps as: Neutral_epistasis
- Genetic - Qualitative epistasis Dumps as: Qualitative_epistasis
- Genetic - Opposing epistasis Dumps as: Opposing_epistasis
- Genetic - Quantitative epistasis Dumps as: Quantitative_epistasis
- Genetic - Neutral genetic Dumps as: Neutral_genetic
- Genetic - Oversuppression Dumps as: Oversuppression
- Genetic - Unilateral oversuppression Dumps as: Unilateral_oversuppression
- Genetic - Mutual oversuppression Dumps as: Mutual_oversuppression
- Genetic - Oversuppression/Enhancement Dumps as: Oversuppression_enhancement
- Genetic - Phenotype bias Dumps as: Phenotype_bias
- Genetic - No interaction Dumps as: No_interaction
- Interaction Summary - bigtext int_summary Dumps as: Interaction_summary <Big_Text>
- Remark - bigtext int_remark Dumps as: Remark <Big_Text>
Each interaction type can be considered necessarily directional, necessarily non-directional, or ambiguous. The OA dumping script will check to make sure that the correct interactor types (Non-directional, Effector, or Affected) are listed in each case and notify the curator during the dump in the error output file. Here are the directionalities for each interaction type:
Necessarily Directional:
- Enhancement
- Unilateral enhancement
- Suppression
- Complete suppression
- Partial suppression
- Unilateral suppression
- Complete unilateral suppression
- Partial unilateral suppression
- Epistasis
- Positive epistasis
- Maximal epistasis
- Minimal epistasis
- Neutral epistasis
- Qualitative epistasis
- Opposing epistasis
- Quantitative epistasis
- Oversuppression
- Unilateral oversuppression
- Phenotype bias
Necessarily Non-directional:
- Predicted
- Synthetic
- Asynthetic
- Mutual enhancement
- Mutual suppression
- Complete mutual suppression
- Partial mutual suppression
- Mutual oversuppression
- Suppression enhancement
- Oversuppression enhancement
- No interaction
Ambiguous (ignore in dumping script):
- Physical
- Genetic interaction
- Negative genetic
- Positive genetic
- Neutral genetic
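The directionality check described above can be sketched as follows. This is a minimal illustration, not the dumper's actual code: the function name `check_directionality` is hypothetical, and the exact rule is an assumption (directional types need both an Effector and an Affected interactor and no Non_directional ones; non-directional types need the reverse). Ambiguous types are skipped, as noted above. Type names use the dumped tag spellings.

```python
# Interaction types grouped as in the lists above (dumped tag spellings).
DIRECTIONAL = {
    "Enhancement", "Unilateral_enhancement", "Suppression",
    "Complete_suppression", "Partial_suppression", "Unilateral_suppression",
    "Complete_unilateral_suppression", "Partial_unilateral_suppression",
    "Epistasis", "Positive_epistasis", "Maximal_epistasis",
    "Minimal_epistasis", "Neutral_epistasis", "Qualitative_epistasis",
    "Opposing_epistasis", "Quantitative_epistasis", "Oversuppression",
    "Unilateral_oversuppression", "Phenotype_bias",
}
NON_DIRECTIONAL = {
    "Predicted", "Synthetic", "Asynthetic", "Mutual_enhancement",
    "Mutual_suppression", "Complete_mutual_suppression",
    "Partial_mutual_suppression", "Mutual_oversuppression",
    "Suppression_enhancement", "Oversuppression_enhancement",
    "No_interaction",
}


def check_directionality(interaction_type, roles):
    """Return error messages for one interaction (empty list if none).

    roles is the set of interactor types present on the interaction,
    e.g. {"Effector", "Affected"}. The rule applied here is an assumption.
    """
    roles = set(roles)
    errors = []
    if interaction_type in DIRECTIONAL:
        for needed in ("Effector", "Affected"):
            if needed not in roles:
                errors.append(f"{interaction_type}: missing {needed} interactor")
        if "Non_directional" in roles:
            errors.append(f"{interaction_type}: unexpected Non_directional interactor")
    elif interaction_type in NON_DIRECTIONAL:
        if "Non_directional" not in roles:
            errors.append(f"{interaction_type}: missing Non_directional interactor")
        for unexpected in ("Effector", "Affected"):
            if unexpected in roles:
                errors.append(f"{interaction_type}: unexpected {unexpected} interactor")
    # Ambiguous types (Physical, Genetic_interaction, ...) are not checked.
    return errors


for message in check_directionality("Synthetic", {"Effector"}):
    print(message)
```

Each returned message would be a line in the error output file for the curator to review.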
TAB2
- Physical interaction detection method (Multi-dropdown) int_detectionmethod
- The detection method options are:
- Affinity_capture_luminescence Dumps as: Affinity_capture_luminescence
- Affinity_capture_MS Dumps as: Affinity_capture_MS
- Affinity_capture_RNA Dumps as: Affinity_capture_RNA
- Affinity_capture_Western Dumps as: Affinity_capture_Western
- Chromatin_immunoprecipitation Dumps as: Chromatin_immunoprecipitation
- Cofractionation Dumps as: Cofractionation
- Colocalization Dumps as: Colocalization
- Copurification Dumps as: Copurification
- DNase_I_footprinting Dumps as: DNase_I_footprinting
- Fluorescence_resonance_energy_transfer Dumps as: Fluorescence_resonance_energy_transfer
- Protein_fragment_complementation_assay Dumps as: Protein_fragment_complementation_assay
- Yeast_two_hybrid Dumps as: Yeast_two_hybrid
- Biochemical_activity Dumps as: Biochemical_activity
- Cocrystal_structure Dumps as: Cocrystal_structure
- Far_western Dumps as: Far_western
- Protein_peptide Dumps as: Protein_peptide
- Protein_RNA Dumps as: Protein_RNA
- Reconstituted_complex Dumps as: Reconstituted_complex
- Yeast_one_hybrid Dumps as: Yeast_one_hybrid
- Directed_yeast_one_hybrid Dumps as: Directed_yeast_one_hybrid
- Electrophoretic_mobility_shift_assay Dumps as: Electrophoretic_mobility_shift_assay
- Library screened/Times found - Text Text(Integer); separate multiple entries with pipes ('|') int_library Dumps as: Library_screened <Text> INT
- For single entries, surround the 'Library screened' entry with double quotes and separate the number with a space like so: "Library screened" 3
- For multiple entries, data should be entered as such: "Library screened 1" INT | "Library screened 2" INT
- From Laboratory - ?Laboratory (ontology) int_laboratory Dumps as: From_laboratory <Laboratory>
- From Company - Text; separate multiple entries with pipes ('|') int_company Dumps as: From_company <Text>
- PCR Bait - ?PCR_product (Ontology) int_pcrbait Dumps as: PCR_interactor <PCR_product> Bait
- PCR Target(s) - ?PCR_product (MultiOntology) int_pcrtarget Dumps as: PCR_interactor <PCR_product> Target
- Non-directional PCR(s) - ?PCR_product (MultiOntology) int_pcrnondir Dumps as: PCR_interactor <PCR_product> Non_directional
- Sequence Bait - ?Sequence (Free Text) int_sequencebait Dumps as: Sequence_interactor <Sequence> Bait
- Sequence Target(s) - ?Sequence (Free Text); separate multiple entries with pipes ('|') int_sequencetarget Dumps as: Sequence_interactor <Sequence> Target
- Non-directional Sequence(s) - ?Sequence (Free Text); separate multiple entries with pipes ('|') int_sequencenondir Dumps as: Sequence_interactor <Sequence> Non_directional
- Feature Bait - ?Feature (MultiOntology) int_featurebait Dumps as: Feature_interactor <Feature> Bait
- Feature Target - ?Feature (MultiOntology) int_featuretarget Dumps as: Feature_interactor <Feature> Target
- Bait overlapping CDS - ?CDS (Free Text) int_cdsbait Dumps as: Interactor_overlapping_CDS <CDS> Bait
- Target overlapping CDS(s) - ?CDS (Free Text); separate multiple entries with pipes ('|') int_cdstarget Dumps as: Interactor_overlapping_CDS <CDS> Target
- Non-directional overlapping CDS(s) - ?CDS (Free Text); separate multiple entries with pipes ('|') int_cdsnondir Dumps as: Interactor_overlapping_CDS <CDS> Non_directional
- Bait overlapping protein - ?Protein (Free Text) int_proteinbait Dumps as: Interactor_overlapping_protein <Protein> Bait
- Target overlapping protein(s) - ?Protein (Free Text); separate multiple entries with pipes ('|') int_proteintarget Dumps as: Interactor_overlapping_protein <Protein> Target
- Non-directional overlapping protein(s) - ?Protein (Free Text); separate multiple entries with pipes ('|') int_proteinnondir Dumps as: Interactor_overlapping_protein <Protein> Non_directional
- Bait overlapping gene - ?Gene (Ontology) int_genebait Dumps as: Interactor_overlapping_gene <Gene> Bait
- Target overlapping gene(s) - ?Gene (MultiOntology) int_genetarget Dumps as: Interactor_overlapping_gene <Gene> Target
- Antibody - ?Antibody (MultiOntology) int_antibody Dumps as: Interactor_overlapping_gene <Mapped Gene> Antibody <Antibody> AND Antibody (on new line)
- When mapping antibodies to genes, compare antibody-affiliated genes with those in the Non-directional Gene(s), Effector Gene(s), Affected Gene(s), Bait Overlapping Gene and Target Overlapping Gene fields
- For Antibodies that don't map to a gene in the interaction, Dump as: Unaffiliated_antibody <Antibody>
- Antibody remark - Big Text int_antibodyremark Dumps as: Antibody_remark <Big_Text>
TAB3
- Non-directional Gene(s) - ?Gene (MultiOntology) int_genenondir Dumps as: Interactor_overlapping_gene <Gene> Non_directional
- Effector Gene(s) - ?Gene (MultiOntology) int_geneone Dumps as: Interactor_overlapping_gene <Gene> Effector
- Affected Gene(s) - ?Gene (MultiOntology) int_genetwo Dumps as: Interactor_overlapping_gene <Gene> Affected
- Non-directional Variation(s) - ?Variation (MultiOntology) int_variationnondir
- Dumps as: Variation_interactor <Variation> Non_directional
- Variations in this field need to be mapped to a gene at the dump stage; genes that map to the variation will be dumped as the "Interactor_overlapping_gene" as follows:
- Dumps as: Interactor_overlapping_gene <Mapped Gene> Non_directional
- Variations that don't map to a gene will need to be assigned a gene at the ACEDB build stage; these objects will be indicated as such in the OA-dumper error output file
- Effector Variation(s) - ?Variation (MultiOntology) int_variationone
- Dumps as: Variation_interactor <Variation> Effector
- Variations in this field need to be mapped to a gene at the dump stage; genes that map to the variation will be dumped as the "Interactor_overlapping_gene" as follows:
- Dumps as: Interactor_overlapping_gene <Mapped Gene> Effector
- Variations that don't map to a gene will need to be assigned a gene at the ACEDB build stage; these objects will be indicated as such in the OA-dumper error output file
- Affected Variation(s) - ?Variation (MultiOntology) int_variationtwo
- Dumps as: Variation_interactor <Variation> Affected
- Variations in this field need to be mapped to a gene at the dump stage; genes that map to the variation will be dumped as the "Interactor_overlapping_gene" as follows:
- Dumps as: Interactor_overlapping_gene <Mapped Gene> Affected
- Variations that don't map to a gene will need to be assigned a gene at the ACEDB build stage; these objects will be indicated as such in the OA-dumper error output file
- Non-directional Molecule(s) - ?Molecule (MultiOntology) int_moleculenondir Dumps as: Molecule_interactor <Molecule> Non_directional
- Effector Molecule(s) - ?Molecule (MultiOntology) int_moleculeone Dumps as: Molecule_interactor <Molecule> Effector
- Affected Molecule(s) - ?Molecule (MultiOntology) int_moleculetwo Dumps as: Molecule_interactor <Molecule> Affected
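A minimal sketch of the variation-to-gene mapping step described for the Variation fields above. The function name `dump_variation`, the lookup dictionary, the quoting of object names in the output lines, and the object names in the example are all illustrative assumptions, not taken from the dumper.

```python
def dump_variation(variation, role, variation_to_gene):
    """Return (ace_lines, errors) for one variation interactor.

    role is "Non_directional", "Effector", or "Affected".
    variation_to_gene is a hypothetical lookup of variation -> mapped gene.
    """
    lines = [f'Variation_interactor "{variation}" {role}']
    errors = []
    gene = variation_to_gene.get(variation)
    if gene:
        # The mapped gene is dumped with the same role as the variation.
        lines.append(f'Interactor_overlapping_gene "{gene}" {role}')
    else:
        # Unmapped variations are reported so a gene can be assigned
        # at the ACEDB build stage.
        errors.append(f"{variation}: no mapped gene; assign at ACEDB build stage")
    return lines, errors


# Placeholder object names, for illustration only.
mapping = {"WBVar_example_1": "WBGene_example_1"}
print(dump_variation("WBVar_example_1", "Effector", mapping))
print(dump_variation("WBVar_example_2", "Affected", mapping))
```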
TAB4
- Non-directional Rearrangement(s) - ?Rearrangement (MultiOntology) int_rearrnondir Dumps as: Rearrangement <Rearrangement> Non_directional
- Effector Rearrangement(s) - ?Rearrangement (MultiOntology) int_rearrone Dumps as: Rearrangement <Rearrangement> Effector
- Affected Rearrangement(s) - ?Rearrangement (MultiOntology) int_rearrtwo Dumps as: Rearrangement <Rearrangement> Affected
- Effector Other Type - (Dropdown) int_otheronetype; options are Chemical or Transgene. Dumps as: (see next line)
- Effector Other - ?Text int_otherone Dumps as: Remark "Effector <Effector Other Type>: <Text>"
- Affected Other Type - (Dropdown) int_othertwotype; options are Chemical or Transgene. Dumps as: (see next line)
- Affected Other - ?Text int_othertwo Dumps as: Remark "Affected <Affected Other Type>: <Text>"
- RNAi - (MultiOntology) int_rnai Dumps as: Interaction_RNAi <RNAi>
- Large scale RNAi - Free Text; separate multiple entries with pipes ('|') int_lsrnai (all large scale RNAi that doesn't match ontology) Dumps as: Interaction_RNAi <RNAi>
- Interaction phenotype(s) - ?Phenotype (MultiOntology) int_phenotype Dumps as: Interaction_phenotype <Phenotype>
- Expression pattern(s) - ?Expr_pattern (MultiOntology) int_exprpattern Dumps as: Interactor_overlapping_gene <Mapped Gene> Expr_pattern <Expr_pattern>
- When mapping Expression patterns to genes, compare Expr-affiliated genes with those in the Non-directional Gene(s), Effector Gene(s), Affected Gene(s), Bait Overlapping Gene and Target Overlapping Gene fields
- For Expression patterns that don't map to a gene in the interaction, Dump as: Unaffiliated_expr_pattern <Expr_pattern>
- Transgene(s) - ?Transgene (MultiOntology) int_transgene Dumps as: Interactor_overlapping_gene <Mapped Gene> Transgene <Transgene> AND Transgene (on new line)
- When mapping transgenes to genes, compare transgene-affiliated genes (from the Driven_by_gene, Gene, and 3'UTR fields) with those in the Non-directional Gene(s), Effector Gene(s), Affected Gene(s), Bait Overlapping Gene and Target Overlapping Gene fields
- For Transgenes that don't map to a gene in the interaction, Dump as: Unaffiliated_transgene <Transgene>
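The gene-mapping rule for Expression patterns above can be sketched like this. It is an illustration under assumptions: the function name `dump_expr_pattern`, the quoting of object names, and the example names are hypothetical. The caller is assumed to have already collected the genes from the Non-directional, Effector, Affected, Bait, and Target gene fields into one set.

```python
def dump_expr_pattern(expr_pattern, affiliated_genes, interaction_genes):
    """Return the .ace lines for one expression pattern.

    affiliated_genes: genes affiliated with the expression pattern.
    interaction_genes: all genes entered in the interaction's gene fields.
    """
    mapped = [gene for gene in affiliated_genes if gene in interaction_genes]
    if not mapped:
        # No affiliated gene takes part in this interaction.
        return [f'Unaffiliated_expr_pattern "{expr_pattern}"']
    return [
        f'Interactor_overlapping_gene "{gene}" Expr_pattern "{expr_pattern}"'
        for gene in mapped
    ]


# Placeholder object names, for illustration only.
genes_in_interaction = {"WBGene_example_1", "WBGene_example_2"}
print(dump_expr_pattern("Expr_example_1", ["WBGene_example_1"], genes_in_interaction))
print(dump_expr_pattern("Expr_example_2", ["WBGene_example_9"], genes_in_interaction))
```

The same comparison applies to antibodies and transgenes, with their own tags in place of Expr_pattern.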
TAB5
- Person - ?Person int_person Dumps as: Remark <Remark_text> Person_evidence <Person>
- If there is no Remark entry, dumps as: Remark "See Person Evidence" Person_evidence <Person>
- Confidence description - Text int_confidence Dumps as: Description <Text>
- P-value - Text (Float) int_pvalue Dumps as: P_value FLOAT
- Log-likelihood score - Text (Float) int_loglikelihood Dumps as: Log_likelihood_score FLOAT
- High_throughput - (Toggle) int_throughput:
- If ON, dumps as: High_throughput
- If OFF (default), dumps as: Low_throughput
- Sentence ID - (Ontology) sentence shows in term info; int_sentid Dumps as: N/A
- False Positive - (Toggle) int_falsepositive; if the sentence is a false positive (it contains no interaction information), no Interaction ID is assigned and nothing is dumped. Dumps as: N/A
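The Person and High_throughput dump rules above can be sketched as follows. The function names and the quoting of values in the output are assumptions for illustration; only the tag names and the "See Person Evidence" fallback come from the notes above.

```python
def dump_person(person, remark=None):
    """Person evidence is attached to the Remark line.

    If the curator entered no remark, a placeholder remark is used.
    """
    text = remark if remark else "See Person Evidence"
    return f'Remark "{text}" Person_evidence "{person}"'


def dump_throughput(toggle_on):
    """The toggle is OFF by default, which dumps as Low_throughput."""
    return "High_throughput" if toggle_on else "Low_throughput"


# Placeholder person name, for illustration only.
print(dump_person("WBPerson_example"))
print(dump_person("WBPerson_example", "Personal communication"))
print(dump_throughput(False))
```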
To go live on tazendra
To create new interaction tables on tazendra : /home/postgres/work/pgpopulation/interaction/20120527_OA_newModel/create_datatype_tables.pl
Backup relevant tables : /home/postgres/work/pgpopulation/interaction/20120527_OA_newModel/backupTable.pl
To transfer data from old interaction model tables to new interaction model tables and table formats : /home/postgres/work/pgpopulation/interaction/20120527_OA_newModel/transfer_int_data.pl
Instructions on How to Use the New OA
The new Interaction OA is intended to be used by curators for curating physical, predicted and genetic interactions. This is an overview of the key points to keep in mind while curating with the Interaction OA.
1) TAB 1 is for general information, TAB 2 is for physical interactions, TAB 3 & TAB 4 are for genetic and predicted interactions, and TAB 5 is for detailed (and rarely used) information.
TAB 1
2) Interaction IDs are generated automatically when the "New" button is clicked; new Postgres IDs (PGIDs) are generated automatically as well. If you would like to duplicate objects (because you are generating several similar interaction objects) and assign new IDs to them, select the interaction to duplicate and click the "Duplicate" button; this will generate a new PGID and OA row, but carry the same Interaction ID over from the duplicated interaction. If the new row should be a distinct interaction, delete the interaction ID from the Interaction ID field, and a cron job will assign an Interaction ID to that row the following evening. Note that if you leave any rows without an Interaction ID overnight, they will be each assigned a unique (and new) Interaction ID.
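As a rough illustration of what the overnight assignment does to rows left without an Interaction ID, here is a sketch. It is not the cron job itself: the function name is hypothetical, and taking the next number after the highest ID already in use is an assumption about how new IDs are chosen. The ID format (prefix plus nine digits) follows the IDs shown elsewhere on this page.

```python
PREFIX = "WBInteraction"


def assign_missing_ids(rows):
    """Give every row with a blank Interaction ID its own new ID.

    rows maps PGID -> Interaction ID ("" when blank). Returns a new dict.
    Assumption: new IDs continue from the highest ID already present.
    """
    used = [int(value[len(PREFIX):]) for value in rows.values() if value]
    next_number = max(used, default=0) + 1
    result = {}
    for pgid in sorted(rows):
        if rows[pgid]:
            result[pgid] = rows[pgid]
        else:
            result[pgid] = f"{PREFIX}{next_number:09d}"
            next_number += 1
    return result


# Rows 2 and 3 were duplicated and had their Interaction ID deleted.
rows = {1: "WBInteraction000000010", 2: "", 3: ""}
print(assign_missing_ids(rows))
```

Note that two blank rows never share an ID, which is why a duplicated row that should stay part of the same interaction must keep its copied ID.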
3) Entering Database information: Database information is typically provided with three pieces of information: the Database, the Database field name, and the Database Accession number for the interaction in question. These must be each entered surrounded by double quotes (") and separated with spaces. So for example:
"Database" "Database Field" "Accession Number"
If multiple database references are to be made, split on pipes ("|") like this:
"Database 1" "Database Field 1" "Accession Number 1" | "Database 2" "Database Field 2" "Accession Number 2"
In the (unlikely) event that any of the field entries have double quotes (") in the name itself, then the 'inner' double quotes will need to be escaped with a backslash ("\") like this:
"Database \"Supreme!!!!\"" "Database \"Field\"" "Accession \"Number\""
so that it will be read in ACEDB as:
Database "Supreme!!!!" Database "Field" Accession "Number"
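To make the quoting rules concrete, here is a minimal sketch of a parser for this field format. The function names are hypothetical and this is not the dumper's code; it only shows how double-quoted values, backslash-escaped inner quotes, and pipe-separated entries could be read.

```python
def parse_quoted_entries(field):
    """Split a field on unquoted pipes; return a list of token lists."""
    entries, tokens, current = [], [], []
    in_quotes = False
    i = 0
    while i < len(field):
        ch = field[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(field):
                current.append(field[i + 1])  # keep the escaped character
                i += 1
            elif ch == '"':
                tokens.append("".join(current))  # closing quote ends a value
                current = []
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == "|":
            entries.append(tokens)
            tokens = []
        elif not ch.isspace():
            raise ValueError(f"unquoted text at position {i}: {ch!r}")
        i += 1
    if in_quotes:
        raise ValueError("unterminated double quote")
    entries.append(tokens)
    return entries


def parse_database_field(field):
    """Return a list of (database, field, accession) triples."""
    triples = []
    for tokens in parse_quoted_entries(field):
        if len(tokens) != 3:
            raise ValueError(f"expected 3 quoted values, got {len(tokens)}")
        triples.append(tuple(tokens))
    return triples


print(parse_database_field(
    '"Database 1" "Database Field 1" "Accession Number 1" | '
    '"Database 2" "Database Field 2" "Accession Number 2"'
))
print(parse_database_field(
    r'"Database \"Supreme!!!!\"" "Database \"Field\"" "Accession \"Number\""'
))
```

An entry with a missing quote or a value outside quotes raises an error rather than being silently misread.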
4) The Person field is for Person evidence, when no other reference (such as a WormBase paper) is supplied as a reference. This will be dumped as a hash/supplement to the "Remark" entry, so it is advantageous to include any pertinent information there.
TAB 2
5) The "Library screened and times found" field is for documenting screening/testing libraries that were used to identify a physical interaction. For example, a cDNA library may be used in a Yeast Two Hybrid screen to identify protein interaction partners with a "Bait" protein. Sometimes (but not always) authors report the number of times a particular interaction was identified using a particular library. If not, enter the name of the library with double quotes (") like so:
"cDNA"
if multiple libraries (but no numbers), split on pipes like this:
"cDNA" | "ORFeome"
If a single library, with a number (for number of times found):
"AD-TF mini-library" 5
and if multiple libraries, with numbers:
"AD-TF mini-library" 5 | "AD-wrmcDNA library" 1
As with the Database field entries, if (for some reason) the name of the library has double quotes (") in the name itself, the double quotes will need to be escaped like this:
"AD-TF \"Mini\" library" 5
so that the library name eventually reads like this:
AD-TF "Mini" library
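The library entries above can be read with a short regular-expression sketch. The function name is hypothetical, and the sketch assumes library names never contain a pipe character; it is an illustration of the entry format, not the dumper's code.

```python
import re

# One entry: a double-quoted name (inner quotes escaped with a backslash),
# optionally followed by whitespace and an integer "times found" count.
ENTRY = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:\s+(\d+))?$')


def parse_library_field(field):
    """Return a list of (library_name, times_found or None) tuples."""
    results = []
    for piece in field.split("|"):  # assumes no pipe inside a library name
        match = ENTRY.match(piece.strip())
        if not match:
            raise ValueError(f"malformed library entry: {piece.strip()!r}")
        name = re.sub(r"\\(.)", r"\1", match.group(1))  # drop escape backslashes
        count = int(match.group(2)) if match.group(2) else None
        results.append((name, count))
    return results


print(parse_library_field('"cDNA" | "ORFeome"'))
print(parse_library_field('"AD-TF mini-library" 5 | "AD-wrmcDNA library" 1'))
print(parse_library_field(r'"AD-TF \"Mini\" library" 5'))
```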
6) Sequence fields: "Sequence Bait", "Sequence Target(s)", and "Non-directional Sequence(s)"
To enter a single sequence object, type in the sequence object name, no quotes:
CK583862
For multiple sequence objects, split on pipes ("|"):
CK583862 | CK583870
The same rules apply for Protein and CDS objects.
7) The Non-directional fields: For each type of interaction object, there is a "Non-directional" field, allowing a curator to enter all interactors of that type for a Non-directional type of interaction. Note that the "Non-directional Genes" field (which could apply to physical, genetic, and predicted interactions) lies in TAB 3.
8) For Directional interactions, there are "Bait" and "Target" fields for physical interactions (TAB 2) and "Effector" and "Affected" fields for genetic interactions (TAB 3).
TAB 3
9) Effector Variation(s), Affected Variation(s), and Non-directional Variation(s) fields are for variations that implicate an affiliated gene as an effector or affected interactor in a directional genetic interaction or non-directional interactors in a non-directional interaction. These fields can be populated without the need to populate the respective gene in the relevant gene field, as the dumping script (or build process) will make the appropriate associations.
TAB 4
10) The RNAi fields: There are two fields for references to RNAi objects: "RNAi" and "Large scale RNAi". The reason for two fields is that one field ("RNAi") is an ontology field reading off of RNAi experiments that live in the RNAi OA and Postgres. As RNAi experiments from papers containing 2,000 or more RNAi experiments (WBPaper00029258 for example) were excluded from the RNAi OA for performance reasons, any RNAi experiments from such papers will not be recognized by the "RNAi" ontology field, and therefore must be entered as free-text in the "Large scale RNAi" field.
11) The Transgene(s) field is for transgenes involved in the interaction, regardless of whether it is related to an "Effector" gene or "Affected" gene or the interaction is Non-directional. The dumping script will automatically associate the transgene with the appropriate interacting gene and dump in .ACE format accordingly.
12) Effector/Affected Other Type and Effector/Affected Other fields: these fields allow for the curation of Chemicals, Transgenes, or other entities that don't exist as proper WormBase/ACEDB objects, for example transgenes that only express human proteins. The "Other Type" fields allow for the selection of "Chemical" or "Transgene", the identity of which would go into the "Other" fields. This, ideally, will get phased out as chemicals are generated in the Molecule OA and transgenes in the Transgene OA, thereby allowing them to be entered into ontology-based "Transgene" or "Molecule" fields.
TAB 5
13) Confidence description: This field will capture free-text descriptions of the confidence the authors suggest they have for this interaction. In the Yeast Two Hybrid experiments, for example, the interaction may be described as "Interactome Core 1", "Interactome Core 2", or "Interactome Noncore" referring to the varying degrees of confidence for those interactions.
14) The P-value and Log-likelihood score fields are mostly to capture confidence values for predicted interactions that have been reported.
15) The High_throughput toggle field is intended to capture whether or not the interaction was observed as one of many (from 50 to thousands of) interactions and thus should be interpreted with caution, or at least acknowledged as coming from a large scale experiment. The default is OFF and indicates that the experiment is low throughput.
16) The Sentence ID and False Positive fields are exclusively for Textpresso sentence-based curation.
Nightly Cron Job to Assign New Interaction IDs
Every night at 4am a cron job script will run to assign new Interaction IDs to any row/PGID in the Interaction OA that does not already have an Interaction ID and that meets a few criteria. The script is located on Tazendra here:
/home/acedb/xiaodong/assigning_interaction_ids/assign_interaction_ids.pl
and the criteria for getting a new Interaction ID are as follows:
1) The curator of the interaction object/PGID is NOT Arun
2) The interaction object/PGID does NOT already have an Interaction ID
3) The interaction object/PGID is NOT flagged as False Positive
Any interactions/PGIDs that meet these three criteria will be assigned a new Interaction ID by the cron job.
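The three criteria can be expressed as a single filter. The sketch below is not the cron script itself (which is a Perl script on Tazendra); the row layout, the dictionary keys, and the function name are all assumptions made for illustration.

```python
def needs_new_interaction_id(row, excluded_curator="Arun"):
    """Return True if a row/PGID meets all three criteria.

    `row` is a hypothetical dict with the keys "curator",
    "interaction_id" and "false_positive"; the real script reads
    these values from Postgres.
    """
    return (
        row.get("curator") != excluded_curator   # 1) curator is not Arun
        and not row.get("interaction_id")        # 2) no Interaction ID yet
        and not row.get("false_positive")        # 3) not flagged False Positive
    )


def pgids_needing_ids(rows):
    """Return the PGIDs of all rows that should receive a new ID."""
    return [row["pgid"] for row in rows if needs_new_interaction_id(row)]
```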
Interaction OA .ACE Dumper
The script for the interaction OA dumper is located on Tazendra at:
/home/postgres/work/citace_upload/interaction/use_package.pl
Error Checks During Dump Process
The following is a list of checks that the .ACE dumper script will perform on all interactions being dumped out of the OA to make sure that the data is consistent and doesn't have any nonsensical information:
Fatal Errors (Interactions will not get dumped)
1) If there are fewer than two interactors in an interaction, the dumper script will generate an error message that is printed to the ERROR output file and the object will not get dumped. This is determined by checking that:
a) There is at least one "Bait" entry and one "Target" entry OR
b) There is at least one "Effector" and one "Affected" entry OR
c) There are at least two "Non-directional" entries
If none of these conditions hold true, then an error message will be printed in tab-delimited format like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 nodump WBPerson1234 There are not two interactors
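A minimal sketch of this check and of the tab-delimited error line, assuming each interaction is available as a dict mapping a role name to a list of interactors. The role keys and both function names are hypothetical; the real dumper is a Perl script and may structure this differently.

```python
def has_two_interactors(row):
    """Return True if the row passes the minimum-interactor check."""
    def count(role):
        return len(row.get(role, []))

    return (
        (count("bait") >= 1 and count("target") >= 1)           # condition a
        or (count("effector") >= 1 and count("affected") >= 1)  # condition b
        or count("nondirectional") >= 2                          # condition c
    )


def error_line(pgid, dump_status, curator, explanation):
    """Format one tab-delimited line for the ERROR output file."""
    return "\t".join([str(pgid), dump_status, curator, explanation])
```

A row with a single "Non-directional" entry and nothing else fails the check and would be reported with error_line(12345, "nodump", "WBPerson1234", "There are not two interactors").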
2) If there is no reference (Paper or Person) then the object will not get dumped and an error message is printed:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 nodump WBPerson1234 There is no reference, neither paper nor person
3) If there are incompatible interactor types, the interaction object will not get dumped. This means that the object will not get dumped if the following conditions are not met:
a) If there is a "Non-directional" entry, there are no "Effector", "Affected", "Bait", or "Target" entries AND
b) If there is an "Effector" entry, there is at least one "Affected" entry AND there are no "Non-directional", "Bait", or "Target" entries AND
c) If there is an "Affected" entry, there is at least one "Effector" entry AND there are no "Non-directional", "Bait", or "Target" entries AND
d) If there is a "Bait" entry, there is at least one "Target" entry AND there are no "Non-directional", "Effector", or "Affected" entries AND
e) If there is a "Target" entry, there is at least one "Bait" entry AND there are no "Non-directional", "Effector", or "Affected" entries
If these conditions are not met, the object will not get dumped and an error message will print to the ERROR output file like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 nodump WBPerson1234 has nondiretional + bait
12345 nodump WBPerson1234 has nondiretional + target
12345 nodump WBPerson1234 has nondiretional + effected
12345 nodump WBPerson1234 has nondiretional + effector
12345 nodump WBPerson1234 has effector but no effected
12345 nodump WBPerson1234 has effector + bait
12345 nodump WBPerson1234 has effector + target
12345 nodump WBPerson1234 has effected + bait
12345 nodump WBPerson1234 has effected + target
12345 nodump WBPerson1234 has bait but no target
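Conditions a) through e) can be sketched as one function that returns every violation found. The row layout (a dict of role name to list of entries), the function name, and the exact wording of the returned explanations are assumptions; the messages only paraphrase the style of the examples above and are not the script's literal output.

```python
def incompatibility_errors(row):
    """Return a list of explanations for incompatible interactor types."""
    has = {
        role: bool(row.get(role))
        for role in ("nondirectional", "effector", "affected", "bait", "target")
    }
    errors = []

    # a) Non-directional entries exclude every directional role.
    if has["nondirectional"]:
        for role in ("effector", "affected", "bait", "target"):
            if has[role]:
                errors.append(f"has nondirectional + {role}")

    # b), c) Effector and Affected must appear together.
    if has["effector"] != has["affected"]:
        present, missing = (
            ("effector", "affected") if has["effector"] else ("affected", "effector")
        )
        errors.append(f"has {present} but no {missing}")

    # d), e) Bait and Target must appear together.
    if has["bait"] != has["target"]:
        present, missing = ("bait", "target") if has["bait"] else ("target", "bait")
        errors.append(f"has {present} but no {missing}")

    # b) through e) Genetic roles must not be mixed with physical roles.
    for genetic in ("effector", "affected"):
        for physical in ("bait", "target"):
            if has[genetic] and has[physical]:
                errors.append(f"has {genetic} + {physical}")

    return errors
```

An empty list means the interactor types are compatible; any non-empty result would correspond to a "nodump" line in the ERROR output file.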
4) If there is no Interaction ID, the object will not get dumped and an error message will print to the ERROR output file like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 nodump WBPerson1234 There is no Interaction ID
The script determines this by 1) generating a list of all PGIDs from the Interaction OA, 2) removing all PGIDs where Arun is the curator, and then 3) looking for any PGIDs for which there is no Interaction ID. As there is a cron job to add Interaction IDs to any PGIDs (Postgres rows) that are missing IDs, this problem should be rare; it should arise only when objects without Interaction IDs have been added that day, before the next cron job runs.
5) If there is no Interaction Type, the object will not get dumped and an error message will print to the ERROR output file like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 nodump WBPerson1234 There is no Interaction Type
6) If there is an Interaction object that exists on multiple Postgres lines/rows (i.e. the same Interaction ID with multiple PGIDs), the object will not get dumped and an error message will print to the ERROR output file like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 nodump WBPerson1234 WBInteraction000123456 exists across multiple lines
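One way to detect an Interaction ID that spans several PGIDs is to group PGIDs by ID and keep the groups with more than one member. This is a sketch under the assumption that the rows are available as (PGID, Interaction ID) pairs; the function name is hypothetical and the real dumper may query Postgres directly.

```python
from collections import defaultdict


def find_duplicated_interaction_ids(rows):
    """Map each Interaction ID used by several PGIDs to those PGIDs.

    `rows` is an iterable of (pgid, interaction_id) pairs; rows with
    no Interaction ID are skipped, since they are covered by a
    separate check.
    """
    pgids_by_id = defaultdict(list)
    for pgid, interaction_id in rows:
        if interaction_id:
            pgids_by_id[interaction_id].append(pgid)
    return {
        interaction_id: pgids
        for interaction_id, pgids in pgids_by_id.items()
        if len(pgids) > 1
    }
```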
Non-Fatal Errors (Interactions will get dumped, but error message will get printed)
1) If there is a Variation, Expression pattern, Transgene, or Antibody that cannot be matched to a gene interactor, the object will be identified as "Unaffiliated" in the .ACE file and an error message will print to the ERROR output file like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Interaction ID <TAB> Unaffiliated Object <TAB> Object Name
so, for example:
12345 lineonly WBPerson1234 WBInteraction000123456 Unaffiliated_variation WBVar00600763
12345 lineonly WBPerson1234 WBInteraction000123456 Unaffiliated_transgene kyEx456
12345 lineonly WBPerson1234 WBInteraction000123456 Unaffiliated_antibody [cgc2826]:hlh-2
12345 lineonly WBPerson1234 WBInteraction000123456 Unaffiliated_expr_pattern Expr1234
2) If there is an inconsistency between the directionality of the Interaction Type and the Interactors, the interaction object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 flagonly WBPerson1234 has diretional type Enhancement + nondirectional data
12345 flagonly WBPerson1234 has diretional type Epistasis + nondirectional data
12345 flagonly WBPerson1234 has diretional type Suppression + nondirectional data
12345 flagonly WBPerson1234 has nondiretional type Mutual_enhancement + effected data
12345 flagonly WBPerson1234 has nondiretional type Mutual_enhancement + effector data
12345 flagonly WBPerson1234 has nondiretional type Mutual_suppression + effector data
12345 flagonly WBPerson1234 has nondiretional type No_interaction + effected data
12345 flagonly WBPerson1234 has nondiretional type Synthetic + effector data
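The directionality check can be sketched as a comparison between the Interaction Type and the interactor fields that are populated. The two type sets below contain only the types that appear in the example messages above and are certainly incomplete; the dumper's full lists are not given here. The row layout, function name, and message wording are likewise assumptions.

```python
# Only the types named in the example messages; NOT the dumper's full lists.
DIRECTIONAL_TYPES = {"Enhancement", "Epistasis", "Suppression"}
NONDIRECTIONAL_TYPES = {
    "Mutual_enhancement",
    "Mutual_suppression",
    "No_interaction",
    "Synthetic",
}


def directionality_flags(interaction_type, row):
    """Return explanations for type/interactor directionality mismatches."""
    flags = []
    if interaction_type in DIRECTIONAL_TYPES and row.get("nondirectional"):
        flags.append(
            f"has directional type {interaction_type} + nondirectional data"
        )
    if interaction_type in NONDIRECTIONAL_TYPES:
        for role in ("effector", "affected"):
            if row.get(role):
                flags.append(
                    f"has nondirectional type {interaction_type} + {role} data"
                )
    return flags
```

Because these are non-fatal errors, a non-empty result would only add "flagonly" lines to the ERROR output file; the object would still be dumped.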
3) If no curator is listed for the interaction, the interaction object will get dumped, but an error message will print to the ERROR output file like this:
PGID <TAB> Dump_status <TAB> Curator ID <TAB> Explanation
so, for example:
12345 flagonly no curator has no curator
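Since curators need to be informed of the errors after each run, a small script that groups the ERROR output by curator may be convenient. The sketch below assumes the file is tab-delimited as described above, with PGID, Dump_status, and Curator ID in the first three columns and one or more further columns of detail. The function name and the returned structure are hypothetical.

```python
from collections import defaultdict


def group_errors_by_curator(lines):
    """Group tab-delimited ERROR lines by the Curator ID column."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"unexpected error line: {line!r}")
        pgid, status, curator = fields[:3]
        grouped[curator].append(
            {"pgid": pgid, "status": status, "details": fields[3:]}
        )
    return dict(grouped)
```

The function accepts any iterable of lines, so an open file handle for the error file can be passed in directly.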
Handling Dead Genes During Dump Process
The dumper script will now (as of May, 2013) run an automatic check for dead genes in any gene field. Any genes that are considered dead that are referenced in an Interaction object in the OA will be handled in the following manner:
1) If there is a replacement for the gene (i.e. the gene has merged into another gene), the dead gene will be dumped into a "Historical_gene" field in the .ACE file and the replacement gene will fill the original gene field. A comment will be added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now with the updated gene reference) will be printed with an "Inferred_automatically" tag after the gene. So, for example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:
Gene "WBGene00001234"
becomes
Gene "WBGene00002345" Inferred_automatically
Historical_gene "WBGene00001234" "Note: This object originally referred to WBGene00001234. WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has replaced WBGene00001234 accordingly."
Also, since Antibodies, Transgenes, Expression patterns, and Variations are mapped to an interactor where possible (or else they are dumped as "Unaffiliated"), this mapping now uses only the newest genes that the interactor refers to.
2) If there is no replacement for the gene (Dead or Suppressed), we would dump the following:
Gene "WBGene00001234"
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene (WBGene00001234) that is now considered dead. Please interpret with discretion."
OR
Gene "WBGene00001234"
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene (WBGene00001234) that has been suppressed. Please interpret with discretion."
and lastly,
3) If the gene has undergone a split, such genes will be dumped as:
Gene "WBGene00001234"
Historical_gene "WBGene00001234" "Note: This object originally referred to a gene (WBGene00001234) that is now considered split. Please interpret with discretion."
Split genes are also printed to the error output file of the dumping script so that a curator can go back and manually change them according to best judgement.
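The three cases above can be summarized in a short sketch that builds the .ACE lines for a dead gene. The note texts are copied from the examples in this section, but the function name, the status labels, and the choice to emit the Gene and Historical_gene tags as two separate lines are assumptions; the real dumper is a Perl script and looks up gene status itself.

```python
def dead_gene_lines(gene, status, replacement=None):
    """Return the .ACE lines for a dead gene reference.

    `status` is one of "merged", "dead", "suppressed" or "split"
    (hypothetical labels); `replacement` is required for "merged".
    """
    if status == "merged":
        if replacement is None:
            raise ValueError("a merged gene needs a replacement gene")
        note = (
            f"Note: This object originally referred to {gene}. "
            f"{gene} is now considered dead and has been merged into "
            f"{replacement}. {replacement} has replaced {gene} accordingly."
        )
        return [
            f'Gene "{replacement}" Inferred_automatically',
            f'Historical_gene "{gene}" "{note}"',
        ]

    reasons = {
        "dead": "that is now considered dead",
        "suppressed": "that has been suppressed",
        "split": "that is now considered split",
    }
    if status not in reasons:
        raise ValueError(f"unknown gene status: {status!r}")
    note = (
        f"Note: This object originally referred to a gene ({gene}) "
        f"{reasons[status]}. Please interpret with discretion."
    )
    return [f'Gene "{gene}"', f'Historical_gene "{gene}" "{note}"']
```

In the split case, the caller would additionally write the gene to the error output file for manual review by a curator.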
Gene Examples:
A split gene: WBGene00012507
A merged gene: WBGene00007524
A dead gene: WBGene00007814
A suppressed gene: WBGene00015490