Gene Interaction

From WormBaseWiki
Revision as of 19:16, 27 March 2015 by Cgrove (talk | contribs) (→‎TAB3)

Links to relevant pages
Caltech documentation

Archived Interaction documentation may be found here

OA link


Interaction Curation

Pipeline

dump .ace file from OA for upload

  • on tazendra: /home/acedb/xiaodong/oa_interactions_dumper
  • run script by calling: ./use_package.pl
  • an error file is written to the same directory after every run; inform curators so they can check the errors.
  • Old dumper outputs (err* and interaction* files) will now be archived in the following directory:
    • /home/acedb/xiaodong/oa_interactions_dumper/Interaction_OA_Dumper_Output_Archive -- CG 10-31-2012

Interaction Models

Current Models

The current ?Interaction model now consolidates ?Gene_regulation objects, ?YH objects, and ?Interaction objects into a single class, ?Interaction. The proposed ?Physical_interaction model has also been consolidated into this larger model. Note that #Interaction_info and #Interaction_type have been deprecated.

Updated for WS249 - CG 3-18-2015

?Interaction	Interaction_type Physical
				 Predicted
				 Regulatory Change_of_localization
					    Change_of_expression_level
				 Genetic Genetic_interaction
					 Neutral_genetic
					 Synthetic
					 Enhancement
					 Unilateral_enhancement
					 Mutual_enhancement
					 Positive_genetic
					 Suppression
					 Complete_suppression
					 Partial_suppression
					 Unilateral_suppression
					 Complete_unilateral_suppression
					 Partial_unilateral_suppression
					 Mutual_suppression
					 Complete_mutual_suppression
					 Partial_mutual_suppression
					 Suppression_enhancement
					 Asynthetic
					 Negative_genetic
					 Oversuppression	
					 Unilateral_oversuppression
					 Mutual_oversuppression		
					 Oversuppression_enhancement
       					 Phenotype_bias				
					 No_interaction				// Negative data; no interaction was observed after testing
					 Epistasis
					 Positive_epistasis
					 Maximal_epistasis
					 Minimal_epistasis
					 Neutral_epistasis
					 Qualitative_epistasis
					 Opposing_epistasis
					 Quantitative_epistasis
		Interactor PCR_interactor ?PCR_product  XREF Interaction #Interactor_info	// PCR_product of the interacting gene or protein, e.g. Yeast Two Hybrid experiments
			   Sequence_interactor ?Sequence  XREF Interaction #Interactor_info	// Sequence of the interacting gene or protein
			   Feature_interactor ?Feature  XREF Associated_with_Interaction  #Interactor_info
			   Interactor_overlapping_CDS ?CDS XREF Interaction #Interactor_info	// CDS of the interacting gene or protein (or related sequence)
			   Interactor_overlapping_gene ?Gene XREF Interaction #Interactor_info		// Gene (or portion of gene) involved in the interaction
			   Interactor_overlapping_protein ?Protein XREF Interaction #Interactor_info	// Protein (or portion of protein) involved in the interaction
			   Molecule_interactor ?Molecule XREF Interaction #Interactor_info	  // Molecule that interacts with a gene or protein (ported from Gene_regulation Class)
			   Other_regulator ?Text #Interactor_info	// Free text describing a regulator entity or condition that does not fall into a standard WormBase category
			   Other_regulated ?Text #Interactor_info	// Free text describing a regulated entity or condition that does not fall into a standard WormBase category
			   Rearrangement  ?Rearrangement XREF Interactor #Interactor_info
			   Variation_interactor  ?Variation  XREF  Interactor  #Interactor_info   // Added WS248; allele involved in genetic interaction
		Interaction_summary ?Text #Evidence
		Detection_method Affinity_capture_luminescence		      // A physical interaction detection technique
				 Affinity_capture_MS			      // A physical interaction detection technique
				 Affinity_capture_RNA			      // A physical interaction detection technique
				 Affinity_capture_Western		      // A physical interaction detection technique
				 Chromatin_immunoprecipitation	      	      // A physical interaction detection technique
				 Cofractionation		      	      // A physical interaction detection technique
				 Colocalization				      // A physical interaction detection technique
				 Copurification				      // A physical interaction detection technique
				 DNase_I_footprinting    		      // A physical interaction detection technique
				 Fluorescence_resonance_energy_transfer	      // A physical interaction detection technique
				 Protein_fragment_complementation_assay	      // A physical interaction detection technique
				 Yeast_two_hybrid			      // A physical interaction detection technique (Protein-protein)
				 Biochemical_activity			      // A physical interaction detection technique
				 Cocrystal_structure			      // A physical interaction detection technique
				 Far_western				      // A physical interaction detection technique
				 Protein_peptide 			      // A physical interaction detection technique
				 Protein_RNA 				      // A physical interaction detection technique
				 Reconstituted_complex			      // A physical interaction detection technique
				 Electrophoretic_mobility_shift_assay         // A physical interaction detection technique
				 Yeast_one_hybrid			      // A physical interaction detection technique (Protein-DNA)
				 Directed_yeast_one_hybrid		      // A physical interaction detection technique (Protein-DNA)
				 Antibody				      // A regulatory interaction detection technique; Antibody name and details captured in Interactor_info hash
				 Reporter_gene ?Text			      // A regulatory interaction detection technique
				 Transgene				      // A regulatory interaction detection technique; Transgene name and details captured in Interactor_info hash
				 In_situ Text				      // A regulatory interaction detection technique
				 Northern Text				      // A regulatory interaction detection technique
				 Western Text				      // A regulatory interaction detection technique
				 RT_PCR Text				      // A regulatory interaction detection technique
				 Other_method ?Text			      // A regulatory interaction detection technique
		Library_info Library_screened Text INT			  // In the context of Y2H or YH screens, for example, the library may have been cDNA library or a pool of clones
			     Origin From_laboratory UNIQUE ?Laboratory 	  // A library generated at an academic laboratory
				    From_company UNIQUE ?Text		  // A library generated at a company
		Regulation_level Transcriptional		          // Regulation occurs at the transcriptional level
				 Post_transcriptional		          // Regulation occurs at the post-transcriptional level
				 Post_translational		          // Regulation occurs at the post-translational level
		Regulation_result Positive_regulate #GR_condition
				  Negative_regulate #GR_condition
				  Does_not_regulate #GR_condition // added to capture negative data [040220 krb]
		Confidence Description Text			  // Free text description of the confidence, e.g. "Core" vs "Noncore" (Vidal Interactome terms)
			   P_value UNIQUE Float			  // P-value confidence of interaction, if given
			   Log_likelihood_score UNIQUE Float	  // Only used for Predicted interactions
		Throughput UNIQUE High_throughput		  //See BioGRID curation criteria for discussion
				  Low_throughput
		Interaction_RNAi ?RNAi XREF Interaction			// RNAi experiment associated with the interaction
		Interaction_phenotype ?Phenotype XREF Interaction	// Phenotype associated with a genetic interaction
		Unaffiliated_variation   ?Variation
		Unaffiliated_transgene   ?Transgene
		Unaffiliated_antibody    ?Antibody
		Unaffiliated_expr_pattern   ?Expr_pattern
		WBProcess ?WBProcess XREF Interaction			// WormBase biological process associated with the interaction
		DB_info Database ?Database ?Database_field ?Text	// Any database reference to the interaction outside of WormBase, e.g. BioGRID, Interactome
		Paper ?Paper XREF Interaction
		Antibody_remark ?Text
		Historical_gene   ?Gene Text
		Remark ?Text #Evidence



#Interactor_info       Interactor_type Non_directional		   // An interactor that has no inherent directionality
				       Bait			   // The interactor of interest or focus; the focus/starting point of an interaction screen
				       Target			   // The discovered interactor; interactors found as a result of an interaction screen
				       Effector			   // In a genetic interaction, the perturbation that affects the phenotype of the other perturbation
				       Affected			   // In a genetic interaction, the perturbation whose phenotype is affected by the other perturbation
				       Trans_regulator		   // A trans-acting regulator, e.g. a transcription factor
				       Cis_regulator		   // A cis-acting regulator, e.g. an enhancer element
				       Trans_regulated		   // A gene regulated in trans, e.g. by a transcription factor
				       Cis_regulated		   // A gene regulated in cis, e.g. by an enhancer element
		       Expr_pattern ?Expr_pattern		   // An expression pattern altered to indicate 
		       Transgene ?Transgene XREF Interactor	   // Transgene XREF Interactor that carries an interacting gene //removed XREF from proposal
		       Construct ?Construct XREF Interactor
		       Antibody ?Antibody XREF Interactor	   // Free text description of the antibody used to detect a regulation event //removed XREF from proposal
		       Inferred_automatically Text                 // a script has updated this connection


In adopting this new ?Interaction model in WS231, we consolidated the ?YH and ?Gene_regulation classes into the ?Interaction class. As of WS231, there were 3831 ?Gene_regulation objects, which were then converted to ?Interaction objects with IDs WBInteraction000501384 through WBInteraction000505214. As of WS231, there were 11,993 ?YH objects, which were then converted to ?Interaction objects with IDs WBInteraction000505215 through WBInteraction000517207.
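As a quick sanity check on the ID ranges quoted above, the following minimal Python sketch computes how many IDs each range spans, assuming the ranges are contiguous and inclusive. The helper names `interaction_number` and `range_size` are illustrative and are not part of any WormBase script.

```python
PREFIX = "WBInteraction"


def interaction_number(interaction_id: str) -> int:
    """Return the numeric part of an ID such as WBInteraction000501384."""
    if not interaction_id.startswith(PREFIX):
        raise ValueError(f"not an Interaction ID: {interaction_id!r}")
    return int(interaction_id[len(PREFIX):])


def range_size(first_id: str, last_id: str) -> int:
    """Number of IDs in an inclusive, contiguous ID range."""
    return interaction_number(last_id) - interaction_number(first_id) + 1


# Converted ?Gene_regulation objects
print(range_size("WBInteraction000501384", "WBInteraction000505214"))  # 3831
# Converted ?YH objects
print(range_size("WBInteraction000505215", "WBInteraction000517207"))  # 11993
```

Both results agree with the object counts stated for WS231.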

Gene_gene Interaction OA

Postgres Table Names

'Interaction ID'               = int_name

'Curator' = int_curator

'Process' = int_process

'Database Field Accession Number' = int_database

'Paper' = int_paper

'Interaction Type'             = int_type

'Interaction Summary' = int_summary

'Remark' = int_remark

'Physical interaction detection method' = int_detectionmethod

'Library screened and times found' = int_library

'From Laboratory' = int_laboratory

'From Company' = int_company

'PCR Bait' = int_pcrbait

'PCR Target(s)' = int_pcrtarget

'Non-directional PCR(s)' = int_pcrnondir

'Sequence Bait' = int_sequencebait

'Sequence Target(s)' = int_sequencetarget

'Non-directional Sequence(s)' = int_sequencenondir

'Feature Bait' = int_featurebait

'Feature Target' = int_featuretarget

'Bait overlapping CDS' = int_cdsbait

'Target overlapping CDS(s)' = int_cdstarget

'Non-directional overlapping CDS(s)' = int_cdsnondir

'Bait overlapping protein' = int_proteinbait

'Target overlapping protein' = int_proteintarget

'Non-directional overlapping protein' = int_proteinnondir

'Bait overlapping gene' = int_genebait

'Target overlapping gene' = int_genetarget

'Antibody' = int_antibody

'Antibody remark' = int_antibodyremark

'Non-directional Gene(s)' = int_genenondir

'Effector Gene(s)'             = int_geneone

'Affected Gene(s)' = int_genetwo

'Non-directional Rearrangement(s)' = int_rearrnondir

'Effector Rearrangement(s)' = int_rearrone

'Affected Rearrangement(s)' = int_rearrtwo

'Effector Other Type' = int_otheronetype

'Effector Other' = int_otherone

'Affected Other Type' = int_othertwotype

'Affected Other' = int_othertwo

'Large scale RNAi' = int_lsrnai

'RNAi' = int_rnai

'Interaction Phenotype(s)' = int_phenotype

'Expression pattern(s)' = int_exprpattern

'Non-directional Variation(s)' = int_variationnondir

'Effector Variation(s)'        = int_variationone

'Affected Variation(s)' = int_variationtwo

'Non-directional Molecule(s)' = int_moleculenondir

'Effector Molecule(s)' = int_moleculeone

'Affected Molecule(s)' = int_moleculetwo

'Transgene(s)' = int_transgene

'Person'                       = int_person  

'Confidence description' = int_confidence

'P-value' = int_pvalue

'Log-likelihood score' = int_loglikelihood

'High_throughput' = int_throughput

'Sentence ID'                  = int_sentid        

'False Positive'               = int_falsepositive

Some Notes for gene_gene_interaction

Two Large Scale Interaction Data Sets

  • Files and scripts for these large scale datasets have been moved into a new directory:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions

-- CG 10-31-2012


  • WBPaper00027155 (Weiwei's Science paper) has 23,128 objects, starting from WBInteraction000008637 and ending at WBInteraction000050578 (blank ids from WBInteraction000050579 to 000100000)
    • Original .ACE file (in old .ACE interaction format) on Tazendra here:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions/Original_Files/27155_interaction.ace
  • WBPaper00031465 (Lee's Nature Genetics paper) has 375,491 objects, starting from WBInteraction000100001, ending at WBInteraction000475491
    • Original .ACE file (in old .ACE interaction format) on Tazendra here:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions/Original_Files/31465_interaction.ace
  • The two large scale data sets have now been updated to the new interaction format (as of May 2013) and consolidated into a single file on Tazendra here:
/home/acedb/xiaodong/oa_interactions_dumper/Large_Scale_Interactions/Original_Large_Scale_Interactions_new_format.ace

The above file needs to be checked for dead genes before every upload by running the "historicGeneReplacementLSInteraction.pl" script in the same directory. Before running the script, make sure that the output file is appropriately named. The script prints all dead genes to the screen, so when running the script you may want to redirect the script output into a file so you can read the results later.


Directories on tazendra related to gene_gene_interaction (/home/acedb/xiaodong)

  • assigning_interaction_ids
  • textpresso_ggi
  • interaction_ace_parsing
  • oa_interactions_dumper

The new Interaction OA, March 2012

TAB1

  • PGID Dumps as: N/A
  • Interaction ID (Ontology) int_name Dumps as: Interaction: <Interaction ID>
  • Curator - (Dropdown) int_curator Dumps as: N/A
  • Process - ?WBProcess (MultiOntology) int_process Dumps as: WBProcess <WBProcess>
  • Database, Field, & Accession Number - ?Database, Field, Accession_number (Free Text) int_database Dumps as: Database <Database> <Database_field> <Accession_number>
    • For single entries, surround the Database, Field, and Accession number entries with double quotes and separate them with spaces like so: "Database" "Database Field" "Accession Number"
    • If there are multiple entries, data to be entered like this: "Database 1" "Field 1" "Accession number 1" | "Database 2" "Field 2" "Accession number 2"
  • Paper - ?Paper (Ontology) int_paper Dumps as: Paper <Paper>
  • Interaction Type - Text (Multiple-Dropdown) int_type
    • The options for Interaction Type will include:
      • Physical Dumps as: Physical
      • Predicted Dumps as: Predicted
      • Genetic - Genetic interaction Dumps as: Genetic_interaction
      • Genetic - Negative genetic Dumps as: Negative_genetic
      • Genetic - Synthetic Dumps as: Synthetic
      • Genetic - Enhancement Dumps as: Enhancement
      • Genetic - Unilateral enhancement Dumps as: Unilateral_enhancement
      • Genetic - Mutual enhancement Dumps as: Mutual_enhancement
      • Genetic - Positive genetic Dumps as: Positive_genetic
      • Genetic - Suppression Dumps as: Suppression
      • Genetic - Complete suppression Dumps as: Complete_suppression
      • Genetic - Partial suppression Dumps as: Partial_suppression
      • Genetic - Unilateral suppression Dumps as: Unilateral_suppression
      • Genetic - Complete unilateral suppression Dumps as: Complete_unilateral_suppression
      • Genetic - Partial unilateral suppression Dumps as: Partial_unilateral_suppression
      • Genetic - Mutual suppression Dumps as: Mutual_suppression
      • Genetic - Complete mutual suppression Dumps as: Complete_mutual_suppression
      • Genetic - Partial mutual suppression Dumps as: Partial_mutual_suppression
      • Genetic - Asynthetic Dumps as: Asynthetic
      • Genetic - Suppression/Enhancement Dumps as: Suppression_enhancement
      • Genetic - Epistasis Dumps as: Epistasis
      • Genetic - Positive epistasis Dumps as: Positive_epistasis
      • Genetic - Maximal epistasis Dumps as: Maximal_epistasis
      • Genetic - Minimal epistasis Dumps as: Minimal_epistasis
      • Genetic - Neutral epistasis Dumps as: Neutral_epistasis
      • Genetic - Qualitative epistasis Dumps as: Qualitative_epistasis
      • Genetic - Opposing epistasis Dumps as: Opposing_epistasis
      • Genetic - Quantitative epistasis Dumps as: Quantitative_epistasis
      • Genetic - Neutral genetic Dumps as: Neutral_genetic
      • Genetic - Oversuppression Dumps as: Oversuppression
      • Genetic - Unilateral oversuppression Dumps as: Unilateral_oversuppression
      • Genetic - Mutual oversuppression Dumps as: Mutual_oversuppression
      • Genetic - Oversuppression/Enhancement Dumps as: Oversuppression_enhancement
      • Genetic - Phenotype bias Dumps as: Phenotype_bias
      • Genetic - No interaction Dumps as: No_interaction
  • Interaction Summary - bigtext int_summary Dumps as: Interaction_summary <Big_Text>
  • Remark - bigtext int_remark Dumps as: Remark <Big_Text>


Each interaction type can be considered necessarily directional, necessarily non-directional, or ambiguous. The OA dumping script will check to make sure that the correct interactor types (Non-directional, Effector, or Affected) are listed in each case and notify the curator during the dump in the error output file. Here are the directionalities for each interaction type:

Necessarily Directional:

  • Enhancement
  • Unilateral enhancement
  • Suppression
  • Complete suppression
  • Partial suppression
  • Unilateral suppression
  • Complete unilateral suppression
  • Partial unilateral suppression
  • Epistasis
  • Positive epistasis
  • Maximal epistasis
  • Minimal epistasis
  • Neutral epistasis
  • Qualitative epistasis
  • Opposing epistasis
  • Quantitative epistasis
  • Oversuppression
  • Unilateral oversuppression
  • Phenotype bias


Necessarily Non-directional:

  • Predicted
  • Synthetic
  • Asynthetic
  • Mutual enhancement
  • Mutual suppression
  • Complete mutual suppression
  • Partial mutual suppression
  • Mutual oversuppression
  • Suppression enhancement
  • Oversuppression enhancement
  • No interaction


Ambiguous (ignore in dumping script):

  • Physical
  • Genetic interaction
  • Negative genetic
  • Positive genetic
  • Neutral genetic

TAB2

  • Physical interaction detection method (Multi-dropdown) int_detectionmethod
    • The detection method options are:
      • Affinity_capture_luminescence Dumps as: Affinity_capture_luminescence
      • Affinity_capture_MS Dumps as: Affinity_capture_MS
      • Affinity_capture_RNA Dumps as: Affinity_capture_RNA
      • Affinity_capture_Western Dumps as: Affinity_capture_Western
      • Chromatin_immunoprecipitation Dumps as: Chromatin_immunoprecipitation
      • Cofractionation Dumps as: Cofractionation
      • Colocalization Dumps as: Colocalization
      • Copurification Dumps as: Copurification
      • DNase_I_footprinting Dumps as: DNase_I_footprinting
      • Fluorescence_resonance_energy_transfer Dumps as: Fluorescence_resonance_energy_transfer
      • Protein_fragment_complementation_assay Dumps as: Protein_fragment_complementation_assay
      • Yeast_two_hybrid Dumps as: Yeast_two_hybrid
      • Biochemical_activity Dumps as: Biochemical_activity
      • Cocrystal_structure Dumps as: Cocrystal_structure
      • Far_western Dumps as: Far_western
      • Protein_peptide Dumps as: Protein_peptide
      • Protein_RNA Dumps as: Protein_RNA
      • Reconstituted_complex Dumps as: Reconstituted_complex
      • Yeast_one_hybrid Dumps as: Yeast_one_hybrid
      • Directed_yeast_one_hybrid Dumps as: Directed_yeast_one_hybrid
      • Electrophoretic_mobility_shift_assay Dumps as: Electrophoretic_mobility_shift_assay
  • Library screened/Times found - Text Text(Integer); separate multiple entries with pipes ('|') int_library Dumps as: Library_screened <Text> INT
    • For single entries, surround the 'Library screened' entry with double quotes and separate the number with a space like so: "Library screened" 3
    • For multiple entries, data should be entered as such: "Library screened 1" INT | "Library screened 2" INT
  • From Laboratory - ?Laboratory (ontology) int_laboratory Dumps as: From_laboratory <Laboratory>
  • From Company - Text; separate multiple entries with pipes ('|') int_company Dumps as: From_company <Text>
  • PCR Bait - ?PCR_product (Ontology) int_pcrbait Dumps as: PCR_interactor <PCR_product> Bait
  • PCR Target(s) - ?PCR_product (MultiOntology) int_pcrtarget Dumps as: PCR_interactor <PCR_product> Target
  • Non-directional PCR(s) - ?PCR_product (MultiOntology) int_pcrnondir Dumps as: PCR_interactor <PCR_product> Non_directional
  • Sequence Bait - ?Sequence (Free Text) int_sequencebait Dumps as: Sequence_interactor <Sequence> Bait
  • Sequence Target(s) - ?Sequence (Free Text); separate multiple entries with pipes ('|') int_sequencetarget Dumps as: Sequence_interactor <Sequence> Target
  • Non-directional Sequence(s) - ?Sequence (Free Text); separate multiple entries with pipes ('|') int_sequencenondir Dumps as: Sequence_interactor <Sequence> Non_directional
  • Feature Bait - ?Feature (MultiOntology) int_featurebait Dumps as: Feature_interactor <Feature> Bait
  • Feature Target - ?Feature (MultiOntology) int_featuretarget Dumps as: Feature_interactor <Feature> Target
  • Bait overlapping CDS - ?CDS (Free Text) int_cdsbait Dumps as: Interactor_overlapping_CDS <CDS> Bait
  • Target overlapping CDS(s) - ?CDS (Free Text); separate multiple entries with pipes ('|') int_cdstarget Dumps as: Interactor_overlapping_CDS <CDS> Target
  • Non-directional overlapping CDS(s) - ?CDS (Free Text); separate multiple entries with pipes ('|') int_cdsnondir Dumps as: Interactor_overlapping_CDS <CDS> Non_directional
  • Bait overlapping protein - ?Protein (Free Text) int_proteinbait Dumps as: Interactor_overlapping_protein <Protein> Bait
  • Target overlapping protein(s) - ?Protein (Free Text); separate multiple entries with pipes ('|') int_proteintarget Dumps as: Interactor_overlapping_protein <Protein> Target
  • Non-directional overlapping protein(s) - ?Protein (Free Text); separate multiple entries with pipes ('|') int_proteinnondir Dumps as: Interactor_overlapping_protein <Protein> Non_directional
  • Bait overlapping gene - ?Gene (Ontology) int_genebait Dumps as: Interactor_overlapping_gene <Gene> Bait
  • Target overlapping gene(s) - ?Gene (MultiOntology) int_genetarget Dumps as: Interactor_overlapping_gene <Gene> Target
  • Antibody - ?Antibody (MultiOntology) int_antibody Dumps as: Interactor_overlapping_gene <Mapped Gene> Antibody <Antibody> AND Antibody (on new line)
    • When mapping antibodies to genes, compare antibody-affiliated genes with those in the Non-directional Gene(s), Effector Gene(s), Affected Gene(s), Bait Overlapping Gene and Target Overlapping Gene fields
    • For Antibodies that don't map to a gene in the interaction, Dump as: Unaffiliated_antibody <Antibody>
  • Antibody remark - Big Text int_antibodyremark Dumps as: Antibody_remark <Big_Text>

TAB3

  • Non-directional Gene(s) - ?Gene (MultiOntology) int_genenondir Dumps as: Interactor_overlapping_gene <Gene> Non_directional
  • Effector Gene(s) - ?Gene (MultiOntology) int_geneone Dumps as: Interactor_overlapping_gene <Gene> Effector
  • Affected Gene(s) - ?Gene (MultiOntology) int_genetwo Dumps as: Interactor_overlapping_gene <Gene> Affected
  • Non-directional Variation(s) - ?Variation (MultiOntology) int_variationnondir
    • Dumps as: Variation_interactor <Variation> Non_directional
    • Variations in this field need to be mapped to a gene at the dump stage; genes that map to the variation will be dumped as "Interactor_overlapping_gene" as follows:
    • Dumps as: Interactor_overlapping_gene <Mapped Gene> Non_directional
    • Variations that don't map to a gene will need to be assigned a gene at the ACEDB build stage; these objects will be indicated as such in the OA-dumper error output file
  • Effector Variation(s) - ?Variation (MultiOntology) int_variationone
    • Dumps as: Variation_interactor <Variation> Effector
    • Variations in this field need to be mapped to a gene at the dump stage; genes that map to the variation will be dumped as "Interactor_overlapping_gene" as follows:
    • Dumps as: Interactor_overlapping_gene <Mapped Gene> Effector
    • Variations that don't map to a gene will need to be assigned a gene at the ACEDB build stage; these objects will be indicated as such in the OA-dumper error output file
  • Affected Variation(s) - ?Variation (MultiOntology) int_variationtwo
    • Dumps as: Variation_interactor <Variation> Affected
    • Variations in this field need to be mapped to a gene at the dump stage; genes that map to the variation will be dumped as "Interactor_overlapping_gene" as follows:
    • Dumps as: Interactor_overlapping_gene <Mapped Gene> Affected
    • Variations that don't map to a gene will need to be assigned a gene at the ACEDB build stage; these objects will be indicated as such in the OA-dumper error output file
  • Non-directional Molecule(s) - ?Molecule (MultiOntology) int_moleculenondir Dumps as: Molecule_interactor <Molecule> Non_directional
  • Effector Molecule(s) - ?Molecule (MultiOntology) int_moleculeone Dumps as: Molecule_interactor <Molecule> Effector
  • Affected Molecule(s) - ?Molecule (MultiOntology) int_moleculetwo Dumps as: Molecule_interactor <Molecule> Affected
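
The variation dump rules in this tab can be illustrated with a short Python sketch. It is not the dumper's actual code: the function name `dump_variation`, the double-quoting of object names, and the placeholder IDs are assumptions made for the example, and the variation-to-gene mapping is passed in rather than looked up.

```python
def dump_variation(variation, role, mapped_genes):
    """Build .ace lines for one variation interactor.

    role is one of "Non_directional", "Effector" or "Affected".
    mapped_genes lists the genes the variation maps to (possibly empty).
    Returns (ace_lines, error_messages).
    """
    lines = [f'Variation_interactor "{variation}" {role}']
    errors = []
    # Each mapped gene is dumped with the same role as the variation.
    for gene in mapped_genes:
        lines.append(f'Interactor_overlapping_gene "{gene}" {role}')
    if not mapped_genes:
        # Reported in the error output so a gene can be assigned later.
        errors.append(f"{variation}: no mapped gene; assign one at the ACEDB build stage")
    return lines, errors


# Placeholder IDs, for illustration only.
lines, errors = dump_variation("WBVar00000001", "Effector", ["WBGene00000001"])
print("\n".join(lines))
```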

TAB4

  • Non-directional Rearrangement(s) - ?Rearrangement (MultiOntology) int_rearrnondir Dumps as: Rearrangement <Rearrangement> Non_directional
  • Effector Rearrangement(s) - ?Rearrangement (MultiOntology) int_rearrone Dumps as: Rearrangement <Rearrangement> Effector
  • Affected Rearrangement(s) - ?Rearrangement (MultiOntology) int_rearrtwo Dumps as: Rearrangement <Rearrangement> Affected
  • Effector Other Type - (Dropdown; options are Chemical or Transgene) int_otheronetype Dumps as: (see next line)
  • Effector Other - ?Text int_otherone Dumps as: Remark "Effector <Effector Other Type>: <Text>"
  • Affected Other Type - (Dropdown; options are Chemical or Transgene) int_othertwotype Dumps as: (see next line)
  • Affected Other - ?Text int_othertwo Dumps as: Remark "Affected <Affected Other Type>: <Text>"
  • RNAi - (MultiOntology) int_rnai Dumps as: Interaction_RNAi <RNAi>
  • Large scale RNAi - Free Text; separate multiple entries with pipes ('|') int_lsrnai (all large scale RNAi that doesn't match ontology) Dumps as: Interaction_RNAi <RNAi>
  • Interaction phenotype(s) - ?Phenotype (MultiOntology) int_phenotype Dumps as: Interaction_phenotype <Phenotype>
  • Expression pattern(s) - ?Expr_pattern (MultiOntology) int_exprpattern Dumps as: Interactor_overlapping_gene <Mapped Gene> Expr_pattern <Expr_pattern>
    • When mapping Expression patterns to genes, compare Expr-affiliated genes with those in the Non-directional Gene(s), Effector Gene(s), Affected Gene(s), Bait Overlapping Gene and Target Overlapping Gene fields
    • For Expression patterns that don't map to a gene in the interaction, Dump as: Unaffiliated_expr_pattern <Expr_pattern>
  • Transgene(s) - ?Transgene (MultiOntology) int_transgene Dumps as: Interactor_overlapping_gene <Mapped Gene> Transgene <Transgene> AND Transgene (on new line)
    • When mapping transgenes to genes, compare transgene-affiliated genes (from the Driven_by_gene, Gene, and 3'UTR fields) with those in the Non-directional Gene(s), Effector Gene(s), Affected Gene(s), Bait Overlapping Gene and Target Overlapping Gene fields
    • For Transgenes that don't map to a gene in the interaction, Dump as: Unaffiliated_transgene <Transgene>
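
The gene-matching rule used for transgenes, expression patterns, and antibodies can be sketched in Python as below. This is a hedged illustration under stated assumptions: `dump_reagent` is a hypothetical name, the quoting of object names is assumed, the IDs are placeholders, and the extra "AND Transgene (on new line)" output mentioned above is not modelled because the source does not spell out its form.

```python
UNAFFILIATED_TAG = {
    "Transgene": "Unaffiliated_transgene",
    "Antibody": "Unaffiliated_antibody",
    "Expr_pattern": "Unaffiliated_expr_pattern",
}


def dump_reagent(tag, reagent, reagent_genes, interaction_genes):
    """Build .ace lines for a transgene, antibody, or expression pattern.

    reagent_genes: genes affiliated with the reagent (for a transgene, for
        example, those from its Driven_by_gene, Gene, and 3'UTR fields).
    interaction_genes: genes entered in the Non-directional, Effector,
        Affected, Bait, and Target gene fields of the interaction.
    """
    affiliated = set(reagent_genes)
    matched = [gene for gene in interaction_genes if gene in affiliated]
    if not matched:
        # No gene in the interaction matches: dump as unaffiliated.
        return [f'{UNAFFILIATED_TAG[tag]} "{reagent}"']
    return [
        f'Interactor_overlapping_gene "{gene}" {tag} "{reagent}"'
        for gene in matched
    ]


# Placeholder IDs, for illustration only.
print(dump_reagent("Transgene", "WBTransgene00000001",
                   ["WBGene00000001"], ["WBGene00000001", "WBGene00000002"]))
print(dump_reagent("Transgene", "WBTransgene00000002",
                   ["WBGene00000003"], ["WBGene00000001", "WBGene00000002"]))
```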

TAB5

  • Person - ?Person int_person Dumps as: Remark <Remark_text> Person_evidence <Person>
    • If there is no Remark entry, dumps as: Remark "See Person Evidence" Person_evidence <Person>
  • Confidence description - Text int_confidence Dumps as: Description <Text>
  • P-value - Text (Float) int_pvalue Dumps as: P_value FLOAT
  • Log-likelihood score - Text (Float) int_loglikelihood Dumps as: Log_likelihood_score FLOAT
  • High_throughput - (Toggle) int_throughput:
    • If ON, dumps as: High_throughput
    • If OFF (default), dumps as: Low_throughput
  • Sentence ID - (Ontology) sentence shows in term info; int_sentid Dumps as: N/A
  • False Positive - (Toggle) marks a sentence as a false positive that contains no interaction information; such rows are not given an Interaction ID and are not dumped; int_falsepositive Dumps as: N/A
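
The Person and High_throughput rules in this tab can be illustrated with a minimal Python sketch. The function names, the quoting helper, and the placeholder person ID are assumptions for the example, not the dumper's own code.

```python
def ace_quote(text):
    """Wrap text in double quotes, escaping any inner double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def dump_tab5(remark, persons, high_throughput):
    """Build the Remark and throughput lines for one interaction."""
    lines = []
    if persons:
        # Person evidence hangs off the Remark; use a stock remark if empty.
        text = remark or "See Person Evidence"
        for person in persons:
            lines.append(f"Remark {ace_quote(text)} Person_evidence {ace_quote(person)}")
    elif remark:
        lines.append(f"Remark {ace_quote(remark)}")
    # The toggle is OFF by default, which dumps as Low_throughput.
    lines.append("High_throughput" if high_throughput else "Low_throughput")
    return lines


# "WBPerson1" is a placeholder ID used only for illustration.
print(dump_tab5("", ["WBPerson1"], False))
```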

To go live on tazendra

To create new interaction tables on tazendra : /home/postgres/work/pgpopulation/interaction/20120527_OA_newModel/create_datatype_tables.pl

Backup relevant tables : /home/postgres/work/pgpopulation/interaction/20120527_OA_newModel/backupTable.pl

To transfer data from old interaction model tables to new interaction model tables and table formats : /home/postgres/work/pgpopulation/interaction/20120527_OA_newModel/transfer_int_data.pl


Instructions on How to Use the New OA

The new Interaction OA is intended to be used by curators for curating physical, predicted and genetic interactions. This is an overview of the key points to keep in mind while curating with the Interaction OA.

1) TAB 1 is for general information, TAB 2 is for physical interactions, TAB 3 & TAB 4 are for genetic and predicted interactions, and TAB 5 is for detailed (and rarely used) information


TAB 1

2) Interaction IDs are generated automatically when the "New" button is clicked; new Postgres IDs (PGIDs) are generated automatically as well. If you would like to duplicate objects (because you are generating several similar interaction objects) and assign new IDs to them, select the interaction to duplicate and click the "Duplicate" button; this will generate a new PGID and OA row, but carry the same Interaction ID over from the duplicated interaction. If the new row should be a distinct interaction, delete the Interaction ID from the Interaction ID field, and a cron job will assign an Interaction ID to that row the following evening. Note that if you leave any rows without an Interaction ID overnight, they will each be assigned a new, unique Interaction ID.
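
The effect of the nightly ID assignment can be pictured with the following Python sketch. It is an illustration only: how the real cron job chooses the next free ID is not documented here, so the `next_number` argument, the row layout, and the function name `assign_missing_ids` are all assumptions.

```python
def assign_missing_ids(rows, next_number):
    """Give every row with an empty int_name a new, unique Interaction ID.

    rows: list of dicts with keys "pgid" and "int_name".
    next_number: first unused ID number (assumed to be known to the caller).
    Returns a dict mapping PGID to the newly assigned ID.
    """
    assigned = {}
    for row in rows:
        if not row["int_name"]:
            # IDs carry a nine-digit, zero-padded number.
            row["int_name"] = f"WBInteraction{next_number:09d}"
            assigned[row["pgid"]] = row["int_name"]
            next_number += 1
    return assigned


# Rows 2 and 3 were duplicated from row 1 and had their IDs deleted.
rows = [
    {"pgid": 1, "int_name": "WBInteraction000000001"},
    {"pgid": 2, "int_name": ""},
    {"pgid": 3, "int_name": ""},
]
print(assign_missing_ids(rows, 600000))  # 600000 is a made-up starting number
```

A duplicated row that keeps its copied Interaction ID is left untouched, so it stays part of the same interaction object.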


3) Entering Database information: Database information is typically provided as three pieces of information: the Database, the Database field name, and the Database Accession number for the interaction in question. Each must be entered surrounded by double quotes (") and separated with spaces. So for example:

"Database" "Database Field" "Accession Number"

If multiple database references are to be made, split on pipes ("|") like this:

"Database 1" "Database Field 1" "Accession Number 1"   |   "Database 2" "Database Field 2" "Accession Number 2"

In the (unlikely) event that any of the field entries have double quotes (") in the name itself, then the 'inner' double quotes will need to be escaped with a backslash ("\") like this:

"Database \"Supreme!!!!\""   "Database \"Field\""   "Accession \"Number\""

so that it will be read in ACEDB as:

Database "Supreme!!!!"     Database "Field"     Accession "Number"
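
To make the quoting and escaping rules concrete, here is a small Python sketch of a parser for the Database field. It is not the OA's or the dumper's actual parser; the function names are hypothetical, and the behaviour for malformed input (raising `ValueError`) is an assumption.

```python
def parse_quoted_entries(raw):
    """Split a field into pipe-separated entries of double-quoted tokens.

    A backslash inside quotes escapes the next character, so \\" yields
    a literal double quote. Returns a list of token lists.
    """
    entries = [[]]
    i, n = 0, len(raw)
    while i < n:
        ch = raw[i]
        if ch.isspace():
            i += 1
        elif ch == "|":
            entries.append([])  # a pipe starts the next entry
            i += 1
        elif ch == '"':
            i += 1
            buf = []
            while i < n and raw[i] != '"':
                if raw[i] == "\\" and i + 1 < n:
                    i += 1  # drop the backslash, keep the escaped character
                buf.append(raw[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated double quote")
            i += 1  # skip the closing quote
            entries[-1].append("".join(buf))
        else:
            raise ValueError(f"unquoted text at position {i}")
    return entries


def parse_database_field(raw):
    """Return (database, field, accession) tuples from the Database field."""
    triples = []
    for tokens in parse_quoted_entries(raw):
        if len(tokens) != 3:
            raise ValueError(f"expected 3 quoted values, got {len(tokens)}")
        triples.append(tuple(tokens))
    return triples


print(parse_database_field(
    '"Database 1" "Database Field 1" "Accession Number 1"   |   '
    '"Database 2" "Database Field 2" "Accession Number 2"'
))
print(parse_database_field(
    r'"Database \"Supreme!!!!\""   "Database \"Field\""   "Accession \"Number\""'
))
```

The second call returns the three values with their inner double quotes restored, matching the ACEDB reading shown above.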


4) The Person field is for Person evidence, when no other reference (such as a WormBase paper) is supplied as a reference. This will be dumped as a hash/supplement to the "Remark" entry, so it is advantageous to include any pertinent information there.


TAB 2

5) The "Library screens and Times found" field is for documenting screening/testing libraries that were used to identify a physical interaction. For example, a cDNA library may be used in a Yeast Two Hybrid screen to identify protein interaction partners with a "Bait" protein. Sometimes (but not always) authors might report the number of times a particular interaction was identified using a particular library. If not, enter the name of the library with double quotes (") like so:

"cDNA"

If there are multiple libraries (but no numbers), split on pipes like this:

"cDNA"    |    "ORFeome"

If there is a single library, with a number (for number of times found):

"AD-TF mini-library" 5

and if multiple libraries, with numbers:

"AD-TF mini-library" 5   |   "AD-wrmcDNA library" 1

As with the Database field entries, if (for some reason) the name of the library has double quotes (") in the name itself, the double quotes will need to be escaped like this:

"AD-TF \"Mini\" library" 5

so that the library name eventually reads like this:

AD-TF "Mini" library
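As a rough illustration of the "Library screens and Times found" format, the following Python sketch reads each quoted library name together with its optional count. The function name parse_library_field is hypothetical, and the sketch does not validate malformed input.

```python
import re

# A quoted library name (backslash-escaped quotes allowed),
# optionally followed by the number of times found.
_ENTRY = re.compile(r'"((?:[^"\\]|\\.)*)"\s*(\d+)?')

def parse_library_field(field):
    """Return a list of (library_name, times_found) pairs.

    times_found is None when the curator entered no number.
    """
    results = []
    for name, count in _ENTRY.findall(field):
        name = re.sub(r"\\(.)", r"\1", name)   # drop escaping backslashes
        results.append((name, int(count) if count else None))
    return results
```

Because each quoted name is consumed as a whole, the pipes between entries are simply skipped over.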


6) Sequence fields: "Sequence Bait", "Sequence Target(s)", and "Non-directional Sequence(s)"

To enter a single sequence object, type in the sequence object name, no quotes:

CK583862

For multiple sequence objects, split on pipes ("|"):

CK583862   |   CK583870


The same rules apply for Protein and CDS objects.


7) The Non-directional fields: for each type of interaction object, there is a "Non-directional" field, allowing a curator to enter all interactors of that type for a Non-directional type of interaction. Note that the "Non-directional Genes" field (which could apply to physical, genetic, and predicted interactions) lies in TAB 3.


8) For Directional interactions, there are "Bait" and "Target" fields for physical interactions (TAB 2) and "Effector" and "Affected" fields for genetic interactions (TAB 3).


TAB 3

9) Effector Variation(s), Affected Variation(s), and Non-directional Variation(s) fields are for variations that implicate an affiliated gene as an effector or affected interactor in a directional genetic interaction or non-directional interactors in a non-directional interaction. These fields can be populated without the need to populate the respective gene in the relevant gene field, as the dumping script (or build process) will make the appropriate associations.

TAB 4

10) The RNAi fields: There are two fields for references to RNAi objects: "RNAi" and "Large scale RNAi". The reason for two fields is that one field ("RNAi") is an ontology field reading off of RNAi experiments that live in the RNAi OA and Postgres. As RNAi experiments from papers containing 2,000 or more RNAi experiments (WBPaper00029258 for example) were excluded from the RNAi OA for performance reasons, any RNAi experiments from such papers will not be recognized by the "RNAi" ontology field, and therefore must be entered as free-text in the "Large scale RNAi" field.


11) The Transgene(s) field is for transgenes involved in the interaction, regardless of whether it is related to an "Effector" gene or "Affected" gene or the interaction is Non-directional. The dumping script will automatically associate the transgene with the appropriate interacting gene and dump in .ACE format accordingly.


12) Effector/Affected Other Type and Effector/Affected Other fields: these fields allow for the curation of Chemicals, Transgenes, or other entities that don't exist as proper WormBase/ACEDB objects, for example transgenes that only express human proteins. The "Other Type" fields allow for the selection of "Chemical" or "Transgene", the identity of which would go into the "Other" fields. Ideally, these fields will be phased out as chemicals are generated in the Molecule OA and transgenes in the Transgene OA, thereby allowing them to be entered into ontology-based "Transgene" or "Molecule" fields.

TAB 5

13) Confidence description: This field will capture free-text descriptions of the confidence the authors suggest they have for this interaction. In the Yeast Two Hybrid experiments, for example, the interaction may be described as "Interactome Core 1", "Interactome Core 2", or "Interactome Noncore" referring to the varying degrees of confidence for those interactions.


14) The P-value and Log-likelihood score fields are mostly to capture confidence values for predicted interactions that have been reported.


15) The High_throughput toggle field is intended to capture whether or not the interaction was observed as one of several (50 - 1000s) interactions and thus should be interpreted with caution, or at least acknowledged as coming from a large scale experiment. The default is OFF and indicates that the experiment is low throughput.


16) The Sentence ID and False Positive fields are exclusively for Textpresso sentence-based curation.

Nightly Cron Job to Assign New Interaction IDs

Every night at 4am a cron job script will run to assign new Interaction IDs to any row/PGID in the Interaction OA that does not already have an Interaction ID and that meets a few criteria. The script is located on Tazendra here:

/home/acedb/xiaodong/assigning_interaction_ids/assign_interaction_ids.pl

and the criteria for getting a new Interaction ID are as follows:

1) The curator of the interaction object/PGID is NOT Arun

2) The interaction object/PGID does NOT already have an Interaction ID

3) The interaction object/PGID is NOT flagged as False Positive

Any interactions/PGIDs that meet these three criteria will be assigned a new Interaction ID by the cron job.
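The three criteria can be sketched as a simple filter. This is not the Perl cron script itself; the row layout (a dict with "pgid", "curator", "interaction_id", and "false_positive" keys) is a hypothetical stand-in for the Postgres tables, and the way the curator is stored is an assumption.

```python
def needs_interaction_id(row, excluded_curator="Arun"):
    """Apply the three criteria to one OA row (hypothetical dict layout)."""
    return (
        row.get("curator") != excluded_curator      # 1) curator is NOT Arun
        and not row.get("interaction_id")           # 2) no Interaction ID yet
        and not row.get("false_positive", False)    # 3) not flagged False Positive
    )

def rows_needing_ids(rows):
    """Return the PGIDs of all rows that should receive a new Interaction ID."""
    return [row["pgid"] for row in rows if needs_interaction_id(row)]
```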


Interaction OA .ACE Dumper

The script for the interaction OA dumper is located on Tazendra at:

/home/postgres/work/citace_upload/interaction/use_package.pl


Error Checks During Dump Process

The following is a list of checks that the .ACE dumper script will perform on all interactions being dumped out of the OA to make sure that the data is consistent and doesn't have any nonsensical information:

Fatal Errors (Interactions will not get dumped)

1) If there are fewer than two interactors in an interaction, the dumper script will generate an error message that is printed to the ERROR output file and the object will not get dumped. This is determined by checking that:

a) There is at least one "Bait" entry and one "Target" entry OR

b) There is at least one "Effector" and one "Affected" entry OR

c) There are at least two "Non-directional" entries

If none of these conditions hold true, then an error message will be printed in tab-delimited format like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   nodump    WBPerson1234   There are not two interactors
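A minimal Python sketch of this check and of the tab-delimited error line, assuming each interaction is summarized as a dict that maps a role name ("bait", "target", "effector", "affected", "nondirectional") to the list of entries found in the corresponding OA fields. The dict layout and function names are hypothetical; the actual dumper is a Perl script.

```python
def has_two_interactors(roles):
    """True if condition a), b), or c) above holds."""
    def count(role):
        return len(roles.get(role, []))
    return (
        (count("bait") >= 1 and count("target") >= 1)            # a)
        or (count("effector") >= 1 and count("affected") >= 1)   # b)
        or count("nondirectional") >= 2                          # c)
    )

def error_line(pgid, dump_status, curator_id, explanation):
    """Format one line of the ERROR output file (tab-delimited)."""
    return "\t".join([str(pgid), dump_status, curator_id, explanation])

def check_interactor_count(pgid, curator_id, roles):
    """Return an error line, or None if the interaction passes the check."""
    if has_two_interactors(roles):
        return None
    return error_line(pgid, "nodump", curator_id, "There are not two interactors")
```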


2) If there is no reference (Paper or Person) then the object will not get dumped and an error message is printed:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   nodump    WBPerson1234   There is no reference, neither paper nor person


3) If there are incompatible interactor types, the interaction object will not get dumped. This means that the object will not get dumped if the following conditions are not met:

a) If there is a "Non-directional" entry, there are no "Effector", "Affected", "Bait", or "Target" entries AND

b) If there is an "Effector" entry, there is at least one "Affected" entry AND there are no "Non-directional", "Bait", or "Target" entries AND

c) If there is an "Affected" entry, there is at least one "Effector" entry AND there are no "Non-directional", "Bait", or "Target" entries AND

d) If there is a "Bait" entry, there is at least one "Target" entry AND there are no "Non-directional", "Effector", or "Affected" entries AND

e) If there is a "Target" entry, there is at least one "Bait" entry AND there are no "Non-directional", "Effector", or "Affected" entries

If these conditions are not met, the object will not get dumped and an error message will print to the ERROR output file like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   nodump    WBPerson1234   has nondiretional + bait
12345   nodump    WBPerson1234   has nondiretional + target
12345   nodump    WBPerson1234   has nondiretional + effected
12345   nodump    WBPerson1234   has nondiretional + effector
12345   nodump    WBPerson1234   has effector but no effected
12345   nodump    WBPerson1234   has effector + bait
12345   nodump    WBPerson1234   has effector + target
12345   nodump    WBPerson1234   has effected + bait
12345   nodump    WBPerson1234   has effected + target
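Conditions a) through e) can be sketched in Python as below, again assuming a hypothetical dict that maps each role name to its list of entries. The explanation strings are illustrative and are not guaranteed to match the wording printed by the actual dumper script.

```python
# Each directional role and the partner role it requires.
PARTNER = {
    "effector": "affected",
    "affected": "effector",
    "bait": "target",
    "target": "bait",
}

def incompatibilities(roles):
    """Return a list of explanations; an empty list means the roles are compatible."""
    present = {role for role, entries in roles.items() if entries}
    problems = []
    # a) Non-directional entries exclude every directional role.
    if "nondirectional" in present:
        for other in ("effector", "affected", "bait", "target"):
            if other in present:
                problems.append(f"has nondirectional + {other}")
    # b) to e) Each directional role needs its partner.
    for role, partner in PARTNER.items():
        if role in present and partner not in present:
            problems.append(f"has {role} but no {partner}")
    # b) to e) Genetic roles and physical roles must not be mixed.
    for genetic in ("effector", "affected"):
        for physical in ("bait", "target"):
            if genetic in present and physical in present:
                problems.append(f"has {genetic} + {physical}")
    return problems
```

Any non-empty result would correspond to a "nodump" line in the ERROR output file.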
12345   nodump    WBPerson1234   has bait but no target


4) If there is no Interaction ID, the object will not get dumped and an error message will print to the ERROR output file like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   nodump    WBPerson1234   There is no Interaction ID

The script determines this by 1) generating a list of all PGIDs from the Interaction OA, 2) removing all PGIDs where Arun is the curator, and then 3) looking for any PGIDs for which there is no Interaction ID. Because a cron job adds Interaction IDs to any PGIDs (Postgres rows) that are missing them, this problem should be rare; it arises only when objects without Interaction IDs have been added that day, before the next cron job runs.


5) If there is no Interaction Type, the object will not get dumped and an error message will print to the ERROR output file like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   nodump    WBPerson1234   There is no Interaction Type


6) If there is an Interaction object that exists on multiple Postgres lines/rows (i.e. the same Interaction ID with multiple PGIDs), the object will not get dumped and an error message will print to the ERROR output file like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   nodump    WBPerson1234   WBInteraction000123456 exists across multiple lines
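One way to detect an Interaction ID that is spread across several PGIDs is to group the rows by ID, as in this Python sketch. The input format (a list of (PGID, Interaction ID) pairs) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def find_duplicated_ids(rows):
    """Map each Interaction ID used by more than one PGID to its PGIDs.

    rows is an iterable of (pgid, interaction_id) pairs; rows without
    an Interaction ID are ignored here (they are covered by check 4).
    """
    pgids_by_id = defaultdict(list)
    for pgid, interaction_id in rows:
        if interaction_id:
            pgids_by_id[interaction_id].append(pgid)
    return {iid: pgids for iid, pgids in pgids_by_id.items() if len(pgids) > 1}
```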

Non-Fatal Errors (Interactions will get dumped, but error message will get printed)

1) If there is a Variation, Expression pattern, Transgene, or Antibody that cannot be matched to a gene interactor, the object will be identified as "Unaffiliated" in the .ACE file and an error message will print to the ERROR output file like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Interaction ID  <TAB>  Unaffiliated Object  <TAB>  Object Name

so, for example:

12345   lineonly    WBPerson1234   WBInteraction000123456   Unaffiliated_variation     WBVar00600763
12345   lineonly    WBPerson1234   WBInteraction000123456   Unaffiliated_transgene     kyEx456
12345   lineonly    WBPerson1234   WBInteraction000123456   Unaffiliated_antibody      [cgc2826]:hlh-2
12345   lineonly    WBPerson1234   WBInteraction000123456   Unaffiliated_expr_pattern  Expr1234


2) If there is an inconsistency between the directionality of the Interaction Type and the Interactors, the interaction object will get dumped to the .ACE file, but an error message will print to the ERROR output file like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   flagonly   WBPerson1234   has diretional type Enhancement + nondirectional data
12345   flagonly   WBPerson1234   has diretional type Epistasis + nondirectional data
12345   flagonly   WBPerson1234   has diretional type Suppression + nondirectional data
12345   flagonly   WBPerson1234   has nondiretional type Mutual_enhancement + effected data
12345   flagonly   WBPerson1234   has nondiretional type Mutual_enhancement + effector data
12345   flagonly   WBPerson1234   has nondiretional type Mutual_suppression + effector data
12345   flagonly   WBPerson1234   has nondiretional type No_interaction + effected data
12345   flagonly   WBPerson1234   has nondiretional type Synthetic + effector data
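The directionality check can be sketched as follows. The two sets contain only the interaction types that appear in the example messages above; the complete classification used by the dumper is not given here, so these sets are deliberately partial. The role dict, function name, and message wording are hypothetical.

```python
# Partial lists, taken only from the example error messages above.
DIRECTIONAL_TYPES = {"Enhancement", "Epistasis", "Suppression"}
NONDIRECTIONAL_TYPES = {
    "Mutual_enhancement",
    "Mutual_suppression",
    "No_interaction",
    "Synthetic",
}

def directionality_flags(interaction_type, roles):
    """Return 'flagonly' explanations for type/interactor mismatches."""
    present = {role for role, entries in roles.items() if entries}
    flags = []
    if interaction_type in DIRECTIONAL_TYPES and "nondirectional" in present:
        flags.append(f"directional type {interaction_type} + nondirectional data")
    if interaction_type in NONDIRECTIONAL_TYPES:
        for role in ("effector", "affected"):
            if role in present:
                flags.append(f"nondirectional type {interaction_type} + {role} data")
    return flags
```

Unlike the fatal checks, a non-empty result here would only add a line to the ERROR output file; the interaction would still be dumped.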


3) If no curator is listed for the interaction, the interaction object will get dumped, but an error message will print to the ERROR output file like this:

PGID  <TAB>  Dump_status  <TAB>  Curator ID  <TAB>  Explanation

so, for example:

12345   flagonly   no curator   has no curator


Handling Dead Genes During Dump Process

The dumper script will now (as of May, 2013) run an automatic check for dead genes in any gene field. Any genes that are considered dead that are referenced in an Interaction object in the OA will be handled in the following manner:

1) If there is a replacement for the gene (i.e. the gene has merged into another gene), the dead gene will be dumped into a "Historical_gene" field in the .ACE file, and the replacement gene will fill the original gene field. A comment will be added to the Historical_gene field via a 'Text' tag (updated as of 3-18-2015). The original gene field (now with the updated gene reference) will be printed with an "Inferred_automatically" tag after the gene. So, for example, if WBGene00001234 is now a dead gene that has been merged into WBGene00002345:

Gene  "WBGene00001234"

becomes

Gene  "WBGene00002345"  Inferred_automatically
Historical_gene  "WBGene00001234"  "Note: This object originally referred to WBGene00001234.
WBGene00001234 is now considered dead and has been merged into WBGene00002345. WBGene00002345 has 
replaced WBGene00001234 accordingly."

Also, since Antibodies, Transgenes, Expression patterns, Variations are mapped to an interactor where possible (or else they are dumped as "Unaffiliated"), this mapping will now occur to only the newest genes that the interactor refers to.

2) If there is no replacement for the gene (Dead or Suppressed), the following is dumped:

Gene  "WBGene00001234"
Historical_gene  "WBGene00001234"  "Note: This object originally referred to a gene
 (WBGene00001234) that is now considered dead. Please interpret with discretion."

OR

Gene  "WBGene00001234"
Historical_gene  "WBGene00001234"  "Note: This object originally referred to a gene
 (WBGene00001234) that has been suppressed. Please interpret with discretion."

and lastly,

3) If the gene has undergone a split, such genes will be dumped as:

Gene  "WBGene00001234"
Historical_gene  "WBGene00001234"  "Note: This object originally referred to a gene 
(WBGene00001234) that is now considered split. Please interpret with discretion."

and also printed out in the error output file of the dumping script for a curator to go back and manually change according to best judgement.
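The three cases can be summarized in a short Python sketch that builds the .ACE lines for one gene reference. It assumes the gene's status ("live", "merged", "dead", "suppressed", or "split") and any replacement gene have already been looked up; those status labels and the function names are hypothetical, and the note is emitted on a single line rather than wrapped.

```python
def dump_gene(gene, status, replacement=None):
    """Return the .ACE lines for one gene reference in an interaction."""
    if status == "live":
        return [f'Gene  "{gene}"']
    if status == "merged":
        # Case 1: the replacement fills the gene field.
        if not replacement:
            raise ValueError("a merged gene needs a replacement gene")
        note = (
            f"Note: This object originally referred to {gene}. "
            f"{gene} is now considered dead and has been merged into {replacement}. "
            f"{replacement} has replaced {gene} accordingly."
        )
        return [
            f'Gene  "{replacement}"  Inferred_automatically',
            f'Historical_gene  "{gene}"  "{note}"',
        ]
    # Cases 2 and 3: the original gene is kept and annotated.
    wording = {
        "dead": "is now considered dead",
        "suppressed": "has been suppressed",
        "split": "is now considered split",
    }
    if status not in wording:
        raise ValueError(f"unknown gene status: {status!r}")
    note = (
        f"Note: This object originally referred to a gene ({gene}) "
        f"that {wording[status]}. Please interpret with discretion."
    )
    return [f'Gene  "{gene}"', f'Historical_gene  "{gene}"  "{note}"']

def needs_manual_review(status):
    """Split genes are also reported in the error file for a curator."""
    return status == "split"
```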


Gene Examples:
A split gene: WBGene00012507
A merged gene: WBGene00007524
A dead gene: WBGene00007814
A suppressed gene: WBGene00015490