Detailed Documentation of Form and Scripts

From WormBaseWiki
Revision as of 20:55, 16 August 2013 by Vanaukenk (talk | contribs)

Currently, on mangolassi, ccc.cgi and the related scripts and files are located here:

azurebrd/public_html/cgi-bin/forms/ccc

  • accession
    • This file maps WormBase (WB) WBPaper IDs to PMIDs.
    • For WB, this file is generated each time the search is performed.
  • ccc_celegans_2013only
    • I believe these are old test files that can be deleted.
  • ccc.cgi
  • ccc.js
    • This is the JavaScript code for the ccc form.
    • If we want to change the number of characters needed to begin autocomplete, we can do that here.
  • c_elegans.WS234.xrefs.txt and c_elegans.WS236.xrefs.txt
    • These are files generated with each WB build that were used to create the WB gpi file.
  • generate_gpi.pl
    • This is the script that will be used to manually generate a new gpi file for WB.
  • jquery
    • This directory contains...
  • notes
    • This file contains a short bit about mapping WBGene IDs to UniProtKB accessions for the gpi file.
  • scripts
    • This directory contains:
      • accession file - This is a mapping file that contains the mappings between WBPaper IDs and PMIDs as well as TAIR doc IDs and PMIDs. Note that we need PMIDs to send annotations to Protein2GO.
      • create_ccc_pgcuration.pl - This Perl script creates the following two Postgres tables for a given Textpresso source file:


ccc_sentenceclassification
  ccc_mod text,
  ccc_file text,
  ccc_paper text, 
  ccc_section text,
  ccc_sentnum text, 
  ccc_sentenceclassification text,
  ccc_comment text,
  ccc_curator text, 
  ccc_timestamp text,
ccc_sentenceannotation
  ccc_mod text,
  ccc_file text,
  ccc_paper text,
  ccc_section text,
  ccc_sentnum text,
  ccc_geneproduct text,
  ccc_component text,
  ccc_goterm text,
  ccc_evidencecode text,
  ccc_with text,
  ccc_alreadycurated text,
  ccc_comment text,
  ccc_valid text,
  ccc_ptgoid text,
  ccc_curator text,
  ccc_timestamp
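
The two table definitions above can be sketched as executable statements. This is a minimal illustration only, not the contents of create_ccc_pgcuration.pl: it uses Python's built-in sqlite3 module as a stand-in for Postgres so that it runs without a database server, and the helper names (`TABLES`, `create_statements`, `create_tables`) are hypothetical. The listing does not give a type for the final ccc_timestamp column, so `text` is assumed there.

```python
import sqlite3

# Column names copied from the listing above. Every column is created as
# text; for the last ccc_timestamp column the type is an assumption.
TABLES = {
    "ccc_sentenceclassification": [
        "ccc_mod", "ccc_file", "ccc_paper", "ccc_section", "ccc_sentnum",
        "ccc_sentenceclassification", "ccc_comment", "ccc_curator",
        "ccc_timestamp",
    ],
    "ccc_sentenceannotation": [
        "ccc_mod", "ccc_file", "ccc_paper", "ccc_section", "ccc_sentnum",
        "ccc_geneproduct", "ccc_component", "ccc_goterm", "ccc_evidencecode",
        "ccc_with", "ccc_alreadycurated", "ccc_comment", "ccc_valid",
        "ccc_ptgoid", "ccc_curator", "ccc_timestamp",
    ],
}


def create_statements(tables=TABLES):
    """Return one CREATE TABLE statement per table, all columns as text."""
    statements = []
    for name, columns in tables.items():
        body = ",\n  ".join(f"{column} text" for column in columns)
        statements.append(f"CREATE TABLE {name} (\n  {body}\n);")
    return statements


def create_tables(conn):
    """Create both curation tables on an open DB-API connection."""
    for statement in create_statements():
        conn.execute(statement)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for Postgres
    create_tables(connection)
    for statement in create_statements():
        print(statement)
```

The generated statements use only plain `CREATE TABLE` syntax with `text` columns, so they should also be acceptable to Postgres, although that has not been checked against the mangolassi setup.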
  • create_ccc_pgindices.pl - This script creates the following two tables for each Textpresso source file:
    • ccc_geneprodindex - this table lists the gene products mentioned in each sentence, each mapped to a MOD ID and a UniProtKB ID.
      • Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
    • ccc_componentindex - this table lists, for each sentence, the cellular components that matched the Textpresso cellular component category.
      • Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
  • gpi files - these files contain gene names, synonyms, and MOD and UniProtKB identifiers. For the file format, see GO's gpi file format specification.
    • dictyBase_07032013.gpi
    • TAIR1_gpi
    • worm_gpi
  • meh - this looks like a test file for ccc_geneprodindex for TAIR.
  • old_tables - this is a file that lists the names of the tables used for the previous version of the CCC curation forms.
  • out - this looks like another test file for ccc_geneprodindex for TAIR.
  • populate_ccc_pg_indices.pg.tair1 and populate_ccc_pg_indices.pg.worm1 - these look like the output files of populate_ccc_pg_indices.pl for TAIR and WB. Note that for TAIR, some sentences were not processed properly.
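
To illustrate how a gpi file can supply the gene product to MOD ID and UniProtKB ID mappings described for ccc_geneprodindex, here is a minimal sketch. It assumes the ten-column, tab-delimited GPI 1.1 layout (DB, object ID, symbol, name, synonyms, type, taxon, parent ID, xrefs, properties) with `!` comment lines; whether worm_gpi and the other files listed here follow exactly that layout is not confirmed by this page. The function names and the sample record are made up for illustration and are not taken from the actual scripts or data.

```python
def parse_gpi(lines):
    """Yield one dict per entry from GPI 1.1-style tab-delimited lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        fields += [""] * (10 - len(fields))  # pad short rows
        yield {
            "mod_id": f"{fields[0]}:{fields[1]}",
            "symbol": fields[2],
            "name": fields[3],
            "synonyms": [s for s in fields[4].split("|") if s],
            "xrefs": [x for x in fields[8].split("|") if x],
        }


def build_name_index(entries):
    """Map each symbol and synonym to (MOD ID, UniProtKB xrefs) pairs."""
    index = {}
    for entry in entries:
        uniprot = [x for x in entry["xrefs"] if x.startswith("UniProtKB:")]
        for name in [entry["symbol"], *entry["synonyms"]]:
            index.setdefault(name, []).append((entry["mod_id"], uniprot))
    return index


if __name__ == "__main__":
    # Made-up record; the identifiers are placeholders, not real accessions.
    sample = [
        "!gpi-version: 1.1",
        "WB\tWBGene99999999\tabc-1\t\tABC1|X1.1\tgene\ttaxon:6239\t\tUniProtKB:X00000\t",
    ]
    index = build_name_index(parse_gpi(sample))
    for name, hits in index.items():
        print(name, hits)
```

A name can map to more than one entry, so the index stores a list of hits per name rather than a single ID.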