Detailed Documentation of Form and Scripts

Currently, on mangolassi, ccc.cgi and the related scripts and files are located here:

azurebrd/public_html/cgi-bin/forms/ccc

accession
- This is a file that maps WB WBPaper IDs to PMIDs.
- For WB, this file is generated each time the search is performed.
ccc_celegans_2013only
- I believe these are old test files that can be deleted.
ccc.cgi
- This is the code for the curation form.
- ccc.cgi Documentation
ccc.js
- This is the JavaScript code for the ccc form.
- If we want to change the number of characters needed to begin autocomplete, we can do that here.
c_elegans.WS234.xrefs.txt and c_elegans.WS236.xrefs.txt
- These are files generated with each WB build that were used to create the WB gpi file.
generate_gpi.pl
- This is the script that will be used to manually generate a new gpi file for WB.
jquery
- This directory contains...
notes
- This file contains a short bit about mapping WBGene IDs to UniProtKB accessions for the gpi file.
scripts
- This directory contains:
  - accession file - This mapping file maps WBPaper IDs to PMIDs and TAIR doc IDs to PMIDs. Note that we need PMIDs to send annotations to Protein2GO.
  - create_ccc_pgcuration.pl - This Perl script creates the following two Postgres tables for a given Textpresso source file:

ccc_sentenceclassification
  ccc_mod text,
  ccc_file text,
  ccc_paper text,
  ccc_section text,
  ccc_sentnum text,
  ccc_sentenceclassification text,
  ccc_comment text,
  ccc_curator text,
  ccc_timestamp text

ccc_sentenceannotation
  ccc_mod text,
  ccc_file text,
  ccc_paper text,
  ccc_section text,
  ccc_sentnum text,
  ccc_geneproduct text,
  ccc_component text,
  ccc_goterm text,
  ccc_evidencecode text,
  ccc_with text,
  ccc_alreadycurated text,
  ccc_comment text,
  ccc_valid text,
  ccc_ptgoid text,
  ccc_curator text,
  ccc_timestamp
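
As a rough illustration of the two table layouts listed above, the following sketch creates them in an in-memory SQLite database. SQLite is used here only as a stand-in so the example is self-contained; the real tables live in Postgres and are created by create_ccc_pgcuration.pl, whose code is not shown on this page. The column names come from the listing, while the `text` type for the last `ccc_timestamp` column and the helper names `create_tables` and `column_names` are assumptions.

```python
import sqlite3

# Column names are copied from the listing above. Every column is declared
# as text; the type of ccc_timestamp in ccc_sentenceannotation is not given
# in the listing, so "text" there is an assumption.
TABLES = {
    "ccc_sentenceclassification": [
        "ccc_mod", "ccc_file", "ccc_paper", "ccc_section", "ccc_sentnum",
        "ccc_sentenceclassification", "ccc_comment", "ccc_curator",
        "ccc_timestamp",
    ],
    "ccc_sentenceannotation": [
        "ccc_mod", "ccc_file", "ccc_paper", "ccc_section", "ccc_sentnum",
        "ccc_geneproduct", "ccc_component", "ccc_goterm", "ccc_evidencecode",
        "ccc_with", "ccc_alreadycurated", "ccc_comment", "ccc_valid",
        "ccc_ptgoid", "ccc_curator", "ccc_timestamp",
    ],
}


def create_tables(conn):
    """Create both curation tables (hypothetical helper, SQLite stand-in)."""
    for table, columns in TABLES.items():
        column_sql = ", ".join(f"{name} text" for name in columns)
        conn.execute(f"CREATE TABLE {table} ({column_sql})")
    conn.commit()


def column_names(conn, table):
    """Return the column names of a table in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
create_tables(conn)
```

Both tables share the first five columns (ccc_mod through ccc_sentnum), which together identify a sentence within a source file.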

create_ccc_pgindices.pl - This script creates the following two tables for each Textpresso source file:
- ccc_geneprodindex - This table lists the gene products mentioned in each sentence, each mapped to a MOD ID and a UniProtKB ID.
  - Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
- ccc_componentindex - This table lists, for each sentence, the cellular components that matched the Textpresso cellular component category.
  - Columns: ccc_mod, ccc_file, ccc_paper, ccc_section, ccc_sentnum, ccc_'table', ccc_timestamp
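
To make the idea behind ccc_geneprodindex concrete, here is a minimal sketch of looking up the gene products mentioned in a sentence and pairing each with a MOD ID and a UniProtKB ID. It is not the logic of create_ccc_pgindices.pl, which is not shown on this page; the function `index_sentence`, the lookup table, and all identifiers in it are hypothetical placeholders.

```python
import re

# Hypothetical lookup: gene product name -> (MOD ID, UniProtKB ID).
# These identifiers are placeholders, not real accessions.
GENE_PRODUCTS = {
    "abc-1": ("MOD:0000001", "UniProtKB:X00001"),
    "xyz-2": ("MOD:0000002", "UniProtKB:X00002"),
}


def index_sentence(sentence, lookup):
    """Return (name, mod_id, uniprot_id) for each known name in the sentence."""
    hits = []
    for name, (mod_id, uniprot_id) in lookup.items():
        # Match the name as a whole token, case-insensitively, so that
        # "abc-1" does not match inside "abc-10".
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            hits.append((name, mod_id, uniprot_id))
    return hits


hits = index_sentence("ABC-1 localizes to the nucleus.", GENE_PRODUCTS)
```

Each hit would correspond to one row in the index table for the sentence in question.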
gpi files - these files contain gene names, synonyms, and MOD and UniProtKB identifiers. For file format specifications, see:

GO's gpi file format specification

- dictyBase_07032013.gpi
- TAIR1_gpi
- worm_gpi
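
For orientation, the sketch below reads gpi-style lines into dictionaries. It assumes the 10-column, tab-separated layout of GPI version 1.2, with `!` header lines and `|` separating multiple synonyms or cross-references; the files listed above may follow a different version of the format, so check them against the specification first. The function name `parse_gpi`, the field names, and the sample record are hypothetical.

```python
# Assumed column order (GPI 1.2 style, tab-separated).
GPI_COLUMNS = [
    "db", "object_id", "symbol", "name", "synonyms",
    "object_type", "taxon", "parent_object_id", "xrefs", "properties",
]
# Columns that may hold several pipe-separated values.
MULTI_VALUE = {"synonyms", "xrefs", "properties"}


def parse_gpi(lines):
    """Parse gpi-style lines into a list of dicts, skipping '!' headers."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # Pad short rows so that every column is present.
        fields += [""] * (len(GPI_COLUMNS) - len(fields))
        record = {}
        for column, value in zip(GPI_COLUMNS, fields):
            if column in MULTI_VALUE:
                record[column] = value.split("|") if value else []
            else:
                record[column] = value
        records.append(record)
    return records


# Made-up sample record; the identifiers are placeholders.
sample = [
    "!gpi-version: 1.2\n",
    "WB\tGENE0001\tabc-1\tplaceholder name\tsyn-a|syn-b\tprotein\t"
    "taxon:6239\t\tUniProtKB:X00001\t\n",
]
records = parse_gpi(sample)
```

A name-to-identifier lookup for the gene product index could then be built from the `symbol`, `synonyms`, `object_id`, and `xrefs` fields of each record.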
meh - this looks like a test file for ccc_geneprodindex for TAIR.
old_tables - this is a file that lists the names of the tables used for the previous version of the CCC curation forms.
out - this looks like another test file for ccc_geneprodindex for TAIR.
populate_ccc_pg_indices.pg.tair1 and populate_ccc_pg_indices.pg.worm1 - these look like the output files of populate_ccc_pg_indices.pl for TAIR and WB. Note that for TAIR, some sentences were not processed properly.
