WormBase
Search frequency
- Monthly, on the 1st (?) of the month
- Search the previous month's papers; e.g., in August 2013, search all papers whose date begins with 2013-07
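The date selection above can be sketched as follows. This is a minimal illustration, assuming paper dates are stored as strings beginning with YYYY-MM; the function names are hypothetical and are not taken from the actual script.

```python
from datetime import date


def previous_month_prefix(run_date: date) -> str:
    """Return the YYYY-MM prefix of the month before run_date."""
    year, month = run_date.year, run_date.month
    if month == 1:
        # January rolls back to December of the previous year.
        year, month = year - 1, 12
    else:
        month -= 1
    return f"{year:04d}-{month:02d}"


def published_last_month(paper_date: str, run_date: date) -> bool:
    """True if paper_date (assumed to start with YYYY-MM) falls in the previous month."""
    return paper_date.startswith(previous_month_prefix(run_date))
```

For a run on 1 August 2013, `previous_month_prefix(date(2013, 8, 1))` gives `2013-07`, matching the example in the list above.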
Search according to SVM classification
- Search all predicted positives (low, medium, high) of Other_expr pattern SVM
Search Filters
- Remove all sentences with a Textpresso sentence score of 30 or higher.
- Other filtering steps may be introduced in the future (e.g., specific proteins like DAF-16 or sentences that also contain words like mutant or RNAi).
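The score filter might look like the sketch below. It assumes that the Textpresso sentence score is the leading `SSC:<number>` field seen in the sample sentence format; that reading is an assumption, and the function names are hypothetical. Lines with no recognizable score are kept here so that nothing is dropped silently, which is also only a design choice for this sketch.

```python
import re

# Assumption: each sentence line starts with "SSC:<integer>".
SCORE_RE = re.compile(r"^SSC:(\d+)\b")
SCORE_CUTOFF = 30  # sentences scoring 30 or higher are removed


def sentence_score(line):
    """Return the leading SSC score as an int, or None if it is absent."""
    match = SCORE_RE.match(line)
    return int(match.group(1)) if match else None


def filter_sentences(lines):
    """Keep lines whose score is below the cutoff (or cannot be parsed)."""
    kept = []
    for line in lines:
        score = sentence_score(line)
        if score is None or score < SCORE_CUTOFF:
            kept.append(line)
    return kept
```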
Sample File Format - File Names
20130801_WB_ccc
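Pattern: DateScriptWasRun_MOD_TypeOfTextpressoSearch (in the sample, the date is written as YYYYMMDD, WB is the MOD, and ccc is the type of Textpresso search)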
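A small sketch of building and parsing names in this pattern, assuming the date part is YYYYMMDD as in the sample name. The helper names and default arguments are hypothetical.

```python
from datetime import date, datetime


def build_file_name(run_date, mod="WB", search_type="ccc"):
    """Compose <YYYYMMDD>_<MOD>_<search type>."""
    return f"{run_date:%Y%m%d}_{mod}_{search_type}"


def parse_file_name(name):
    """Split a file name into (run date, MOD, search type)."""
    date_part, mod, search_type = name.split("_", 2)
    run_date = datetime.strptime(date_part, "%Y%m%d").date()
    return run_date, mod, search_type
```

With these helpers, `build_file_name(date(2013, 8, 1))` reproduces the sample name `20130801_WB_ccc`.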
Sample File Format - Sentence Format
SSC:7 PMID:23263989:references:276 ZFP-1 chromosomes 771 FIG 4 <protein_celegans>ZFP-1</protein_celegans> : : <localization_experimental_082208>GFP</localization_experimental_082208> <localization_verbs_082208>localizes</localization_verbs_082208> to <localization_cell_components_2011-02-11>chromosomes</localization_cell_components_2011-02-11> in maturing oocytes and is <localization_experimental_082208>widely</localization_experimental_082208> <localization_verbs_082208>expressed</localization_verbs_082208> 772 at all developmental <localization_experimental_082208>stages</localization_experimental_082208> .
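To work with sentences in this format, one could pull out the PMID and the category-tagged terms with regular expressions. The sketch below only relies on what is visible in the sample (a `PMID:<digits>` field and paired `<category>term</category>` tags); the function names are hypothetical, and the meaning of the other leading fields is not interpreted.

```python
import re

# Category names in the sample contain letters, digits, underscores and hyphens.
TAG_RE = re.compile(r"<([A-Za-z0-9_-]+)>(.*?)</\1>")
ANY_TAG_RE = re.compile(r"</?[A-Za-z0-9_-]+>")
PMID_RE = re.compile(r"PMID:(\d+)")


def extract_pmid(line):
    """Return the PMID digits as a string, or None if no PMID field is found."""
    match = PMID_RE.search(line)
    return match.group(1) if match else None


def tagged_terms(line):
    """Return (category, term) pairs in the order they appear."""
    return TAG_RE.findall(line)


def strip_tags(line):
    """Remove the category markup, leaving the plain sentence text."""
    return ANY_TAG_RE.sub("", line)
```

Applied to the sample sentence, `tagged_terms` would begin with `("protein_celegans", "ZFP-1")`, and `extract_pmid` would return `"23263989"`.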
Sentence File Location
- On textpresso-dev: (The test sentences were here: http://textpresso-dev.caltech.edu/ccc_results/celegans/ccc_celegans_2013only)
- On tazendra: /home/azurebrd/public_html/cgi_bin/forms/ccc/source/worm
Archived Info - Old script explanation - Juancarlos
The script runs on Wednesdays at 2am and looks for a new SVM file. If a new SVM file is available, the Textpresso search is performed; if not, the search is skipped for that week.
The script that copies the results from textpresso to tazendra is :
/home/postgres/public_html/cgi-bin/data/ccc_gocuration/get_newset.pl
called by :
/home/postgres/work/pgpopulation/textpresso/wrapper.sh
On textpresso, there have been no new matches since Oct 12.
The full results are on textpresso-dev at :
/data2/srv/textpresso-dev.caltech.edu/www/docroot/azurebrd/ccc_datafiles/good_sentences_file.*
The new matches are at :
/data2/srv/textpresso-dev.caltech.edu/www/docroot/azurebrd/ccc_datafiles/recent_sentences_file.*
They are compared against the SVM results from :
http://caprica.caltech.edu/celegans/svm_results/Juancarlos/otherexpr
You can check the recent good_sentences_file.* and see whether any of those sentences should be in the SVM results, or you can look at the full text of the SVM results and see whether any of those should be in the good_sentences_file.* If you don't have a textpresso-dev account, you can ask Michael, and he'll ask the ITS people. I log on with my ITS account.
If sentences should be in the good_sentences_file.* and aren't, check that the categories contain the terms they should, and let me know which category isn't matching what it should.
The good_sentences_file.* is generated by :
/home/azurebrd/work/get_kimberly_go_gene_component_verb_localization/get_go_gene_component.pl
Now located here?
/home/postgres/work/textpresso/kimberly/get_go_gene_component.pl
The category files being used are in :
/data2/data-processing/data/celegans/Data/indices/body/semantic/categories
files :
- protein_celegans
- localization_cell_components_082208
- localization_verbs_082208
- localization_other_120107
Archived Info - Old Curation Form
Relevant postgres tables:
ccc_gene_comp_go