Difference between revisions of "WormBase"

From WormBaseWiki
Searches by section

The results of the test searches, per section, were as below. The number corresponds to the number of papers returned.

  • abstract: 0
  • acknowledgments: 1
  • author: 0
  • conclusion: 0
  • discussion: 38
  • introduction: 27
  • materials: 12
  • non-sectioned: 36
  • references: 19
  • results: 181
  • title: 0
  • year: 0

From this, I would say we should perform the searches using the five categories and looking in discussion, introduction, materials, non-sectioned, references, and results.
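
The selection above can be reproduced from the counts. A minimal sketch in Python, assuming the rule is "keep every section that returned more than one paper"; that cutoff (`MIN_PAPERS`) is an assumption made for illustration, since the text does not state the criterion used:

```python
# Papers returned per section in the test searches (numbers from the list above).
counts = {
    "abstract": 0, "acknowledgments": 1, "author": 0, "conclusion": 0,
    "discussion": 38, "introduction": 27, "materials": 12,
    "non-sectioned": 36, "references": 19, "results": 181,
    "title": 0, "year": 0,
}

# Hypothetical cutoff: the source only gives the resulting list of sections.
MIN_PAPERS = 2

selected = sorted(name for name, n in counts.items() if n >= MIN_PAPERS)
print(selected)
```

With this cutoff the selected sections are discussion, introduction, materials, non-sectioned, references, and results, which matches the list recommended above.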
 
  
 

Revision as of 15:41, 5 August 2013


Search frequency

  • Monthly, on the 1st (?) of the month
  • Search previous month's papers, i.e. in August, search all papers where the date begins with 2013-07
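
The date prefix for the previous month can be derived from the run date. A minimal sketch in Python, assuming paper dates are stored as strings beginning with YYYY-MM; the function name `previous_month_prefix` is illustrative and not part of the existing scripts:

```python
from datetime import date, timedelta

def previous_month_prefix(today: date) -> str:
    """Return the 'YYYY-MM' prefix of the month before `today`."""
    # Step back one day from the first of the current month to land in the
    # previous month; this also handles the January -> December rollover.
    last_of_previous = today.replace(day=1) - timedelta(days=1)
    return last_of_previous.strftime("%Y-%m")

# A run in August 2013 selects papers whose date begins with 2013-07.
print(previous_month_prefix(date(2013, 8, 1)))
```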

Search according to SVM classification

  • Search all predicted positives (low, medium, high) of Other_expr pattern SVM


Script explanation - Juancarlos

The script runs on Wednesdays at 2am and looks for a new SVM file. If a new SVM file is available, the Textpresso search is performed; if not, the search is skipped for that week.
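
The skip-if-nothing-new logic can be pictured as follows. This is only a sketch in Python, not the actual script: the function names, the marker file that records the last processed SVM file, and the assumption that SVM file names sort chronologically are all hypothetical.

```python
from pathlib import Path
from typing import Optional

def find_new_svm_file(svm_dir: Path, marker: Path) -> Optional[Path]:
    """Return the newest SVM file if it has not been processed yet, else None."""
    # Assumption: file names sort chronologically (e.g. they embed a date).
    # The marker file must live outside svm_dir so it is not picked up here.
    candidates = sorted(p for p in svm_dir.iterdir() if p.is_file())
    if not candidates:
        return None
    newest = candidates[-1]
    last_seen = marker.read_text().strip() if marker.exists() else ""
    if newest.name == last_seen:
        return None  # nothing new: skip this week's search
    return newest

def mark_processed(svm_file: Path, marker: Path) -> None:
    """Record the name of the SVM file that was just searched."""
    marker.write_text(svm_file.name + "\n")
```

A weekly job would call `find_new_svm_file`, run the Textpresso search only when it returns a file, and then call `mark_processed` so the same file is not searched again.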

The script that transfers the results from textpresso to tazendra is:

 /home/postgres/public_html/cgi-bin/data/ccc_gocuration/get_newset.pl

called by :

 /home/postgres/work/pgpopulation/textpresso/wrapper.sh             

On textpresso, there have been no new matches since Oct 12.

The full results are on textpresso-dev at :

 /data2/srv/textpresso-dev.caltech.edu/www/docroot/azurebrd/ccc_datafiles/good_sentences_file.*

The new matches are at :

 /data2/srv/textpresso-dev.caltech.edu/www/docroot/azurebrd/ccc_datafiles/recent_sentences_file.*

They are compared to the SVM results from:

 http://caprica.caltech.edu/celegans/svm_results/Juancarlos/otherexpr

You can check the recent good_sentences_file.* and see whether any of those should be in the SVM results, or you can look at the full text of the SVM results and see whether any of those should be in the good_sentences_file.* If you don't have a textpresso-dev account, ask Michael, and he'll ask the ITS people. I log on with my ITS account.

If something should be in the good_sentences_file.* and isn't, check that the categories contain what they should, and let me know which category isn't matching what it should.
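
The two-way check described above amounts to a pair of set differences. A minimal sketch in Python, assuming the paper identifiers have already been extracted from the good_sentences_file.* and from the SVM results; the file formats are not described here, so the parsing is left out, and the identifiers in the usage example are made up:

```python
def compare_paper_sets(sentence_papers, svm_papers):
    """Return (papers only in the sentence file, papers only in the SVM results)."""
    sentence_papers = set(sentence_papers)
    svm_papers = set(svm_papers)
    # Matched by Textpresso but not predicted positive by the SVM.
    only_in_sentences = sorted(sentence_papers - svm_papers)
    # Predicted positive by the SVM but with no matching sentence.
    only_in_svm = sorted(svm_papers - sentence_papers)
    return only_in_sentences, only_in_svm

# Made-up identifiers, for illustration only.
missing_from_svm, missing_from_sentences = compare_paper_sets(
    ["paperA", "paperB"], ["paperB", "paperC"]
)
print(missing_from_svm, missing_from_sentences)
```

Papers in the second list are the ones to check against the category files, since a missing category term is one reason a sentence would fail to match.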


The good_sentences_file.* is generated by :

 /home/azurebrd/work/get_kimberly_go_gene_component_verb_localization/get_go_gene_component.pl

Now located here?

 /home/postgres/work/textpresso/kimberly/get_go_gene_component.pl


The category files being used are in :

 /data2/data-processing/data/celegans/Data/indices/body/semantic/categories

files :

 protein_celegans 
 localization_cell_components_082208 
 localization_verbs_082208 
 localization_other_120107

Curation Form

Relevant postgres tables:

ccc_gene_comp_go

