Difference between revisions of "Lexicon Development Tool"

From WormBaseWiki

Revision as of 08:51, 19 May 2009

The Lexicon Development Tool (LDT) is built on top of existing Textpresso search functionalities. It employs a three-tier architecture: a front-end web interface written in Perl CGI, a middle tier containing the functional process logic, and a Postgres relational database for back-end data storage. Briefly, it allows a curator to log in (see screenshot 1_CE_Login), set up a project (see screenshot 2_CE_Project), and submit a Textpresso search (see screenshot 3_CE_Search); the search results (sentences) are stored in the back-end database. These sentences can be retrieved by the curator and displayed on a web page. The curator can then select sentences of interest and make categories based on word frequency, or save words/phrases, based on their context in the sentence, to include in appropriate categories or in an exclusion list (see screenshot 4_CE_SentenceAnnotation). These newly made categories can be included in a new round of searching, so the process can continue iteratively.
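The word-frequency step above can be sketched as follows. This is a minimal illustration, not LDT's actual implementation (which is Perl CGI); the function name, the sample sentences, and the exclusion words are all hypothetical, and Python is used purely for illustration.

```python
from collections import Counter
import re

def category_candidates(sentences, exclusion, top_n=10):
    """Rank words in curator-selected sentences by frequency,
    skipping anything already on the exclusion list."""
    counts = Counter()
    for sentence in sentences:
        for word in re.findall(r"[A-Za-z-]+", sentence.lower()):
            if word not in exclusion:
                counts[word] += 1
    return counts.most_common(top_n)

# Toy example: two selected sentences, a few stop words excluded.
selected = [
    "The mutant worms showed a locomotion defect.",
    "A severe locomotion defect was observed in mutants.",
]
print(category_candidates(selected, exclusion={"the", "a", "in", "was"}))
```

A curator would review such a ranked list, promoting frequent domain terms (here "locomotion", "defect") into a category and pushing uninformative words onto the exclusion list.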

This utility can be applied both to datatype identification and to automatic extraction of text for curation. In datatype identification, a curator can upload into LDT a small training set containing 50 positive and negative paper IDs for a datatype, and optimize a search query based on recall and precision measurements. Once satisfactory recall and precision values are obtained, the batch-mode version of LDT (i.e., the search module) can be run in the automatic pipeline, applying the optimized search query to new paper IDs; the IDs identified as positive for the datatype can then be deposited into the tracking database.
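The recall and precision measurement against a labeled training set can be sketched as below. The function and the paper IDs are hypothetical, in Python for illustration only; LDT's own scoring code is not shown here.

```python
def precision_recall(retrieved_ids, positive_ids):
    """Score a search query against a labeled training set.

    retrieved_ids: paper IDs returned by the query.
    positive_ids:  paper IDs known to contain the datatype.
    """
    retrieved = set(retrieved_ids)
    positives = set(positive_ids)
    true_pos = len(retrieved & positives)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(positives) if positives else 0.0
    return precision, recall

# Toy case: the query retrieves 4 papers, 3 of 6 known positives among them.
p, r = precision_recall(
    ["WBPaper1", "WBPaper2", "WBPaper3", "WBPaper9"],
    ["WBPaper1", "WBPaper2", "WBPaper3", "WBPaper4", "WBPaper5", "WBPaper6"],
)
print(p, r)  # precision 0.75, recall 0.5
```

A curator iterating on the query would broaden it to raise recall or tighten it to raise precision, stopping once both values are acceptable.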

In automatic extraction of text for curation, a curator can upload a single article of a datatype and conduct the same iterative process of searching, making categories, and searching again until the search query is so specific that only sentences containing relevant curation information are returned. The same search query can then be applied to all the other papers of that datatype that need to be curated. The selected sentences can then be exported into a text file in the user's folder (see screenshot 5_CE_Results), in a format that can be directly imported into a curation tool such as Phenote for further curation.
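The export step might look like the following sketch. The column layout, file name, and paper IDs are assumptions for illustration; the actual format Phenote expects is not specified here, and Python is used in place of LDT's Perl.

```python
import csv

def export_sentences(rows, path):
    """Write curator-selected sentences to a tab-delimited text file.

    Each row pairs a paper ID with one selected sentence; the header
    row below is an assumed layout, not a documented Phenote format.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["paper_id", "sentence"])
        writer.writerows(rows)

export_sentences(
    [("WBPaper1", "unc-26 mutants are uncoordinated."),
     ("WBPaper1", "The phenotype is fully penetrant.")],
    "selected_sentences.txt",
)
```

A downstream curation tool can then ingest the file row by row, with each sentence carrying its source paper ID.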