Sentence Saver for Category Seed
From WormBaseWiki
- The sentence saver for seeding categories was intended to be used as a starting point for seeding new categories that could then be tested in Textpresso searches.
- The input is a list of paper IDs for known positive papers for a given data type.
- The initial output is a list of words sorted according to their frequency and distribution in a subset of sentences within the input papers.
- The final output is one or more categories of words that can be uploaded to a Textpresso implementation for testing.
- The sentence saver is used when curators already know which papers they want to save sentences from.
- For example, a curator could create new categories for curating GO Molecular Function terms related to enzymatic activity by selecting sentences from papers that have already been curated.
- Alternatively, a curator could save sentences from papers initially read for another purpose, such as concise description or reference genome curation.
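The page does not say exactly how the frequency/distribution ranking is computed. A minimal Python sketch, assuming saved sentences are held as a mapping from WBPaper ID to sentence strings and using a simple regex tokenizer (both assumptions, not the tool's actual implementation), could rank candidate terms first by how many papers they occur in and then by total count:

```python
import re
from collections import Counter, defaultdict

# Hypothetical in-memory form of a sentence file: WBPaper ID -> saved sentences.
SentenceFile = dict[str, list[str]]

# Simple word pattern; a stand-in, not Textpresso's own tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9\-]*")


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into lower-case word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(sentence)]


def rank_terms(sentences: SentenceFile, min_papers: int = 1) -> list[tuple[str, int, int]]:
    """Return (term, total_count, paper_count), widest paper spread first."""
    counts: Counter[str] = Counter()
    papers_with_term: defaultdict[str, set[str]] = defaultdict(set)
    for paper_id, paper_sentences in sentences.items():
        for sentence in paper_sentences:
            for term in tokenize(sentence):
                counts[term] += 1
                papers_with_term[term].add(paper_id)
    ranked = [
        (term, counts[term], len(papers_with_term[term]))
        for term in counts
        if len(papers_with_term[term]) >= min_papers
    ]
    # Terms spread across many papers make better seeds than terms
    # repeated many times in a single paper; ties break alphabetically.
    ranked.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return ranked


# Example with made-up paper IDs and sentences.
example = {
    "WBPaper00000001": ["The kinase phosphorylates LIN-12 in vitro.",
                        "Purified kinase activity was assayed."],
    "WBPaper00000002": ["Recombinant kinase activity was measured in vitro."],
}
for term, total, n_papers in rank_terms(example, min_papers=2)[:3]:
    print(term, total, n_papers)
# kinase 3 2
# activity 2 2
# in 2 2
```

Function words such as "in" and "was" rank highly in a list like this, so a curator (or an added stopword list) would still need to prune the output before choosing category terms.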
Sample Work Scenarios
Brand New Categories
- Curator logs in.
- Enters a list of 30 WBPaper IDs for papers containing sentences positive for a previously uncurated data type.
- Clicks 'Load Sentence File and Query Papers !'.
- Selects ~100 sentences from the positive papers.
- Saves sentences, giving sentence file a name.
- Examines frequency/distribution of terms.
- Selects terms for one or more categories.
- Names categories and saves them (somehow connecting the new categories to the saved sentence file).
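How new categories should be tied to their sentence file is still open. One possible sketch, assuming each category simply records the name of the sentence file it was built from and is stored as JSON (the names, the JSON layout, and the one-term-per-line upload text are all hypothetical, not a documented Textpresso format):

```python
import json
from dataclasses import dataclass, field


@dataclass
class Category:
    """A seed category: its name, its terms, and the sentence file it was built from."""
    name: str
    source_sentence_file: str  # hypothetical link back to the saved sentence file
    terms: set[str] = field(default_factory=set)

    def add_terms(self, *terms: str) -> None:
        """Add curator-selected terms, normalised to lower case; blanks are ignored."""
        self.terms.update(t.strip().lower() for t in terms if t.strip())

    def to_upload_text(self) -> str:
        """One term per line, sorted; an assumed format for test uploads."""
        return "".join(f"{term}\n" for term in sorted(self.terms))


def categories_to_json(categories: list[Category]) -> str:
    """Serialise categories together with their source sentence file."""
    return json.dumps(
        [
            {"name": c.name,
             "source_sentence_file": c.source_sentence_file,
             "terms": sorted(c.terms)}
            for c in categories
        ],
        indent=2,
    )


def categories_from_json(text: str) -> list[Category]:
    """Rebuild categories saved by categories_to_json."""
    return [
        Category(d["name"], d["source_sentence_file"], set(d["terms"]))
        for d in json.loads(text)
    ]


# Example with hypothetical category and file names.
kinase = Category("kinase_activity", "mf_enzyme_sentences.json")
kinase.add_terms("Kinase", " phosphorylates ", "")
print(kinase.to_upload_text(), end="")  # prints "kinase" and "phosphorylates", one per line
```

Keeping the source file name inside each saved category means the link survives even if categories and sentence files are stored separately.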
Adding More Sentences to an Existing File
- Curator wants to add more sentences to an existing file to improve frequency analysis.
- Logs in.
- Enters a list of new papers from which to save sentences to an already existing file.
- Users need to be able to keep adding sentences to an existing file, since it is unlikely that all sentences will be selected in one sitting. Maybe a load and save sentences function?
- Users need to be able to see the sentence file prior to uploading a list of papers. This will help to keep track of the list of papers from which sentences have been selected.
- Users will need to be able to edit the sentence file, deleting individual sentences or whole papers if necessary. Maybe a view and edit sentences function?
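The load, save, view, and edit functions suggested above could be sketched as a small class. This assumes a sentence file stored as JSON mapping WBPaper IDs to sentence lists; the class name, method names, and file layout are hypothetical:

```python
import json
from pathlib import Path


class SavedSentences:
    """Sentences keyed by WBPaper ID, persisted as JSON (a hypothetical layout)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load an existing file so a new sitting continues where the last one stopped.
        self.sentences: dict[str, list[str]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def papers(self) -> list[str]:
        """Papers already in the file, for review before uploading a new list."""
        return sorted(self.sentences)

    def unseen(self, paper_ids: list[str]) -> list[str]:
        """Keep only uploaded IDs (deduplicated, in order) with no saved sentences yet."""
        return [p for p in dict.fromkeys(paper_ids) if p not in self.sentences]

    def add(self, paper_id: str, sentence: str) -> None:
        """Append a sentence, ignoring exact duplicates from earlier sittings."""
        saved = self.sentences.setdefault(paper_id, [])
        if sentence not in saved:
            saved.append(sentence)

    def delete_sentence(self, paper_id: str, index: int) -> None:
        """Remove one sentence; drop the paper entirely if none remain."""
        del self.sentences[paper_id][index]
        if not self.sentences[paper_id]:
            del self.sentences[paper_id]

    def delete_paper(self, paper_id: str) -> None:
        """Remove a paper and all of its saved sentences."""
        self.sentences.pop(paper_id, None)

    def save(self) -> None:
        """Write the current state back to disk."""
        self.path.write_text(json.dumps(self.sentences, indent=2))
```

Calling `papers()` before a new upload gives the curator the list of already-mined papers, and `unseen()` filters a new paper list against it.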
Interface with First Pass Curation Form
- When creating Molecular Function categories, ~80% of the papers had not been flagged on the first-pass form, mostly because they preceded the first-pass start date.
- However, now that these papers have been identified as containing a data type that is flagged on the first-pass form, it would be useful to add a simple 'Yes' to the curator column for the in vitro analysis data type.
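A minimal sketch of that back-fill, assuming the curator column can be modelled as a mapping from WBPaper ID to the set of data types a curator has marked 'Yes' (the real first-pass form storage is not described on this page, and the function name is hypothetical), could mark each paper from a sentence file once and report which papers changed:

```python
def add_curator_yes(
    paper_ids: list[str],
    curator_flags: dict[str, set[str]],
    data_type: str = "in vitro analysis",
) -> list[str]:
    """Mark data_type as curator 'Yes' for each paper; return the papers newly marked.

    curator_flags is a hypothetical stand-in for the first-pass form's curator
    column: WBPaper ID -> set of data types a curator has marked 'Yes'.
    """
    newly_marked = []
    for paper_id in paper_ids:
        flagged = curator_flags.setdefault(paper_id, set())
        # Skip papers already marked, including repeats within paper_ids.
        if data_type not in flagged:
            flagged.add(data_type)
            newly_marked.append(paper_id)
    return newly_marked
```

Returning only the newly marked papers gives curators a short list to review instead of the whole sentence file.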