WormBase-Caltech Weekly Calls
From WormBaseWiki
Revision as of 19:18, 27 September 2018 by Cgrove (talk | contribs) (→Genetics and G3 papers in Textpresso)
Previous Years
GoToMeeting link: https://www.gotomeet.me/wormbase1
2018 Meetings
September 6, 2018
Genotype class
- Chris started an initial document to draw up the ?Genotype class and make appropriate changes to the ?Strain class
- Would be good for people to look at so we can discuss next time
- Would also be good to have Kevin H take a look and provide feedback
Citace Upload
- Send citace files to Wen by Sept 18, 10am Pacific
Automated gene descriptions
- Group making improvements
- Added disease, protein domains
- When there is direct experimental evidence for disease relevance, the description will say "gene has been used to study"
- When minimal data (information-poor genes), we can refer to human ortholog and data stored for that human gene in Alliance
- Continue to receive feedback from users; include enrichment information, etc.
- Now have good trimming algorithms to retain important info without flooding a description with too many granular terms
- Will not store automated descriptions in Postgres
- Wen will modify SimpleMine scripts to accommodate change
- Would be good to write a paper on automated concise descriptions
- Came up at GO meeting/hackathon: Translating GO-CAM models into concise descriptions?
- Should be doable; just need to develop code when we're ready to do that
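The notes above mention trimming so that a description is not flooded with granular terms, but do not describe the algorithm itself. The following is only a minimal sketch of one generic approach, assuming a simple child-to-parent ontology map: while there are too many terms, the largest group of sibling terms is replaced by their shared parent. The function name `trim_terms`, the `parent_of` map, and the term names are all hypothetical and are not taken from the actual gene description pipeline.

```python
from collections import defaultdict
from typing import Dict, List


def trim_terms(terms: List[str], parent_of: Dict[str, str], max_terms: int) -> List[str]:
    """Collapse sibling terms into their shared parent until at most max_terms remain.

    Hypothetical sketch: stops early if no two remaining terms share a parent.
    """
    current = list(dict.fromkeys(terms))  # de-duplicate while keeping order
    while len(current) > max_terms:
        siblings = defaultdict(list)
        for term in current:
            if term in parent_of:
                siblings[parent_of[term]].append(term)
        # Only parents that would absorb at least two terms shorten the list
        mergeable = {p: kids for p, kids in siblings.items() if len(kids) > 1}
        if not mergeable:
            break
        parent, kids = max(mergeable.items(), key=lambda kv: len(kv[1]))
        current = [t for t in current if t not in kids]
        if parent not in current:
            current.append(parent)
    return current


# Illustrative, made-up term names and hierarchy
example_terms = ["pharynx muscle", "body wall muscle", "vulval muscle", "ASH neuron", "intestine"]
example_parents = {
    "pharynx muscle": "muscle",
    "body wall muscle": "muscle",
    "vulval muscle": "muscle",
    "ASH neuron": "neuron",
}
trimmed = trim_terms(example_terms, example_parents, max_terms=3)
print(trimmed)
```

In this example the three muscle terms collapse into "muscle", while the single neuron term and the parentless "intestine" term are kept as they are.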
Textpresso presentation at next Alliance all-hands call
- Who should present? Maybe have several people? Kimberly, Valerio, Michael for sections?
- We can discuss at next Textpresso meeting
- Should cover techniques and software, but keep it generally simple and comprehensible for a larger audience
September 13, 2018
Citace upload
- Send files to Wen by 10am Tuesday (18th)
Genotype class
- ?Genotype class proposal
- Genotype_name tag: free text summary of the genotype
- Can this be automatically generated from components? Ideally, yes, but may be difficult
- Otherwise, can be manually written, as we have been doing, but is a bit denormalized and may require maintenance
- Genotype_description tag: free-text description of the genotype
- No precedent and probably not going to start now, so will remove from model
- Genotype_components supertag: to collect genotype component objects and, where necessary, free text
- We want to be able to express zygosity for each referenced object, likely requires a #Zygosity hash (and hence a ?Zygosity model)
- ?Zygosity model can have three main tags: "Homozygous", "Heterozygous_with_wild_type", or "Heteroallelic_combination_with"
- Heteroallelic_combination_with tag could further specify the type and identity of the object that is in heteroallelic combination with the original object
- Since it is not ideal to store an arbitrary component in the #Zygosity hash, we should probably just state the zygosity as "Heteroallelic_combination" and for display purposes have an automated way to calculate which components affect the same locus/loci (if necessary)
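To make the proposal above concrete, here is a minimal sketch in Python (not ACeDB model syntax) of genotype components that each carry a zygosity, with the heteroallelic partners calculated from shared loci rather than stored, and a genotype name generated from the components. The class and field names, the name format, and the gene and allele names are assumptions made for illustration only; the sketch also assumes every component maps to a single known locus.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Zygosity(Enum):
    HOMOZYGOUS = "Homozygous"
    HETEROZYGOUS_WITH_WILD_TYPE = "Heterozygous_with_wild_type"
    HETEROALLELIC_COMBINATION = "Heteroallelic_combination"


@dataclass
class Component:
    name: str   # component name, e.g. an allele (made-up values below)
    locus: str  # locus affected; assumed to be known for each component
    zygosity: Zygosity


@dataclass
class Genotype:
    components: List[Component] = field(default_factory=list)

    def heteroallelic_groups(self) -> Dict[str, List[str]]:
        """Work out, for display, which heteroallelic components share a locus."""
        groups = defaultdict(list)
        for c in self.components:
            if c.zygosity is Zygosity.HETEROALLELIC_COMBINATION:
                groups[c.locus].append(c.name)
        return {locus: sorted(names) for locus, names in groups.items() if len(names) > 1}

    def generated_name(self) -> str:
        """Build a free-text genotype name from the components."""
        paired = self.heteroallelic_groups()
        parts, done_loci = [], set()
        for c in sorted(self.components, key=lambda c: (c.locus, c.name)):
            if c.zygosity is Zygosity.HETEROALLELIC_COMBINATION and c.locus in paired:
                if c.locus not in done_loci:  # emit each combination once
                    parts.append("/".join(f"{c.locus}({n})" for n in paired[c.locus]))
                    done_loci.add(c.locus)
            elif c.zygosity is Zygosity.HETEROZYGOUS_WITH_WILD_TYPE:
                parts.append(f"{c.locus}({c.name})/+")
            else:
                # Homozygous, or heteroallelic with no partner found at the locus
                parts.append(f"{c.locus}({c.name})")
        return "; ".join(parts)


# Made-up gene and allele names, for illustration only
example = Genotype([
    Component("zz5", "def-2", Zygosity.HOMOZYGOUS),
    Component("xy2", "abc-1", Zygosity.HETEROALLELIC_COMBINATION),
    Component("qq9", "ghi-3", Zygosity.HETEROZYGOUS_WITH_WILD_TYPE),
    Component("xy1", "abc-1", Zygosity.HETEROALLELIC_COMBINATION),
])
print(example.generated_name())
```

Storing only "Heteroallelic_combination" on each component keeps the zygosity record free of references to other objects; the pairing is recomputed whenever it is needed for display.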
September 20, 2018
Kimberly's talk at Rutgers
- Kimberly went to worm meeting at Rutgers (10 labs using C. elegans? 6-7 totally worm-centric)
- 30-40 attendees (PIs, postdocs, grad students)
- Discussed tools and features at WB
- Presented Alliance pages and Textpresso
- PIs are enthusiastic about WB
- Monica Driscoll made a good plug for Textpresso
- People requesting FAQs and user guides (text and videos)
- Monica suggested a WB tutorial for PIs ;p
- Some people surprised about what they can accomplish using the tools available, like SimpleMine
- Covered gene set enrichment, WormMine, SPELL, SimpleMine, ParaSite BioMart, Textpresso
- Would be good to show people how to use Textpresso Central
- Discussed micropublications, asked about negative results (precedent?)
- Some asked about WB funding and Alliance plans
- Can we make a within-page search available to find, for example, field names etc.
- Some challenges in finding genes/proteins of a certain class
- Had question about histone genes recently
- Repeatedly have had questions about finding "ion channels"
- Searching gene class with text pulls out lots of false positives
- Could perform an analysis on particular classes of genes (e.g. histones or ion channels) and generate a micropublication providing the curated list
- Can generate a WormMine template query to pull these out for each release
- What classes of genes would we want to identify: histones, transcription factors, ion channels, protein kinases
- Chris will look into WormMine templates using gene class info and look into pulling in protein motif information
- We will ask other MODs and UniProt about how they deal with this issue
September 27, 2018
Update on the new AFP form and pipeline
- Daniela, Kimberly, Juancarlos, and Valerio will update on the current status of the new AFP form and pipeline
- Overall, the goal has been to incorporate as much Textpresso-based entity and data-type flagging as possible into the form
- Move from author data flagging to author data validation wherever we can
- Provide opportunities for authors to submit more detailed curation if they want
- General: A data type flagged positive by the SVM gets a pre-checked checkbox
- General: Question mark icons with help text
- Gene recognition
- Need to set a threshold of mentions; don't necessarily want all genes mentioned once
- Can we show all genes, ranked by occurrence?
- Don't want to overwhelm users
- How are the genes identified? Via the Textpresso pipeline: string matching, then consolidating multiple instances (protein, gene, etc.) into a single gene result
- Searches include supplemental materials
- Cannot search by section of paper
- Can we identify genes other than C. elegans/worm? Not currently; will stick to C. elegans for now
- Will expand to non-elegans nematodes in future; will expand to other species when extending to other MODs/Alliance members
- Chris: should we show the name of the gene as mentioned, verbatim, from the paper?
- Karen: No, we should insist authors use the proper names
- Chris: Meant cases where the paper refers to a gene by its sequence name, but a public name has come out by the time the AFP form goes to authors, causing confusion
- Can we pull genes from tables? We are pulling from PDF tables, but not supplemental Excel tables, for example
- Gene model updates: checkbox yes/no
- Species in paper
- Including worm, mouse, human, yeast
- Still more work to do on this front
- Alleles recognized
- Show list of allele names and WBVar IDs for confirmation
- Can submit new alleles within the AFP form (just allele names, no genes or other info; keeping it simple)
- Allele sequence change checkbox yes/no (link to Allele sequence info form)
- Can there be a feedback option readily available? There is a comments section toward the end of the form under "Anything else?"
- Transgenes handled like alleles
- Antibodies
- Newly generated antibodies: checkbox and text field (ask for details? consistency with alleles?); maybe shouldn't ask for antibody details, or can make details optional
- Form for existing antibodies
- Expression data
- Anatomic expression in WT
- Site of action (may be difficult to interpret user input; ask for example; make text details required)
- Time of action
- RNAseq data
- Microarrays - just link out to GEO
- Interactions (all SVM based, three checkboxes)
- Phenotypes (SVMs, link to phenotype form)
- Disease
- Checkbox for worm orthologs of human disease gene, etc.
- Comments section (to point out missing data types, provide general comments on form)
- Ask for unpublished data and suggest micropublication
- Final thank you and update contact info and lineage
- CIT feedback
- Maybe make font size larger
- Mobile device compatible? Yes
- Change "Anything else?" to "Anything else? Comments?"
- Can people save and return later? Yes
- How do we know they're finished? There is a "Finish and submit" button at end (but authors can still go back and make changes later)
- Maybe move "Finish and submit" button to left panel so it is always visible? Maybe make the button stand alone?
- If authors indicate there are physical interactions, can we distinguish elegans-elegans interactions vs. non-elegans or interspecies interactions? No, we cannot yet distinguish
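The gene recognition points above (string matching, consolidating protein, gene, and sequence names into one gene result, applying a mention threshold, and ranking by occurrence) can be sketched as follows. This is only an illustration under stated assumptions, not the Textpresso pipeline itself: the function `rank_genes`, the synonym table, the default threshold of 2, and the sample text with made-up gene names are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def rank_genes(text: str, synonyms: Dict[str, str], min_mentions: int = 2) -> List[Tuple[str, int]]:
    """Count exact string matches per gene and rank genes by number of mentions.

    synonyms maps each surface form (gene, protein, or sequence name) to one
    gene, so that all forms are consolidated into a single gene result.
    """
    counts = Counter()
    for surface, gene in synonyms.items():
        # Boundary checks stop "abc-1" from matching inside "abc-10"
        pattern = r"(?<![\w-])" + re.escape(surface) + r"(?![\w-])"
        counts[gene] += len(re.findall(pattern, text))
    kept = [(gene, n) for gene, n in counts.items() if n >= min_mentions]
    # Most-mentioned genes first; ties broken alphabetically
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Made-up paper text and names, for illustration only
sample_text = (
    "abc-1 mutants are long-lived. ABC-1 localizes to neurons, and XX99Z9.9 "
    "is expressed early. We also tested def-2 once, but abc-10 was not examined."
)
sample_synonyms = {
    "abc-1": "abc-1",
    "ABC-1": "abc-1",      # protein name
    "XX99Z9.9": "abc-1",   # sequence name
    "def-2": "def-2",
    "abc-10": "abc-10",
}
print(rank_genes(sample_text, sample_synonyms))                  # threshold applied
print(rank_genes(sample_text, sample_synonyms, min_mentions=1))  # all genes, ranked
```

Returning the full ranked list with `min_mentions=1` corresponds to the "show all genes, ranked by occurrence" option, while a higher threshold keeps the list short for authors.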
Genetics and G3 papers in Textpresso
- These papers don't have a PMID when they first enter WB, only a DOI (and most of the time the DOI doesn't work yet)
- DOI should work right away; Karen will look into whether there's a problem/typo
- Daniel needs to keep track, go back and merge WBPapers once PMID goes live
- Kimberly or Karen may have to send papers directly to Daniel for uploading
- Should Daniel only download papers with a PubMed ID? Yes, except for micropublications?
- Need a separate pipeline for micropublications? Daniel is currently downloading the papers
ParaSite (non-elegans) papers
- Should Daniel be trying to download all of these papers? Many are hard to track down
- Daniel should ask Michael Paulini