WormBase-Caltech Weekly Calls May 2018

May 3, 2018

SimpleMine output

Wen was working on simplification of SimpleMine output
Considering removing general terms in an ontology when more specific terms exist
Concern that we would be removing information; will keep terms

SPELL topics

Topics in SPELL need some organization, possibly trimming
We could create a graph (SObA) display of topics (based on GO process)

SPELL problem with WS265

Wen had to debug; SPELL has a limit on how many genes can be processed per data set (46,340)
Wen is working around this by deleting genes with no expression from the data set (a kludge)
Wen will write to Matt Hibbs to ask how to deal with this
Will the Alliance work together on a system to analyze large-scale expression data?
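The gene-limit workaround described above can be sketched in Python. This is a minimal illustration, not Wen's actual script. It assumes expression data is held as a mapping from gene ID to a list of values, one per condition, and `drop_unexpressed` and `fit_to_spell` are hypothetical names. As an aside, 46,340 is exactly the largest n for which n * n fits in a signed 32-bit integer (46,340 squared is 2,147,395,600, just under 2^31 - 1). This hints, though we have not confirmed it, that the limit comes from an n-by-n gene-gene computation indexed with 32-bit integers.

```python
import math

# Per-data-set gene limit reported for SPELL in these notes.
SPELL_MAX_GENES = 46_340

# 46,340 is the largest n with n * n <= 2**31 - 1 (signed 32-bit int max);
# whether SPELL's limit really comes from this is an unconfirmed guess.
assert math.isqrt(2**31 - 1) == SPELL_MAX_GENES


def drop_unexpressed(rows):
    """Drop genes with no expression in any condition.

    `rows` maps gene ID -> list of values (assumed layout, not SPELL's
    file format). Values that are None (missing) or 0 count as no expression.
    """
    return {
        gene: values
        for gene, values in rows.items()
        if any(v not in (None, 0) for v in values)
    }


def fit_to_spell(rows, limit=SPELL_MAX_GENES):
    """Apply the kludge, then report whether the data set now fits."""
    kept = drop_unexpressed(rows)
    return kept, len(kept) <= limit
```

Dropping unexpressed genes only helps when enough of them exist to get under the limit. The second return value flags data sets that still will not fit and would need another approach.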

Curator candidate

Will arrive at 10am
Skype calls with remote curators
Curators will send the group their Skype handles and requested times to talk

May 10, 2018

ECO terms for genome editing

Asking for the group's feedback on ECO terms for genome editing (Daniela)
What would be used for a fly enhancer trap experiment?
Genomically encoded GFP, for example
ECO: GFP localization
ECO term, example: Fluorescent protein transcript localization evidence
Single copy transgene? Endogenous locus?
Whether it is CRISPR or not may not be relevant
May request ECO terms that capture distinct types of transgene evidence
Will use generic term for now
Do other MODs use ECO?
May want to capture endogenous/non-endogenous, multi-copy/single-copy distinctions
Many of these features are already captured in the transgene and construct objects; would specific ECO codes be redundant?

ZFIN SAB

Significant involvement in Alliance
Interest in micropublications; will push a pilot
June 19, moving from older DB to Postgres
Investigating automation for some curation processes
Students review which papers to include and acquire PDFs
Curate paper-by-paper

Alliance

Supplement request for year 3, due soon (May 15)
Formal report from NHGRI, 18-month plan looks good
Further future plans (from NHGRI perspective) aren't quite clear
Software infrastructure?
Central- vs. MOD-control of resources questions
Will likely have to write an NIH proposal in Fall or Winter
How much will the Alliance handle human variants?
NHGRI interested in metabolomics; Alliance plans?

Genome-wide screens

Had a help desk question about phenotypic screens in organisms other than worms, flies, yeast, and bacteria
There have been human cell line phenotypic screens (e.g. siRNA/shRNA); who curates these, if anyone?
Also, induced pluripotent stem cell experiments

Nameserver issues

Issue came up about assigning unique WBStrain IDs
Can use a nightly nameserver dump from Hinxton to populate Postgres/OA
Will need to clean up existing strains in Postgres
Also, considering unique IDs for genotypes
Mechanics of naming and managing naming of objects
- Nightly syncing (cronjob) to nameserver
- Ideally, we would have instant updates, but the Hinxton firewall prevents direct access; Matt is working on establishing a separate nameserver location to gain direct access
- Strain names (at least historically) have been updated quarterly from the CGC file
- Curators need mechanism to create and use strain (and variation) names right away
- Current system requires manual denormalization step; has worked so far
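A nightly cron job would need to compare the Hinxton nameserver dump with what is already in Postgres. The Python sketch below shows only that comparison step, under stated assumptions. Both sides are loaded as plain {WBStrain ID: name} dictionaries, and `diff_strains` and `SyncReport` are hypothetical names. It neither reads the dump nor writes to Postgres, and it reports local-only IDs for curator review rather than deleting them.

```python
from dataclasses import dataclass, field


@dataclass
class SyncReport:
    """Differences between the nameserver dump and the local Postgres copy."""
    to_add: dict = field(default_factory=dict)      # IDs only in the dump
    to_rename: dict = field(default_factory=dict)   # ID -> (old name, new name)
    local_only: dict = field(default_factory=dict)  # IDs missing from the dump


def diff_strains(dump, local):
    """Compare two {WBStrain ID: name} mappings (assumed layout).

    `dump` is the nightly nameserver export; `local` is what Postgres holds.
    """
    report = SyncReport()
    for wb_id, name in dump.items():
        if wb_id not in local:
            report.to_add[wb_id] = name
        elif local[wb_id] != name:
            report.to_rename[wb_id] = (local[wb_id], name)
    for wb_id, name in local.items():
        if wb_id not in dump:
            # Not deleted automatically: these need curator cleanup/review.
            report.local_only[wb_id] = name
    return report
```

The ID values in a usage example, such as `WBStrain00000001`, would be illustrative only. The report's three buckets map onto the cleanup of existing Postgres strains mentioned above.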

May 17, 2018

Strains

Create a strain OA? Central curation tool for strain data?
Would need to maintain synchrony with CGC and Hinxton
Postgres/Tazendra variation adding CGI: http://tazendra.caltech.edu/~azurebrd/cgi-bin/forms/generic.cgi?action=TempVariationObo
Will add a similar link for strains, adding info to obo_name_strain and obo_data_strain as well as to a tempfile, which will keep those objects in Postgres when nightly_geneace.pl updates the OA strain info
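One way to picture the tempfile step is the Python sketch below. It is a guess at the logic, not the behavior of nightly_geneace.pl. It assumes strains are tracked by name and that the tempfile holds names created through the strain form. It also assumes a name can leave the tempfile once it appears in the nameserver dump. `merge_temp_strains` is a hypothetical helper.

```python
def merge_temp_strains(dump_names, temp_names):
    """Keep tempfile strains alive across a nightly refresh (sketch).

    dump_names: strain names in the latest nameserver dump.
    temp_names: names added via the strain form and stored in the tempfile,
    in the order they were added (duplicates are ignored).
    Returns (names to load into the OA, names still pending in the tempfile).
    """
    official = set(dump_names)
    # dict.fromkeys de-duplicates while preserving insertion order.
    pending = [n for n in dict.fromkeys(temp_names) if n not in official]
    return list(dump_names) + pending, pending
```

Names that have reached the dump drop out of `pending`, so the tempfile shrinks as the nameserver catches up.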

SAB 2018

Who will present? Present what?
We can generate a central document with stats to give to SAB
Ask SAB for opinions and guidance?
Would be good to assess current efforts and priorities, ask if we should stay the course or make modifications to our approach

May 24, 2018

Helpdesk

Not sure if this falls under content...
User asked about phenotype information being incorporated into the gene description
Chris directed him to the phenotype submission form; he has since submitted the phenotypes
Phenotype data hasn't yet been incorporated into the automated descriptions pipeline
We can direct him to the gene description submission form
Ranjana will respond to the user's email
Will likely swap out the older WB automated description pipeline for the newer Alliance automated description pipeline

Relations Ontology

Working on new model for importing RO terms into WormBase
Question: Do we need to import all of RO?
- For example, RO also has terms from other ontologies:

    [Term]
    id: GO:0003674
    name: molecular_function
    is_a: BFO:0000015 ! process
    property_value: IAO:0000589 "molecular process" xsd:string

Question: How much term information do we want and/or need?
- For example:

    [Typedef]
    id: BFO:0000050
    name: part of
    def: "a core relation that holds between a part and its whole" []
    property_value: IAO:0000111 "is part of" xsd:string
    property_value: IAO:0000112 "my brain is part of my body (continuant parthood, two material entities)" xsd:string
    property_value: IAO:0000112 "my stomach cavity is part of my stomach (continuant parthood, immaterial entity is part of material entity)" xsd:string
    property_value: IAO:0000112 "this day is part of this year (occurrent parthood)" xsd:string
    property_value: IAO:0000116 "Everything is part of itself. Any part of any part of a thing is itself part of that thing. Two distinct things cannot be part of each other." xsd:string
    property_value: IAO:0000116 "Occurrents are not subject to change and so parthood between occurrents holds for all the times that the part exists. Many continuants are subject to change, so parthood between continuants will only hold at certain times, but this is difficult to specify in OWL. See https://code.google.com/p/obo-relations/wiki/ROAndTime" xsd:string
    property_value: IAO:0000116 "Parthood requires the part and the whole to have compatible classes: only an occurrent can be part of an occurrent; only a process can be part of a process; only a continuant can be part of a continuant; only an independent continuant can be part of an independent continuant; only an immaterial entity can be part of an immaterial entity; only a specifically dependent continuant can be part of a specifically dependent continuant; only a generically dependent continuant can be part of a generically dependent continuant. (This list is not exhaustive.)\n\nA continuant cannot be part of an occurrent: use 'participates in'. An occurrent cannot be part of a continuant: use 'has participant'. A material entity cannot be part of an immaterial entity: use 'has location'. A specifically dependent continuant cannot be part of an independent continuant: use 'inheres in'. An independent continuant cannot be part of a specifically dependent continuant: use 'bearer of'." xsd:string
    property_value: IAO:0000118 "part_of" xsd:string
    property_value: RO:0001900 RO:0001901
    property_value: seeAlso http://ontologydesignpatterns.org/wiki/Community:Parts_and_Collections
    property_value: seeAlso http://ontologydesignpatterns.org/wiki/Submissions:PartOf
    property_value: seeAlso http://www.obofoundry.org/ro/#OBO_REL:part_of xsd:string
    is_transitive: true
    is_a: RO:0002131 ! overlaps
    inverse_of: BFO:0000051 ! has part

For curation, we could import only the BFO and RO ID spaces of RO, but include all of the tag-value pairs (the usage examples might be helpful)
For WB, though, we could ingest only the BFO and RO ID spaces of RO and include only these tags in the model: id, name, def, is_a, domain, range, and inverse_of
- We can always link out from WB to pages with more detail on RO terms
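The two import options above can be sketched as a simple OBO filter in Python. Curation would keep the BFO/RO ID spaces with all tags, and WB would keep the same ID spaces with a reduced tag set. `parse_stanzas` and `filter_ro` are hypothetical and written only for illustration, not existing WormBase code. The parser ignores OBO escapes and trailing modifiers, so a production import would be better served by an established OBO/OWL library.

```python
import re

# ID spaces kept from RO; e.g. GO terms imported into RO are dropped.
KEEP_PREFIXES = ("BFO:", "RO:")
# Tag set for the WB model, per the notes above.
WB_TAGS = {"id", "name", "def", "is_a", "domain", "range", "inverse_of"}


def parse_stanzas(obo_text):
    """Split OBO text into (stanza type, [(tag, value), ...]) tuples.

    Header lines before the first [Term]/[Typedef] are ignored.
    """
    stanzas, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            current = (header.group(1), [])
            stanzas.append(current)
        elif current is not None and ":" in line and not line.startswith("!"):
            tag, value = line.split(":", 1)
            current[1].append((tag.strip(), value.strip()))
    return stanzas


def filter_ro(obo_text, keep_tags=None):
    """Keep only BFO/RO stanzas; if keep_tags is given, keep only those tags."""
    result = []
    for kind, pairs in parse_stanzas(obo_text):
        ids = [v for t, v in pairs if t == "id"]
        if not ids or not ids[0].startswith(KEEP_PREFIXES):
            continue
        if keep_tags is not None:
            pairs = [(t, v) for t, v in pairs if t in keep_tags]
        result.append((kind, pairs))
    return result
```

`filter_ro(text)` corresponds to the curation option, where all tag-value pairs, including the usage examples, are kept. `filter_ro(text, keep_tags=WB_TAGS)` corresponds to the WB option. Trailing `! label` comments stay inside values such as `is_a`.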

Will use RO IDs in the Annotation_relation part of the ?GO_annotation model (the model will need an update)
- The ?GO_annotation model also refers to relations used in annotation extensions. Unfortunately, though, not all annotation extension relations are in RO, so we can't yet use RO in this part of the model.
- We can either import these other GO relations as a separate class, or import them if/when they get included in RO (there is a PRO/RO meeting scheduled for late October with some preliminary phone conferences prior).
- The Alliance Gene Expression group is also dealing with this issue.
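The following sketch shows how Annotation_relation values could be resolved to RO IDs. Relations that RO does not yet cover are set aside in a separate bucket, mirroring the "separate class" option. The label-to-ID table lists a few standard GO annotation relations. It is not WormBase's model and should be checked against the current RO release. `resolve_relations` is a hypothetical helper, and the GO-only label in the test is made up.

```python
# A few gene-product-to-GO-term relations with their RO/BFO IDs
# (verify against the current RO release before use).
RO_RELATIONS = {
    "enables": "RO:0002327",
    "contributes to": "RO:0002326",
    "involved in": "RO:0002331",
    "acts upstream of": "RO:0002263",
    "acts upstream of or within": "RO:0002264",
    "part of": "BFO:0000050",
    "located in": "RO:0001025",
    "colocalizes with": "RO:0002325",
}


def resolve_relations(labels, ro=RO_RELATIONS):
    """Split relation labels into RO-backed IDs and GO-only leftovers.

    Labels are normalized (case, underscores) before lookup; leftovers would
    be kept in a separate class until (if) they are added to RO.
    """
    resolved, go_only = {}, []
    for label in labels:
        key = label.strip().lower().replace("_", " ")
        if key in ro:
            resolved[label] = ro[key]
        else:
            go_only.append(label)
    return resolved, go_only
```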
Where can we use RO terms in other curation models?
Kimberly will add to the agenda for the next WB site-wide conference call

Methods in Molecular Biology book

Eukaryotic Genome Databases; has a WormBase chapter
Book arrived at Caltech
Chris will ask publishers about getting PDFs without watermarks

ICBO 2018 meeting

International Conference on Biological Ontology (2018)
Raymond considering submitting abstract
Not clear if it needs to be a full paper; can we resubmit Biocuration meeting abstract?
Can the content be published elsewhere once submitted to the meeting?

May 31, 2018

Feedback from the Front Range Worm Meeting

Is it possible to collect old theses online and load them into Textpresso?
- Yes, possible if we can get PDFs; collection may be rate-limiting?
- Ask people to submit theses to Textpresso, could be PDF, Word, LaTeX?
- Caltech library has all PhD theses from Caltech; may be able to ingest in bulk; may be difficult for older (pre-1980) theses
- Would it be worth the effort? How much info is published? Micropublications?
- Would not be curated, but would sit in a different literature category
Shall we suggest authors put "elegans" in titles and abstracts? Min Han said some of his papers do not have this keyword.
- It's OK, we will still get those papers indexed by PubMed for MeSH terms
Community curation. Erin Osborne Nishimura mentioned camps and courses for undergraduate research. Can they do allele and phenotype curation for WormBase? Who will follow up with her?
- Important to consider the implications of student curation
- Maybe get Erin in touch with Jim Hu at Texas A&M
- Wen and Kimberly will contact Jim Hu to get feedback on student curation
- Chris can follow up
Shall we send the AFP form to some users for feedback?
- Kimberly and Daniela: may be confusing for outside testers before adding real data and functionality
- Wen will email contact info of interested user

Updates about SPELL

SGD moved their SPELL to AWS. WormBase needs to set up its own AWS instance, and SGD can give us their code. Will Raymond be able to help set up the AWS instance? Once SPELL is installed on AWS following SGD's code, Wen can load WormBase SPELL data into it.

Personal write-ups/bios

Might be good to have short, 1-page write-ups or bios about each person at WormBase (and/or the Alliance)
Can give to SAB members at SAB meeting (and to others in other contexts)

SAB 2018

Everyone can give a 5-minute talk at the WB-internal project meeting (July 12, the day before SAB day)
What do we want to get out of the project meeting?
Are there any communication issues?
Where do things stand with Datomic, and what are the implications for our curation pipeline?
- One impetus for using Datomic was a database that could be regularly updated from a central repository; what's the status of that?
- Is Datomic migration complete? Not quite; the data on the website is served mostly by Datomic now, but we (Caltech curators) are still modeling in ACEDB only
- There are no current pipelines to curate or load data directly into Datomic
- How much (more) work will it be to completely move everything into Datomic (become completely independent of ACEDB)?
- Datomic is likely ingesting .ACE files, currently
What database solution will the Alliance use? How will that affect our Datomic plan?
PomBase and ZFIN have nightly/weekly updates? How is that enabled?
- ZFIN migrated to Postgres recently
- PomBase compiles code and stores data in memory
How stable can we expect WormBase to be over next year? Likely quite stable, as reflected by recent user comments
What projects can we invest in once/if we're not devoting so many resources to Datomic migration?
- Tools, data analytic tools, automated inferences (e.g. automated gene descriptions)
- Pie charts, graphs, etc. summarizing data in WormBase; what data and what format? Can look at other sites
  - Automatic updates of these data displays
- Interaction Venn diagrams (in progress; Jae and Sibyl working on)
- Trimming SObA graph to make ribbon display equivalent (Raymond and Juancarlos working on)
- Filtering interactions per tissue-type, life stage etc. (at least for regulatory interactions)

WormMine

Many bugs have been fixed in the last 6 months
Haven't been pushing on new data types lately
BlueJeans, the new InterMine interface, will be released soon and may slow down the addition of new data types in the near term
Still need human disease and interaction data
Would be best to outline a rank-ordered list of priorities, considering data types and species

Gene Ontology

Noctua 1.0 release is available
Noctua simple annotation table available
Release schedule and approach will be changing soon
Trying to switch over to having all annotations come in from Noctua (as opposed to individual sources like Protein2GO, WormBase, etc.)
Looking at how to prioritize GO curation; looking at (currently) information-poor processes, etc.
Want to have more specific relations between genes and processes; move from general "involved in" to more specific, informative relations, e.g. "upstream of" process, implying causality, positive/negative effect
Relation back-curation will be a significant project
These will impact, for example, automated gene descriptions
For information-poor genes: would be interesting to compare phenotype annotations to available process annotations

Data display consistency

Disease and phenotype data have very distinct data display methods
Would be great to have consistent data display
We can rely on discussions within Alliance

Interactions

Check out GeneMANIA; review before SAB (Gary Bader might attend)
Can we implement Cytoscape code at WB or at the Alliance site? Not sure; may be straightforward

Alliance All Hands call

Next Wednesday, June 6th
Will cover Noctua/GO-CAM; will be quite technical in focus
May cover Textpresso at future meeting
