|
|
Line 26: |
Line 26: |
| [[WormBase-Caltech_Weekly_Calls_July_2014|July]] | | [[WormBase-Caltech_Weekly_Calls_July_2014|July]] |
| | | |
| + | [[WormBase-Caltech_Weekly_Calls_August_2014|August]] |
| | | |
− | == August 7, 2014 ==
| |
| | | |
− | * Topic Curation
| + | == September 4, 2014 == |
− | ** Working out pipeline for curation
| |
− | ** Collecting model/pathway diagrams from papers and reviews: should we make ?Picture objects for these?
| |
− | *** Do we want to display these published diagrams on ?Paper pages and/or ?Topic pages?
| |
− | *** We need to determine copyrights/accessibility to images
| |
− | ** Would be good to automate identification of Review articles (or "non-primary" in general) in the Topic OA
| |
− | *** This could be done at the bulk import step
| |
− | *** Would require a new (possibly read-only) field in the Topic OA to indicate "primary" or "non-primary"
| |
− | ** Chris will look into whether "non-primary" articles are automatically excluded from the Curation Status Form
| |
− | ** The "Curation Status Omit" toggle in the Topic OA may become obsolete once we have a field/column for primary/non-primary status
| |
− | ** Current topic : Wnt signaling
| |
− | ** We have a list of 282 papers from a PubMed search using the "Wnt" search term (not a MeSH term) and C. elegans (MeSH term)
| |
− | ** Finding all papers for a category/topic remains an ad hoc approach; different topics are harder or easier to find all papers for
| |
− | ** We can create a Textpresso category for topics, like "Wnt signaling"
| |
| | | |
− | | + | * Topic 1 |
− | * WormBase Ontology Browser
| + | * Topic 2 |
− | ** Needs documenting
| |
− | ** Once everything is clarified, will be pushed to the live site
| |
− | | |
− | == August 14, 2014 ==
| |
− | | |
− | * WormBase Ontology Browser (WOBr)
| |
− | ** Should be ready but may need some additional testing before pushing to staging site
| |
− | ** Juancarlos will push to staging during meeting
| |
− | ** Curators should test this afternoon (on staging) and report any issues before 5pm
| |
− | | |
− | * DataBase call this morning (and every Thursday that doesn't have a site-wide call)
| |
− | ** Thomas Down testing Datomic (costs money) and Neo4J (very slow)
| |
− | ** May look again at DynamoDB
| |
− | ** No DB has been officially chosen
| |
− | ** Will have an Amazon AWS for collaboration.
| |
− | ** Expand smallace into bigger mediumace.
| |
− | ** Demo off one database for advisory board meeting.
| |
− | | |
− | * SAB
| |
− | ** Paul Sternberg back next week; will settle travel plans then
| |
− | ** What advisors are attending?
| |
− | ** What feedback do we want to get from advisors?
| |
− | ** How to show curation/DB progress? What stats/numbers to show?
| |
− | ** We want big picture feedback from biologist advisors; what's useful to the community? What should we prioritize?
| |
− | ** Karen: Perhaps a question to ask is "what are the main questions they are trying to answer when they go to the website?". When they explore a gene or protein function, what is it that they would want to see, and how? I don't think we are missing information so much as lacking integration of the information at the model level, for example, variation phenotype effects linked to altered protein domain function.
| |
− | | |
− | * Belated apologies from Mary Ann - clash of evening events.
| |
− | | |
− | == August 21, 2014 ==
| |
− | | |
− | * Generating a Site-Map for WormBase
| |
− | ** Use a crawler to generate? Output would need to be made human-readable
| |
− | ** We could use the legacy site as a site map
| |
− | * Citace upload report modifications
| |
− | ** wikipage here: [[Citace_upload_report]]
| |
− | ** Goals of this report
| |
− | *** Summary of the uploaded data classes/objects - this summary should be blind to requests from curators and should alert Caltech to missing data classes or severe changes in numbers of objects within preexisting data classes
| |
− | *** Summary of curation work - this summary should be curator-driven; in some cases it will require a more involved aceperl query to get at the actual annotations rather than a straight count of data class objects.
| |
− | ** Can we get a comparison for those data that are curated through postgres? It would be very helpful to be able to compare the changes in postgres with the changes in the Citace upload.
| |
− | ** Can we automate the generation of this report to make it easier to change and track?
| |
− | ** Regardless of whether the report is generated manually or automatically, when there is a model change or a data class addition/subtraction, the responsible curator needs to inform Wen of the need for a compensatory modification in the report.
| |
− | | |
− | == August 28, 2014 ==
| |
− | | |
− | * Curation Statistics for SAB
| |
− | ** Curators should send Chris all stats for their respective data types: total papers curated, total backlog, false positives
| |
− | ** Can we ignore SVM results for certain data types?
| |
− | ** Can we include Textpresso search results for relevant data types?
| |
− | ** Curators who use a Textpresso pipeline: Karen, Ranjana, Xiaodong, (Daniela?), Mary Ann
| |
− | ** Can we get detailed web usage statistics on particular datatypes?
| |
− | ** We want to articulate our priorities to the SAB; get feedback
| |
− | ** RNAi curation could get up to speed in 5 years if we have two FTEs on RNAi curation
| |
− | ** Are there certain genes that have less phenotype coverage that we should prioritize?
| |
− | | |
− | * Database migration call
| |
− | ** Candidates: MongoDB, CouchDB, Neo4J, Datomic, OrientDB, PostgreSQL, Cassandra
| |
− | ** Neo4J likely ruled out because of slow performance
| |
− | ** Will compare performance of Datomic vs. Postgres and ACEDB, etc.
| |
− | ** Datomic has good history tracking
| |
− | ** Thomas Down has experience with Datomic
| |
− | ** Probably won't go with a relational database
| |
− | ** We should use the Gene page (webpage) as a demo/example of what we want to try to emulate
| |
− | ** Adam (from Lincoln's group) working with OrientDB (graph database)
| |
− | | |
− | * Citace Upload Report
| |
− | ** Classes/datatypes missing from Citace Upload Report
| |
− | ** Karen started a Wiki page to capture this info: [[Citace_upload_report]]
| |
− | ** Curators should take a look and make sure it is properly filled out for their data types
| |
− | ** Columns are present in the table to make requests for certain numbers in Citace Upload Report and/or the Build Report
| |
− | ** Wen will take requests for queries to Citace etc. to add data to report
| |
− | | |
− | * UniProt linking to and from WormBase
| |
− | ** UniProt would like us to add information about some specific curated datatypes to a file we already supply that maps paper and gene identifiers
| |
− | ** Kimberly and Juancarlos will work on this
| |
− | ** Need to clarify with UniProt how links back to WormBase can/will be implemented
| |
− | | |
− | * Paul S going to NIH for data science meeting next week
| |
− | | |
− | * FlyBase pushing human disease curation
| |
− | | |
− | * LEGO backend updates
| |
− | ** Still need to get some backend logistics sorted out
| |
− | ** Communication between Michael Muller and Chris Mungall
| |
− | ** OA-like interface for Noctua?
| |