Difference between revisions of "WormBase-Caltech Weekly Calls August 2014"

Latest revision as of 17:28, 3 September 2014

August 7, 2014

Topic Curation

  • Working out pipeline for curation
  • Collecting model/pathway diagrams from papers and reviews: should we make ?Picture objects for these?
    • Do we want to display these published diagrams on ?Paper pages and/or ?Topic pages?
    • We need to determine copyrights/accessibility to images
  • Would be good to automate identification of Review articles (or "non-primary" in general) in the Topic OA
    • This could be done at the bulk import step
    • Would require a new (possibly read-only) field in the Topic OA to indicate "primary" or "non-primary"
  • Chris will look into whether "non-primary" articles are automatically excluded from the Curation Status Form
  • The "Curation Status Omit" toggle in the Topic OA may become obsolete once we have a field/column for primary/non-primary status
  • Current topic: Wnt signaling
  • We have a list of 282 papers from a PubMed search using the "Wnt" search term (not a MeSH term) and C. elegans (MeSH term)
  • Finding all papers for a category/topic remains an ad hoc approach; some topics are harder than others to find all papers for
  • We can create a Textpresso category for topics, like "Wnt signaling"
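
A minimal sketch of how a PubMed query like the one described above could be assembled, combining a free-text term with an organism MeSH term. The helper name `build_pubmed_query` is hypothetical, and the use of the `[All Fields]` tag for the free-text term is an assumption; the notes only say that "Wnt" was a plain search term rather than a MeSH term.

```python
def build_pubmed_query(free_text, organism_mesh):
    """Combine a free-text term with an organism MeSH term.

    Assumption: the free-text term is searched in all fields; the
    meeting notes do not say which field was actually used.
    """
    return f'{free_text}[All Fields] AND "{organism_mesh}"[MeSH Terms]'


# Query string for the current topic (Wnt signaling in C. elegans).
query = build_pubmed_query("Wnt", "Caenorhabditis elegans")
print(query)
```

The resulting string can be pasted into the PubMed search box. Paper counts will vary with the field tag chosen and the search date.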


WormBase Ontology Browser

  • Needs documenting
  • Once everything is clarified, will be pushed to the live site

August 14, 2014

WormBase Ontology Browser (WOBr)

  • Should be ready but may need some additional testing before pushing to staging site
  • Juancarlos will push to staging during meeting
  • Curators should test this afternoon (on staging) and report any issues before 5pm

DataBase call

  • This morning (and every Thursday that doesn't have a site-wide call)
  • Thomas Down testing Datomic (costs money) and Neo4j (very slow)
  • May look again at DynamoDB
  • No DB has been officially chosen
  • Will have an Amazon AWS for collaboration.
  • Expand smallace into bigger mediumace.
  • Demo one database at the advisory board meeting.

SAB

  • Paul Sternberg back next week; will settle travel plans then
  • Which advisors are attending?
  • What feedback do we want to get from advisors?
  • How to show curation/DB progress? What stats/numbers to show?
  • We want big picture feedback from biologist advisors; what's useful to the community? What should we prioritize?
  • Karen: Perhaps a question to ask is "what are the main questions they are trying to answer when they go to the website?" When they explore a gene or protein function, what is it that they would want to see, and how? I don't think we are missing information so much as lacking integration of the information at the model level; for example, variation phenotype effects linked to altered protein domain function.
  • Belated apologies from Mary Ann - clash of evening events.

August 21, 2014

Generating a Site-Map for WormBase

  • Use a crawler to generate? Output would need to be made human-readable
  • We could use the legacy site as a site map

Citace upload report modifications

  • wikipage here: Citace_upload_report
  • Goals of this report
    • Summary of the uploaded data classes/objects - this summary should be blind to requests from curators and should alert Caltech to missing data classes or severe changes in numbers of objects within preexisting data classes
    • Summary of curation work - this summary should be curator driven; in some cases the summary will require a more involved aceperl query to get at the actual annotation rather than a straight data class object number.
  • Can we get a comparison for those data that are curated through postgres? It would be very helpful to be able to compare the changes in postgres with the changes in the Citace upload.
  • Can we automate the generation of this report to make it easier to change and track?
  • Regardless of manual or automated report generation, when there is a model change or data class addition/subtraction, the responsible curator needs to inform Wen of the need for compensatory modification in the report.
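
The first goal of the report (flagging missing data classes and severe changes in object numbers) could be automated along these lines. This is only a sketch under stated assumptions: the function name `compare_class_counts`, the 20% threshold, and the class counts in the example are all hypothetical and do not reflect the actual Citace upload report or real data.

```python
def compare_class_counts(previous, current, threshold=0.2):
    """Return alert strings for classes that vanished, appeared, or
    changed by more than `threshold` (as a fraction) between uploads.

    `previous` and `current` map data class name -> object count.
    """
    alerts = []
    for cls in sorted(set(previous) | set(current)):
        old = previous.get(cls, 0)
        new = current.get(cls, 0)
        if old > 0 and new == 0:
            alerts.append(f"{cls}: missing from current upload (was {old})")
        elif old == 0 and new > 0:
            alerts.append(f"{cls}: new class with {new} objects")
        elif old > 0 and abs(new - old) / old > threshold:
            alerts.append(f"{cls}: changed from {old} to {new}")
    return alerts


# Hypothetical counts, for illustration only.
previous_upload = {"RNAi": 1000, "Picture": 500, "Topic": 40}
current_upload = {"RNAi": 1010, "Picture": 200}

for alert in compare_class_counts(previous_upload, current_upload):
    print(alert)
```

Because the check runs over every class found in either upload, it stays blind to curator requests. Curator-driven summaries that need aceperl queries would have to be handled separately.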

August 28, 2014

Curation Statistics for SAB

  • Curators should send Chris all stats for their respective data types: total papers curated, total backlog, false positives
  • Can we ignore SVM results for certain data types?
  • Can we include Textpresso search results for relevant data types?
  • Curators that use a Textpresso pipeline: Karen, Ranjana, Xiaodong, (Daniela?), Mary Ann
  • Can we get detailed web usage statistics on particular datatypes?
  • We want to articulate our priorities to the SAB; get feedback
  • RNAi curation could get up to speed in 5 years if we have two FTEs on RNAi curation
  • Are there certain genes that have less phenotype coverage that we should prioritize?

Database migration call

  • Candidates: MongoDB, CouchDB, Neo4j, Datomic, OrientDB, PostgreSQL, Cassandra
  • Neo4j likely ruled out because of slow performance
  • Will compare performance of Datomic vs. Postgres and ACEDB, etc.
  • Datomic has good history tracking
  • Thomas Down has experience with Datomic
  • Probably won't go with a relational database
  • We should use the Gene page (webpage) as a demo/example of what we want to try to emulate
  • Adam (from Lincoln's group) working with OrientDB (graph database)

Citace Upload Report

  • Classes/datatypes missing from Citace Upload Report
  • Karen started a Wiki page to capture this info: Citace_upload_report
  • Curators should take a look and make sure it is properly filled out for their data types
  • Columns are present in the table to make requests for certain numbers in Citace Upload Report and/or the Build Report
  • Wen will take requests for queries to Citace etc. to add data to report

UniProt linking to and from WormBase

  • UniProt would like us to add information about some specific curated datatypes to a file we already supply that maps paper and gene identifiers
  • Kimberly and Juancarlos will work on this
  • Need to clarify with UniProt how links back to WormBase can/will be implemented

Paul S going to NIH for data science meeting next week

FlyBase pushing human disease curation

LEGO backend updates

  • Still need to get some backend logistics sorted out
  • Communication between Michael Muller and Chris Mungall
  • OA-like interface for Noctua?