Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
 
[[WormBase-Caltech_Weekly_Calls_2016|2016 Meetings]]
 
  
= 2017 Meetings =
+
[[WormBase-Caltech_Weekly_Calls_2017|2017 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_January_2017|January]]
+
[[WormBase-Caltech_Weekly_Calls_2018|2018 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_February_2017|February]]
+
[[WormBase-Caltech_Weekly_Calls_2019|2019 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_March_2017|March]]
+
[[WormBase-Caltech_Weekly_Calls_2020|2020 Meetings]]
  
[[WormBase-Caltech_Weekly_Calls_April_2017|April]]
+
= 2021 Meetings =
  
[[WormBase-Caltech_Weekly_Calls_May_2017|May]]
+
[[WormBase-Caltech_Weekly_Calls_January_2021|January]]
  
[[WormBase-Caltech_Weekly_Calls_June_2017|June]]
+
[[WormBase-Caltech_Weekly_Calls_February_2021|February]]
  
[[WormBase-Caltech_Weekly_Calls_July_2017|July]]
+
[[WormBase-Caltech_Weekly_Calls_March_2021|March]]
  
[[WormBase-Caltech_Weekly_Calls_August_2017|August]]
 
  
[[WormBase-Caltech_Weekly_Calls_September_2017|September]]
+
== April 1, 2021 ==
  
 +
=== Antibodies ===
 +
* Alignment of the antibody class to Alliance:
 +
** Propose to move possible_pseudonym (192) and Other_animal (37) to remarks. Those tags are not currently used for curation.
 +
*** Other animal is sometimes used for older annotations, e.g. authors say that the antibodies were raised both  in rats and rabbits. Standard practice would create 2 records, one for the rat antibody and one for the rabbit.
 +
*** Possible pseudonym was used when a curator was not able to unambiguously assign a previous antibody to a record (we have an Other name (synonym) tag to capture unambiguous ones). When moving to remarks we can keep a controlled vocabulary for easy future parsing, e.g. “possible_pseudonym:”
 +
** Antigen field: currently separated into Protein, peptide, and other_antigen (e.g. homogenate of early C. elegans embryos, sperm). Propose to use just one antigen field to capture antigen info.
  
== October 5, 2017 ==
+
All changes proposed above were approved by the group
  
=== NAR Paper ===
+
=== textpresso-dev clean up ===
* Raymond will send around reviews
+
* Michael has asked curators to assess what they have on textpresso-dev as it will not be around forever :-(
 +
* Is it okay to transfer data and files we want to keep to tazendra, and then to our own individual machines?
 +
* Direct access may be possible via Caltech VPN
 +
* Do we want to move content to AWS? May be complicated; it is still easy and cheap to maintain local file systems/machines
  
=== Mary Ann's transition ===
+
=== Braun servers ===
* Cecilia will take over laboratory curation
+
* 3 servers stored in Braun server room; is there a new contact person for accessing these servers?
* Hinxton (Paul Davis) will take over nomenclature work
+
* Mike Miranda replacement just getting settled; Paul will find out who is managing the server room and let Raymond know
* Hinxton will take over strain info, Mitani (NBP) alleles, sequence features (large scale)
 
* Sequence feature curation (literature-based, small-scale) will move to Caltech
 
* Molecule curation may be split; Hinxton could coordinate with ChEBI; GO-CAM modeling molecules handled at Caltech?
 
* SOP? Karen wrote Wiki pages for molecule curation; she will look them over and can review with others
 
* Metabolomics was recent big push (biggest effort needed); need to discuss with Michael Witting
 
  
=== AGR Press Release ===
+
=== Citace upload ===
* Discussed briefly at AGR all-hands call yesterday
+
* Next Friday, April 9th, by end of the day
* No one on the call had much info
+
* Wen will contact Paul Davis for the frozen WS280 models file
* Ranjana can post on WB blog on Oct 20th (or earlier?)
 
* Will ask Stacia
 
  
=== AGR momentum ===
 
* AGR disease working group
 
** Lots going on, a lot of progress, still have much to discuss
 
** Next need to work out how to pull in new data objects like alleles, strains, genotypes, etc.
 
** Will need to establish a basic, consensus AGR data model for these data types/objects
 
** How is AGR disease data display compared to WB/MODs? WB is generally (as are other MODs) ahead of AGR
 
* Site-visit
 
** PIs go to D.C. to discuss AGR
 
* Wen working on lab meeting presentations; will present AGR progress, give tutorial
 
* Raymond working on methods paper for SObA display
 
* Raymond will start working on neural functional units with phenotype
 
  
=== Sequence feature curation ===
+
== April 8, 2021 ==
* Really need to encourage authors to submit tables with specific sequence feature info
 
* Need to make the data submission process easy, user-friendly
 
* Karen looking into it
 
* Work study jobs for students to curate? May not be preferred job for students
 
* Temp jobs may be the way to go; but learning curve needs to be very short
 
  
=== WB Person curation/outreach ===
+
=== Braun server outage ===
* Cecilia contacts 1-6 people per week by email
+
* Raymond fixed; now Spica, wobr and wobr2 are back up
* She can ask for contact info: address, email address, phone #, etc.
 
* How many persons do we have email addresses for?
 
* Would be good to establish a 2-way communication with WBPersons
 
* Should some people be demoted if they don't respond to emails? If they are not worm people?
 
* We provide options to hide email addresses from WB person page
 
* What do other MODs do? How extensive is the outreach to the community?
 
* Cecilia reaches out when affiliations change
 
* Some journals require use of ORCID IDs; makes identifying authors less ambiguous
 
* Can ask people to update their intellectual lineage (can show lineage graph)
 
* Juancarlos will query to see how many people with an email address don't have lineage info
 
* One email from WB per month is OK
 
* Can use Twitter more, when we do site visits
 
  
 +
=== Textpresso API ===
 +
* Was down yesterday affecting WormiCloud; Michael has fixed
 +
* Valerio will learn how to manage the API for the future
  
== October 12, 2017 ==
+
=== Grant opportunities ===
 +
* Possibilities to apply for supplements
 +
* May 15th deadline
 +
* Druggable genome project
 +
** Pharos: https://pharos.nih.gov/
 +
** could we contribute?
 +
* Visualization, tools, etc.
 +
* Automated person descriptions?
 +
* Automated descriptions for proteins, ion channels, druggable targets, etc.?
  
=== Strain curation ===
+
=== New WS280 ONTOLOGY FTP directory ===
* Paul Davis will take over strain curation
+
* Changes requested here: https://github.com/WormBase/website/issues/7900
* Many strains important to disease curation may not have been submitted to CGC, and may not exist in WB
+
* Here's the FTP URL: ftp://ftp.wormbase.org/pub/wormbase/releases/WS280/ONTOLOGY/
* Ranjana needs a way to create new strains for disease models
+
* Known issues (Chris will report):
* Ranjana will talk to Paul, and will add documentation about the issue
+
** Ontology files are provided as ".gaf" in addition to ".obo"; we need to remove the ".gaf" OBO files
 +
** Some files are duplicated and/or have inappropriate file extensions
  
=== Phenotype data display proposal ===
+
=== Odd characters in Postgres ===
* Chris has been working on a new way to display phenotype data
+
* Daniela and Juancarlos discovered some errors with respect to special characters pasted into the OA
* This will largely depend on the new ?Phenotype_experiment model
+
* Daniela would like to automatically pull in micropublication text (e.g. figure captions) into Postgres
* Will hopefully solve a number of issues regarding how phenotype data is currently displayed
+
* We would need an automated way to convert special characters, like the degree symbol °, into HTML entities such as &deg;
* Mockups here: https://docs.google.com/presentation/d/1XvMN16B7RU2yPwjD5p9_TywMj0itln-djEWyGx2IxiQ/edit?usp=sharing
+
* Juancarlos and Valerio will look into possibly switching from a Perl module to a Python module to handle special characters
* Goals are:
 
** Clarify phenotype experiment meta data and provenance
 
** Indicate when multiple perturbations may be responsible for a phenotype
 
** Clarify when a non-coding gene is annotated to a phenotype whether the phenotype may actually result from disruption of a protein coding gene
 
** Clarify when a phenotype results from a mutation that affects the exon(s) of one gene and only an intron of another
 
** Clarify when a phenotype is a double mutant (or other complex perturbation) phenotype
 
  
=== How we handle/display multiple supporting interactions ===
 
* As pointed out by Jae, some interactions appear very strange in the Interactions widget Cytoscape view when there are very many "independent" interactions/observations for a pair of genes
 
* Example, tax-6 and rcan-1: http://www.wormbase.org/species/c_elegans/gene/WBGene00006527#0-8-10
 
* We've tried to represent the "confidence" of interactions with thicker edges on the graph, but maybe this needs revisiting
 
* Should there be a log-based increase in edge thickness, instead of a linear one?
 
* Maybe evidence from the same paper should be considered as one observation?
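The two questions above (log-based versus linear edge thickness, and counting evidence from one paper as a single observation) can be compared with a small sketch. Everything here is hypothetical: the function names, the (paper, interaction) pair layout, and the base and step values are assumptions for illustration, not how the Interactions widget is actually implemented.

```python
import math


def count_observations(evidence, collapse_by_paper=False):
    """Count observations for one gene pair.

    `evidence` is assumed to be a list of (paper_id, interaction_id) pairs.
    With collapse_by_paper=True, all evidence from one paper counts once.
    """
    if collapse_by_paper:
        return len({paper_id for paper_id, _ in evidence})
    return len(evidence)


def edge_width(n_observations, base=1.0, step=1.0, log_scale=True):
    """Return an edge width for a given number of observations.

    Linear: width grows by `step` for every extra observation.
    Log:    width grows by `step` for every doubling of observations.
    """
    if n_observations < 1:
        return 0.0
    if log_scale:
        return base + step * math.log2(n_observations)
    return base + step * (n_observations - 1)


# 64 observations, all from the same (made-up) paper
evidence = [("paper_1", f"interaction_{i}") for i in range(64)]
n_all = count_observations(evidence)
n_papers = count_observations(evidence, collapse_by_paper=True)
print(edge_width(n_all, log_scale=False))  # linear width
print(edge_width(n_all))                   # log-scaled width
print(edge_width(n_papers))                # log-scaled, one paper = one observation
```

With a log scale the width grows by one step per doubling of observations, so a heavily studied gene pair no longer dominates the graph.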
 
  
=== Collecting person info ===
+
== April 15, 2021 ==
* Could we request personal info when we ask for other info, like author first pass, community curation
 
* Cecilia can manage feedback/input, and triage/defer issues to relevant curators etc.
 
* Ask people to confirm current institution, lineage information
 
* Probably need to keep the request minimal (a single line), so as not to overburden people
 
* We need to point people to person update form
 
* We could also do more through the website
 
  
 +
=== Special characters in Postgres/OA ===
 +
* Juancarlos is working on/proposing a plan to store UTF-8 characters in Postgres and the OA, which would then get converted, at dumping, to HTML entities (e.g. &alpha; for α) for the ACE files
 +
* There is still a bit of cleanup needed to fix or remove special characters (not necessarily UTF-8) that apparently got munged upon copy/pasting into the OA in the past
 +
* Note: copy/paste from a PDF often works fine, but sometimes does not work as expected so manual intervention would be needed (e.g. entering Greek characters by hand in UTF-8 format)
 +
* Would copy/pasting from HTML be better than PDF?
 +
* For Person curation it would be good to be able to faithfully store and display appropriate foreign characters (e.g. Chinese characters, Danish characters, etc.)
 +
* Mangolassi script called "get_summary_characters.pl" located here: /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters
 +
** Juancarlos will modify script to take a data type code as an argument on the command line and return all Postgres tables (and their respective PGIDs) that have special characters, e.g.
 +
*** $ ./get_summary_characters.pl exp
 +
*** $ ./get_summary_characters.pl int
 +
*** $ ./get_summary_characters.pl grg
 +
** or could pass just the datatype + field (postgres table). e.g.
 +
*** $ ./get_summary_characters.pl pic_description
 +
** Juancarlos will email everyone once it's ready. Update: it's ready and the email has been sent. The script is at /home/postgres/work/pgpopulation/oa_general/20210411_unicode_html/get_summary_characters.pl. Symlink it into your own directory and run it from there; it creates its output files in the directory you run it from.
 +
* Action items:
 +
** Juancarlos will update the "get_summary_characters.pl" script as described above
 +
** Curators should use the "get_summary_characters.pl" to look for (potentially) bad characters in their OAs/Postgres tables
 +
** Need to perform bulk (automated) replacement of existing HTML entities into corresponding UTF-8 characters
 +
** Curators will need to work with Juancarlos for each OA to modify the dumper
 +
** Juancarlos will write (or append to existing) Postgres/OA dumping scripts to:
 +
*** 1) Convert UTF-8 characters to HTML entities in ACE files
 +
*** 2) Convert special quote and hyphen characters into simple versions that don't need special handling
  
== October 19, 2017 ==
+
=== CeNGEN pictures ===
 +
* Model change went in to accommodate images from the CeNGEN project
 +
* Want gene page images for CeNGEN data; have the specifications for such images been worked out? Maybe not yet
 +
* Raymond and Daniela will work with data producers to acquire images when ready
  
=== Site visits, area meetings ===
+
=== Supplement opportunities ===
* About 12-15 area and topic meetings next year
+
* Money available for software development to "harden" existing software
* Chris will present at Baltimore Worm Meeting on March 16, 2018
+
* Might be possible to make Eduardo's single cell analysis tools more sustainable
* Many meetings are annual (Topic, Seattle, New York), some bimonthly
+
* Could WormiCloud be adapted to the Alliance?
* Many of these arrange speakers well ahead of time, 6 months ahead for example
+
* Put Noctua on more stable production footing? (GO cannot apply as they are in final year of existing grant)
* We should reach out to organizers soon, especially for annual meetings
 
* We can combine WB, Textpresso, and micropublications
 
* We can/should customize our talks to research areas of each audience
 
* Many of the talks are about 40 minutes to 1 hour
 
  
=== Micropublication forms vs. WB data submission forms ===
+
=== Student project for Textpresso ===
* Managing discrepancies still needs to be worked out
+
* Create tool to allow user to submit text and return a list of similar papers
* Looking at differences of output formats
+
* Use cases:
* Customizable forms? Forms built by curator?
+
** curator wants an alert to find papers similar to what they've curated
* Karen: aside from paper details/meta data, forms can be the same
+
** look for potential reviewers of a paper based on similar text content
 
 
=== Phenotype data display proposal ===
 
* Chris has been working on a new way to display phenotype data
 
* This will largely depend on the new ?Phenotype_experiment model
 
* Will hopefully solve a number of issues regarding how phenotype data is currently displayed
 
* Mockups here: https://docs.google.com/presentation/d/1XvMN16B7RU2yPwjD5p9_TywMj0itln-djEWyGx2IxiQ/edit?usp=sharing
 
* Goals are:
 
** Clarify phenotype experiment meta data and provenance
 
** Indicate when multiple perturbations may be responsible for a phenotype
 
** Clarify when a non-coding gene is annotated to a phenotype whether the phenotype may actually result from disruption of a protein coding gene
 
** Clarify when a phenotype results from a mutation that affects the exon(s) of one gene and only an intron of another
 
** Clarify when a phenotype is a double mutant (or other complex perturbation) phenotype
 
 
 
 
 
== October 26, 2017 ==
 
 
 
=== GO CAM ===
 
* GO CAM models do not go into ACEDB, yet
 
* Kimberly is working on it
 
* First, Berkeley group is working on producing correct GPAD files as output
 
* Work is ongoing for a Cytoscape view of pathways
 
* Noctua calls on 2nd and 4th Wednesdays of the month; anyone can sit in (ask Kimberly)
 
* Noctua call minutes on Wiki (Kimberly will send around link)
 
 
 
=== Site visits ===
 
* Wen set up Wiki page: http://wiki.wormbase.org/index.php/Meetings#Upcoming_Meetings
 
* What's the strategy overall? Are we trying to make all meetings?
 
* About 15 meetings to go to; would be good if we could make all of these, if possible
 
* Kimberly will go to New York meeting
 
* Chris will go to Baltimore meeting
 
* Boulder, CO meeting; interested in having WB talk; sometime in May
 
* San Diego meeting in January (Wen out of town); volunteers?
 
* Chris could do Boston and Worcester meetings
 
 
 
=== Strains ===
 
* Are strains in the nameserver? No, and Hinxton is likely retiring the nameserver soon
 
* Can curators get access to a tool to request a strain object?
 
* Juancarlos and Ranjana are working on a tool (cronjob) that will automatically transfer the strain in the 'Requested Strain' field in the disease OA to the 'Strain' field once Paul Davis approves the strain.
 
* The request to Paul Davis to create the strain is made via the new_objects.cgi form.
 
 
 
=== Changes to phenotype form ===
 
* We need a mechanism to clarify if multiple perturbations entered in the phenotype form represent multiple single experiments or a single complex genotype experiment
 
* Will give users the option to indicate which of these two scenarios is the case
 
* We discussed ways of managing this in a previous WB CIT meeting
 
* Produced mockups of proposal: https://docs.google.com/presentation/d/1HFVn1anbdCu8pHJW0g27a5AvYLFVMyBy4nnrEdA1wKQ/edit?usp=sharing
 

Latest revision as of 19:34, 15 April 2021

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings

2020 Meetings

2021 Meetings

January

February

March


April 1, 2021

Antibodies

  • Alignment of the antibody class to Alliance:
    • Propose to move possible_pseudonym (192) and Other_animal (37) to remarks. Those tags are not currently used for curation.
      • Other animal is sometimes used for older annotations, e.g. authors say that the antibodies were raised both in rats and rabbits. Standard practice would create 2 records, one for the rat antibody and one for the rabbit.
      • Possible pseudonym was used when a curator was not able to unambiguously assign a previous antibody to a record (we have an Other name (synonym) tag to capture unambiguous ones). When moving to remarks we can keep a controlled vocabulary for easy future parsing, e.g. “possible_pseudonym:”
    • Antigen field: currently separated into Protein, peptide, and other_antigen (e.g. homogenate of early C. elegans embryos, sperm). Propose to use just one antigen field to capture antigen info.

All changes proposed above were approved by the group

textpresso-dev clean up

  • Michael has asked curators to assess what they have on textpresso-dev as it will not be around forever :-(
  • Is it okay to transfer data and files we want to keep to tazendra, and then to our own individual machines?
  • Direct access may be possible via Caltech VPN
  • Do we want to move content to AWS? May be complicated; it is still easy and cheap to maintain local file systems/machines

Braun servers

  • 3 servers stored in Braun server room; is there a new contact person for accessing these servers?
  • Mike Miranda replacement just getting settled; Paul will find out who is managing the server room and let Raymond know

Citace upload

  • Next Friday, April 9th, by end of the day
  • Wen will contact Paul Davis for the frozen WS280 models file


April 8, 2021

Braun server outage

  • Raymond fixed; now Spica, wobr and wobr2 are back up

Textpresso API

  • Was down yesterday affecting WormiCloud; Michael has fixed
  • Valerio will learn how to manage the API for the future

Grant opportunities

  • Possibilities to apply for supplements
  • May 15th deadline
  • Druggable genome project
    • Pharos: https://pharos.nih.gov/
    • Could we contribute?
  • Visualization, tools, etc.
  • Automated person descriptions?
  • Automated descriptions for proteins, ion channels, druggable targets, etc.?

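New WS280 ONTOLOGY FTP directory

  • Changes requested here: https://github.com/WormBase/website/issues/7900
  • Here's the FTP URL: ftp://ftp.wormbase.org/pub/wormbase/releases/WS280/ONTOLOGY/
  • Known issues (Chris will report):
    • Ontology files are provided as ".gaf" in addition to ".obo"; we need to remove the ".gaf" OBO files
    • Some files are duplicated and/or have inappropriate file extensions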

Odd characters in Postgres

  • Daniela and Juancarlos discovered some errors with respect to special characters pasted into the OA
  • Daniela would like to automatically pull in micropublication text (e.g. figure captions) into Postgres
  • We would need an automated way to convert special characters, like the degree symbol °, into HTML entities such as &deg;
  • Juancarlos and Valerio will look into possibly switching from a Perl module to a Python module to handle special characters
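As a rough illustration of the automated conversion discussed above, here is a minimal Python sketch that turns non-ASCII characters into HTML entities. It uses the standard-library table html.entities.codepoint2name; the function name to_html_entities is hypothetical, and this is not the Perl or Python module the group would actually adopt.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Named entities (e.g. &deg;) are used where the standard table has one;
    anything else falls back to a numeric entity (e.g. &#20013;).
    ASCII characters, including '&', '<' and '>', are left untouched.
    """
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            pieces.append(ch)
        elif code in codepoint2name:
            pieces.append(f"&{codepoint2name[code]};")
        else:
            pieces.append(f"&#{code};")
    return "".join(pieces)


print(to_html_entities("grown at 20°C"))  # grown at 20&deg;C
```

Whether a literal '&' already present in curated text should also be escaped is left open here; that choice would depend on how the ACE files are read downstream.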


April 15, 2021

Special characters in Postgres/OA

  • Juancarlos is working on/proposing a plan to store UTF-8 characters in Postgres and the OA, which would then get converted, at dumping, to HTML entities (e.g. &alpha; for α) for the ACE files
  • There is still a bit of cleanup needed to fix or remove special characters (not necessarily UTF-8) that apparently got munged upon copy/pasting into the OA in the past
  • Note: copy/paste from a PDF often works fine, but sometimes does not work as expected so manual intervention would be needed (e.g. entering Greek characters by hand in UTF-8 format)
  • Would copy/pasting from HTML be better than PDF?
  • For Person curation it would be good to be able to faithfully store and display appropriate foreign characters (e.g. Chinese characters, Danish characters, etc.)
  • Mangolassi script called "get_summary_characters.pl" located here: /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters
    • Juancarlos will modify script to take a data type code as an argument on the command line and return all Postgres tables (and their respective PGIDs) that have special characters, e.g.
      • $ ./get_summary_characters.pl exp
      • $ ./get_summary_characters.pl int
      • $ ./get_summary_characters.pl grg
    • or could pass just the datatype + field (postgres table). e.g.
      • $ ./get_summary_characters.pl pic_description
    • Juancarlos will email everyone once it's ready. Update: it's ready and the email has been sent. The script is at /home/postgres/work/pgpopulation/oa_general/20210411_unicode_html/get_summary_characters.pl. Symlink it into your own directory and run it from there; it creates its output files in the directory you run it from.
  • Action items:
    • Juancarlos will update the "get_summary_characters.pl" script as described above
    • Curators should use the "get_summary_characters.pl" to look for (potentially) bad characters in their OAs/Postgres tables
    • Need to perform bulk (automated) replacement of existing HTML entities into corresponding UTF-8 characters
    • Curators will need to work with Juancarlos for each OA to modify the dumper
    • Juancarlos will write (or append to existing) Postgres/OA dumping scripts to:
      • 1) Convert UTF-8 characters to HTML entities in ACE files
      • 2) Convert special quote and hyphen characters into simple versions that don't need special handling
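The cleanup steps in the action items can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, rows are assumed to arrive as (PGID, text) pairs, and the punctuation table is a guess at which "special quote and hyphen characters" are meant. It is not how get_summary_characters.pl or the OA dumpers work.

```python
import html

# Assumed set of "fancy" quotes and hyphens and their plain replacements.
SIMPLE_PUNCTUATION = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2010": "-",  # hyphen
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def find_special_characters(rows):
    """Report which rows contain non-ASCII characters.

    `rows` is an iterable of (pgid, text) pairs; the result maps each
    affected pgid to the sorted list of non-ASCII characters found.
    """
    report = {}
    for pgid, text in rows:
        odd = sorted({ch for ch in text if ord(ch) > 127})
        if odd:
            report[pgid] = odd
    return report


def entities_to_utf8(text):
    """Bulk-replace existing HTML entities with their UTF-8 characters."""
    return html.unescape(text)


def simplify_punctuation(text):
    """Swap special quotes and hyphens for plain ASCII versions."""
    return text.translate(SIMPLE_PUNCTUATION)


rows = [(1, "plain text"), (2, "α-tubulin at 20°C")]
print(find_special_characters(rows))
print(entities_to_utf8("&alpha;-tubulin at 20&deg;C"))
print(simplify_punctuation("\u201cunc-54\u201d \u2013 \u2018wild type\u2019"))
```

Note that html.unescape also decodes entities such as &amp;, so a bulk replacement along these lines would need a review pass for fields where a literal entity string is intended.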

CeNGEN pictures

  • Model change went in to accommodate images from the CeNGEN project
  • Want gene page images for CeNGEN data; have the specifications for such images been worked out? Maybe not yet
  • Raymond and Daniela will work with data producers to acquire images when ready

Supplement opportunities

  • Money available for software development to "harden" existing software
  • Might be possible to make Eduardo's single cell analysis tools more sustainable
  • Could WormiCloud be adapted to the Alliance?
  • Put Noctua on more stable production footing? (GO cannot apply as they are in final year of existing grant)

Student project for Textpresso

  • Create tool to allow user to submit text and return a list of similar papers
  • Use cases:
    • curator wants an alert to find papers similar to what they've curated
    • look for potential reviewers of a paper based on similar text content
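One possible starting point for such a tool is plain TF-IDF with cosine similarity, sketched below in standard-library Python. The notes do not say which method the student project would use, so this technique is an assumption, as are the function names, the paper IDs, and the example texts; Textpresso's own indexing is not used here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_similar(query, papers, top_n=5):
    """Rank papers by TF-IDF cosine similarity to the submitted text.

    `papers` maps a paper ID to its text; returns (paper_id, score)
    pairs sorted from most to least similar.
    """
    docs = {pid: Counter(tokenize(text)) for pid, text in papers.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in docs.values():
        doc_freq.update(counts.keys())

    def weigh(counts):
        # Smoothed inverse document frequency, so unseen terms stay finite.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scored = [(pid, cosine(query_vec, weigh(c))) for pid, c in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Made-up example texts, for illustration only
papers = {
    "A": "daf-16 regulates lifespan and dauer formation",
    "B": "calcineurin tax-6 thermotaxis behavior",
    "C": "lifespan extension by daf-2 insulin signaling",
}
for paper_id, score in rank_similar("insulin signaling and lifespan", papers):
    print(paper_id, round(score, 3))
```

The same ranking could serve both use cases: a curator alert would run it over newly indexed papers, while reviewer suggestions would map the top-ranked papers to their authors.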