Difference between revisions of "WormBase-Caltech Weekly Calls"

From WormBaseWiki
 
(80 intermediate revisions by 6 users not shown)
Line 22: Line 22:
  
  
GoToMeeting link: https://www.gotomeet.me/wormbase1
+
 
  
 
= 2020 Meetings =
 
= 2020 Meetings =
Line 32: Line 32:
 
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
 
[[WormBase-Caltech_Weekly_Calls_March_2020|March]]
  
 +
[[WormBase-Caltech_Weekly_Calls_April_2020|April]]
  
== April 2, 2020 ==
+
[[WormBase-Caltech_Weekly_Calls_May_2020|May]]
 
 
=== Community phenotype requests ===
 
* March 9-28
 
* 2,548 emails went out; 89 bounced; 6 resent; 13 backup; 2,478 successful emails
 
* 361 annotations overall
 
* 48 requested papers received curation (~2% response rate)
 
* 53 distinct papers overall (5 papers without request)
 
* 53 distinct persons overall
 
 
 
=== Community curation volunteers ===
 
* Tracking volunteers [https://docs.google.com/spreadsheets/d/1ldECC44PXMilcDO6ctz-8AkRZntfDoV0Wtc4F-T_Zvg/edit?usp=sharing here]
 
* 14 volunteers so far, all have been assigned a WBPerson ID
 
* Chris will set up a webinar tutorial in the coming week or two
 
 
 
=== AFP pipeline ===
 
* Will resend email requests to authors that haven't already responded
 
* May also send out for older papers
 
* May work with people to help
 
* Does the old AFP form still work? It should
 
* If someone has a link to the old form, they won't get one for the new form
 
* Maybe could set up an automatic redirect from the old form to the new form
 
* Received many submissions recently (>20% response rate)
 
 
 
=== Ontology Annotator ===
 
* Need to work on Genotype OA dumper
 
* Turns out semicolons are problematic (currently in genotypes and transgenes) for object names (ontology fields)
 
* Ampersands (&) are also problematic for object names in the OA
 
** 20237  | Is[Pgcy-5::daf-2a::venus; Punc-122::mCherry]                          | 2014-10-08 10:32:45.874519-07
 
** 20239  | Ex[Pgcy-5::casy-1::venus; Pgcy-5::aman-2::mCherry; Punc-122::mCherry] | 2014-10-08 10:45:23.202362-07
 
** 20238  | Is[Pgcy-5::daf-2c::venus; Punc-122::mCherry]                         | 2014-10-08 10:38:19.859078-07
 
** 25249  | Ex[Prheb-1::rheb-1::GFP; unc-119(+]                                   | 2018-06-29 10:16:40.784295-07
 
** 16283  | [hlh-13::GFP;unc-119(+)]                                              | 2013-02-07 17:43:22.384819-08
 
** 26131  | Ex[pedc-3EDC-3::DsRed;pRF4]                                          | 2019-08-14 08:44:49.91063-07
 
 
 
=== Use Slack More ===
 
* Slack is a good tool for quick communication among team members; would be good for all curators to join Slack to enable efficient communication
 
 
 
 
 
== April 9, 2020 ==
 
  
=== Volunteer curators ===
 
* Have sent out emails to schedule tutorials
 
* Chris had one tutorial with Michael Davies (Alyson Ashe's lab) yesterday
 
* One already scheduled for next Monday with Wilber and Stephanie from Paul's lab
 
* Two others already scheduled for next Tuesday with Lina Dahlberg and Colin Dolphin
 
  
===TAGC is virtual (4.22-25.2020)===
+
== June 4, 2020 ==
FYI in case you missed it
 
*You still have to register (it's free) if you haven't already
 
https://genetics-gsa.org/tagc-2020/registration/
 
  
===summer students===
+
=== Citace (tentative) upload ===
* Caltech SURF students (and other summer students worldwide) now are looking for projects
+
* CIT curators upload to citace on Tuesday, July 7th, 10am Pacific
* Maybe they could curate for WormBase
+
* Citace upload to Hinxton on Friday, July 10th
* In addition to phenotype, they could curate:
 
** Allele/lesion sequence curation (using Allele Sequence form); maybe Paul Davis could make a tutorial video?
 
** Anatomy function, looking for novel info; opportunity to program/code
 
  
=== OA semicolon issue ===
+
=== Caltech reopening ===
* Juancarlos has fixed the issues on sandbox
+
* Paul looking to get plan approved
* Curators should test on Mangolassi
+
* People that want to come to campus need to watch training video
 +
* Masks available in Paul's lab
 +
* Can have maximum of 3 people in WormBase rooms at a time; probably best to only allow one person per WB room
 +
** Could possibly have 2 people in big room (Church 64) as long as they stay at least 10 feet apart
 +
* Need to coordinate, maybe make a Google calendar to do so (also Slack)
 +
* Before and after you go to campus, you need to take your temperature and assess your symptoms (if any) and submit info on form
 +
* Also, need to submit who you were in contact with for contact tracing
 +
* The form is used all week; hold on to it until asked to submit it
 +
* If someone goes in to the office, they could print several forms for people to pick up in WB offices
  
=== Textmining/automation ===
+
=== Nameserver ===
* Daniela will discuss with Christina Zorn from Xenbase
+
* Nameserver was down
* Will discuss SVM, AFP, Textpresso, etc.
+
* CIT curators would still like to have a single form to interact with
 +
* Is it possible to create objects at Caltech and let a cronjob assign IDs via the nameserver? May not be a good idea
 +
* Still putting genotype and all info for a strain in the reason/why field in the nameserver
 +
* We plan to eventually connect strains to genotypes, but need model changes and curation effort to sort out
 +
* Hinxton is pulling in CGC strains, how often?
 +
* Caltech could possibly get a block of IDs
  
=== Retracted WBPapers ===
+
=== Alliance SimpleMine ===
* Jae & Kimberly put in GitHub ticket to make retractions clear on WormBase site
+
* Any updates? 3.1 feature freeze is tomorrow
* https://github.com/WormBase/website/issues/7637
+
* Pending PI decision; Paul S. will bring it up tomorrow on the Alliance PI call
* Can we systematically detect retractions? Yes
 
* What about finding papers that cite retractions? Maybe, but likely tricky
 
  
  
== April 16, 2020 ==
+
== June 11, 2020 ==
  
=== Community Phenotype Curation Tutorials ===
+
=== Name Service ===
* Chris has run 6 tutorials, recorded 4
+
* Testing site now up; linked to Mangolassi
* MPG files saved on DropBox; ask Chris for access
+
* CGI from Juancarlos not accepting all characters, including double quotes like "
* Plan to edit videos to make tutorial video to post on WB YouTube channel
+
* Example submission that fails via CGI
 +
WBPaper000XXXX; genotype: blah::' " ` / < > [ ] { } ? , . ( ) * ^ & % $ # @ ! \ | &alpha; &beta; Ω ≈ µ ≤ ≥ ÷ æ … ˚ ∆ ∂ ß œ ∑ † ¥ ¨ ü i î ø π “  ‘ « • – ≠ Å ´ ∏ » ± — ‚ °
 +
* Juancarlos will look into and try to fix
  
=== Author First Pass ===
+
=== Alliance Literature group ===
* May run a webinar and use Zoom to record
+
* Textpresso vs. OntoMate vs. PubMed
* May make a short tutorial video
+
* Still some confusion about which tasks can be performed in each tool
* Jae: Is there documentation for terminology used in the form?
+
* Working on collecting different use cases on spreadsheet
 +
* Sentence-based search is big strength of Textpresso
 +
* At latest meeting performed some large searches for OntoMate and Textpresso
 +
* Literature acquisition: still needs work
 +
** Using SVM vs. Textpresso search to find relevant papers
 +
** Species based SVM? Currently use string matching to derive different corpora
 +
** Finding genes and determining which species those genes belong to?
  
=== Zoom accounts ===
+
=== Alliance priorities? ===
* People can try to use Caltech Zoom account
+
* Transcription regulatory networks
 +
* Interactions can focus on network viewer eventually
 +
** May want different versions/flavors of interaction viewers
 +
** May also want to work closely with GO and GO-CAMs
 +
* Gene descriptions can focus on information-poor genes, protein domains, etc.
  
 +
=== Sandbox visual cues ===
 +
* Juancarlos and Daniela will discuss ways to provide visual cues that a curator is on a sandbox form (on Mangolassi) vs live form (on Tazendra)
 +
* AFP and Micropub dev sites have indicators
 +
* Could play with changing the background color? Maybe too hard to look at?
 +
* Change the color of the title of the form, e.g. the OA?
 +
* Will add red text "Development Site" at top of the OA form
  
== April 23, 2020 ==
+
=== Evidence Code Ontology ===
 +
* Kimberly and Juancarlos have worked on a parser
 +
* Will load into ACEDB soon
  
=== Community Phenotype Curation Tutorials ===
 
* Chris has finished first round of tutorials; 8 tutorials, 6 video recordings
 
* There are ~8 new volunteers; will set up tutorials for them soon
 
  
=== ECO code implementation ===
+
== June 18, 2020 ==
* ?ECO_term to replace ?GO_code in ACEDB models
 
* GAF files with three-letter codes can still be generated by mapping
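
The mapping step mentioned above could look roughly like the sketch below. It is only an illustration: the table covers a handful of ECO IDs and should be checked against the official GO evidence code mapping before use, and the function name `gaf_evidence_code` is hypothetical, not part of any WormBase dumper.

```python
# Partial, illustrative ECO -> three-letter GO evidence code table.
# Assumption: the real dumper would load the full mapping from GO's
# published mapping file rather than hard-coding it like this.
ECO_TO_GAF_CODE = {
    "ECO:0000314": "IDA",  # direct assay
    "ECO:0000315": "IMP",  # mutant phenotype
    "ECO:0000316": "IGI",  # genetic interaction
    "ECO:0000353": "IPI",  # physical interaction
    "ECO:0000270": "IEP",  # expression pattern
    "ECO:0000250": "ISS",  # sequence similarity
    "ECO:0000501": "IEA",  # electronic annotation (default)
}


def gaf_evidence_code(eco_id):
    """Return the three-letter code for an ECO ID, or raise if unmapped."""
    try:
        return ECO_TO_GAF_CODE[eco_id]
    except KeyError:
        raise ValueError(f"No GAF evidence code mapped for {eco_id}") from None
```

Raising on an unmapped ID (rather than silently defaulting) keeps gaps in the table visible at dump time.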
 
  
=== Simplemine for Alliance ===
+
=== Undergrad phenotype submissions ===
* Wen has presented proposal to Search group
+
* Chris gave presentation to Lina Dahlberg's class about community phenotype curation
* Plan is to have a link to the Alliance Simplemine prototype from the Alliance web page
+
* Class took survey about experience with presentation and experience trying to curate worm phenotypes
 +
** Survey results: https://www.dropbox.com/s/00cit5aitv8yu27/Dahlberg_class_survey_results.xlsx?dl=0
 +
** Some students didn't benefit, but most did; nice feedback!
 +
** Lina intends to publish/micropublish the survey results so please don't share
 +
* Since April 24, the class has submitted 171 annotations from 23 papers (some redundant and some still under review)
  
=== Venn diagram tool ===
+
=== Special characters in OA/Postgres ===
* Conceived by Jae, implemented by Sibyl
+
* There are many special characters in free text entries in the OA; probably all from copy-pasting directly from PDF
* Currently used for interactions data
+
* In some cases it seems the special characters cause problems for downstream scripts (e.g. FTP interactions file generator)
* Could use for other data types like phenotype (e.g. comparing RNAi vs. allele phenotype)
+
* It would probably be good to script the replacement of special characters with their appropriate simple characters or encoded characters
* Could also use for Expression data, e.g. comparing results from different methods
+
* Juancarlos wrote Perl script on Mangolassi at:
* Could maybe use for disease data
+
** /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters/get_summary_characters.pl
 +
** Will find bad characters and their pgids for a given Postgres table
 +
** Will find bad data and their pgids for the same table
 +
** People can query their data tables for these characters
 +
* Chris & Wen will work on compiling a list of bad characters that tend to come up
  
=== AFP tutorial ===
+
=== Citace upload ===
* Daniela, Kimberly, Valerio will run through the AFP form with Nikita from Gupta lab tomorrow
+
* July 10th citace-to-Hinxton upload
* May record in the future to make a tutorial video
+
* Citace upload was planned for July 7th, but Wen will be on vacation, so curators will upload to Wen on Tuesday, June 30th
* Daniela may (re-)start curating markers for relevant expression patterns
 
* Wen noticed that many tissue markers are artificial (not necessarily endogenous sequence)
 
  
=== Expression markers ===
 
* SURF student projects: Identifying good expression markers? Maybe, but may require more curation experience
 
* Wen looked at expression cluster data; hard to find good, very specific (i.e. neuron) markers
 
* Daniela may (re-)start curating markers for relevant expression patterns
 
* Wen noticed that many tissue markers are artificial (not necessarily endogenous sequence)
 
* Already have an "Expression markers" widget on anatomy term pages
 
* Could combinations of genes (e.g. cGal) act as markers?
 
  
== April 30, 2020 ==
+
== June 25, 2020 ==
  
=== Adding ?ECO_term class for WS278 ===
+
=== Caltech Summer Student ===
* Proposed [https://wiki.wormbase.org/index.php/Evidence_Code_Ontology#.3FECO_term_Model ?ECO_term model]
+
* Paul has new summer student
** How are the Parent/Child and Ancestor/Descendant tags used in WB for ontology classes?  Do we still need them in .ace files?
+
** Molecular lesion curation, maybe
*Confirm proposed changes to class models that will use this tag:
+
** Are early stops more or less likely to be null mutations?
** ?GO_annotation
+
** Alleles are flagged as null in WB in the context of phenotypes
** ?Phenotype
+
** Would be good to query Postgres for null alleles and work from there
** ?Disease_model_annotation
+
* Fernando
 +
** Anatomy function
 +
** GO curation? Curating transcription factors?
 +
*** Checking for consistent curation
  
=== Ontology term models in WB ===
+
=== Worm Community Diversity Meeting ===
* Discuss using ?RO_term values in our WB ontology term models
+
* Organized by Ahna Skop and Dana Miller
* Currently relations between ontology terms are captured with text that is sometimes inconsistent for the same concept, e.g. is_a
+
* Invite posted on Facebook "C. elegans Researchers" group
* Where possible, should we use ?RO_term to express the relations between ontology terms in our WB models?
+
* Two meetings held: one Thursday (June 18th), one Friday (June 19th)
* Impact on web display?
+
* Chris attended last Friday (June 19th)
 +
* Worm Board looking to take input and ideas from this meeting and incorporate into meetings and events
 +
* One idea was to document and track outreach efforts and what people have learned from them and organize them in a central location, maybe WormBase or Worm Community Forum
 +
* Also, there was a suggestion to have a tool that could inform potential students of worm labs in their respective local area
 +
** Ask Todd; he used to have a map of researchers; Todd had asked Cecilia to curate lab location and institution
 +
* Person and Laboratory addresses in ACEDB have a different format, so looking to reconcile
 +
* Do we know how many labs are still viable? Check for a paper verified in the last 5 years, or requested strains from the CGC recently
 +
** Most labs were real
  
===Entries in the new Genotype OA===
+
=== C_elegans Slack group ===
*21 genotype entries created in the Genotype OA required for disease curation
+
* Called "C_elegans"
*Few more to come, and at some point need to work on the dumper, in order to submit for WS278
+
* Chris made a "WormBase" channel for people to post questions, comments
*The use of the Genotype class across disease-related classes is waiting on approval of the proposed models and will need dumper changes as well; hopefully there is enough time to get all this done for WS278
+
* Chris will look into inviting everyone and possibly integrating with help@wormbase.org email list
  
===ZOOM for tutorial===
+
=== WormBase Outreach Webinars ===
 +
* While travel is still restricted, we should consider WormBase webinars
 +
* Scott working on a JBrowse webinar
 +
* Could have a different topic each month
 +
* Should collect topics to cover and assign speakers (maybe multiple speakers per topic; keep it lively)
 +
* Should set up a schedule
 +
* How should we advertise? Can post on blog, twitter, etc.
  
* High Definition (1440p)
+
=== New transcripts expanding gene range ===
* Caltech account works well
+
* Will bring up at next week's site-wide call
 +
* Possibly due to incorporation of newer nanopore reads
 +
* Many genes coming in WS277 have expanded well beyond the gene limits as seen in WS276
 +
** Example genes: pes-2.2, pck-2, herc-1, atic-1
 +
* Has several repercussions:
 +
** WormBase does not submit alleles affecting more than one gene; with these gene expansions suddenly alleles once only affecting a single gene are now affecting two genes, and so are now omitted from loading into the Alliance (including any phenotype and/or disease annotations)
 +
** Some expanded genes are now being attributed with thousands of alleles/variants
  
===New AFP version===
+
=== Citace upload ===
* Planned to be released in June
+
* Upload files to Spica/Wen by Tuesday (June 30th) 10am
* We need to improve datatype definitions. Please take a look at the current form here: http://textpressocentral.org:5000
+
* Wen will clean up folders in Spica (older files from WS277 not cleared out for some reason)
* Frequency of automated alerts on new submissions for specific datatypes will change from monthly to weekly
 

Latest revision as of 19:13, 25 June 2020

Previous Years

2009 Meetings

2011 Meetings

2012 Meetings

2013 Meetings

2014 Meetings

2015 Meetings

2016 Meetings

2017 Meetings

2018 Meetings

2019 Meetings



2020 Meetings

January

February

March

April

May


June 4, 2020

Citace (tentative) upload

  • CIT curators upload to citace on Tuesday, July 7th, 10am Pacific
  • Citace upload to Hinxton on Friday, July 10th

Caltech reopening

  • Paul looking to get plan approved
  • People that want to come to campus need to watch training video
  • Masks available in Paul's lab
  • Can have maximum of 3 people in WormBase rooms at a time; probably best to only allow one person per WB room
    • Could possibly have 2 people in big room (Church 64) as long as they stay at least 10 feet apart
  • Need to coordinate, maybe make a Google calendar to do so (also Slack)
  • Before and after you go to campus, you need to take your temperature and assess your symptoms (if any) and submit info on form
  • Also, need to submit who you were in contact with for contact tracing
  • The form is used all week; hold on to it until asked to submit it
  • If someone goes in to the office, they could print several forms for people to pick up in WB offices

Nameserver

  • Nameserver was down
  • CIT curators would still like to have a single form to interact with
  • Is it possible to create objects at Caltech and let a cronjob assign IDs via the nameserver? May not be a good idea
  • Still putting genotype and all info for a strain in the reason/why field in the nameserver
  • We plan to eventually connect strains to genotypes, but need model changes and curation effort to sort out
  • Hinxton is pulling in CGC strains, how often?
  • Caltech could possibly get a block of IDs

Alliance SimpleMine

  • Any updates? 3.1 feature freeze is tomorrow
  • Pending PI decision; Paul S. will bring it up tomorrow on the Alliance PI call


June 11, 2020

Name Service

  • Testing site now up; linked to Mangolassi
  • CGI from Juancarlos not accepting all characters, including double quotes like "
  • Example submission that fails via CGI
WBPaper000XXXX; genotype: blah::' " ` / < > [ ] { } ? , . ( ) * ^ & % $ # @ ! \ | α β Ω ≈ µ ≤ ≥ ÷ æ … ˚ ∆ ∂ ß œ ∑ † ¥ ¨ ü i î ø π “   ‘ « • – ≠ Å ´ ∏ » ± — ‚ °
  • Juancarlos will look into and try to fix
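
The notes do not say why the CGI rejects these characters. One common way to carry such text safely in a form submission is percent-encoding, sketched below with Python's standard library. The field names `paper` and `genotype` and both function names are assumptions for illustration only; they are not the actual CGI's parameters.

```python
from urllib.parse import urlencode, parse_qs


def encode_submission(paper, genotype):
    """Build a percent-encoded (UTF-8) query string from two hypothetical fields."""
    return urlencode({"paper": paper, "genotype": genotype})


def decode_submission(query):
    """Recover the original field values from a query string built above."""
    fields = parse_qs(query, keep_blank_values=True)
    return fields["paper"][0], fields["genotype"][0]
```

With this approach, quotes, ampersands, backslashes and non-ASCII letters in the genotype are all escaped on the way in and restored unchanged on the way out, so a round trip through `encode_submission` and `decode_submission` is a quick check that no character is lost.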

Alliance Literature group

  • Textpresso vs. OntoMate vs. PubMed
  • Still some confusion about which tasks can be performed in each tool
  • Working on collecting different use cases on spreadsheet
  • Sentence-based search is big strength of Textpresso
  • At latest meeting performed some large searches for OntoMate and Textpresso
  • Literature acquisition: still needs work
    • Using SVM vs. Textpresso search to find relevant papers
    • Species based SVM? Currently use string matching to derive different corpora
    • Finding genes and determining which species those genes belong to?

Alliance priorities?

  • Transcription regulatory networks
  • Interactions can focus on network viewer eventually
    • May want different versions/flavors of interaction viewers
    • May also want to work closely with GO and GO-CAMs
  • Gene descriptions can focus on information-poor genes, protein domains, etc.

Sandbox visual cues

  • Juancarlos and Daniela will discuss ways to provide visual cues that a curator is on a sandbox form (on Mangolassi) vs live form (on Tazendra)
  • AFP and Micropub dev sites have indicators
  • Could play with changing the background color? Maybe too hard to look at?
  • Change the color of the title of the form, e.g. the OA?
  • Will add red text "Development Site" at top of the OA form

Evidence Code Ontology

  • Kimberly and Juancarlos have worked on a parser
  • Will load into ACEDB soon


June 18, 2020

Undergrad phenotype submissions

  • Chris gave presentation to Lina Dahlberg's class about community phenotype curation
  • Class took survey about experience with presentation and experience trying to curate worm phenotypes
  • Since April 24, the class has submitted 171 annotations from 23 papers (some redundant and some still under review)

Special characters in OA/Postgres

  • There are many special characters in free text entries in the OA; probably all from copy-pasting directly from PDF
  • In some cases it seems the special characters cause problems for downstream scripts (e.g. FTP interactions file generator)
  • It would probably be good to script the replacement of special characters with their appropriate simple characters or encoded characters
  • Juancarlos wrote Perl script on Mangolassi at:
    • /home/postgres/work/pgpopulation/grg_generegulation/20200618_summary_characters/get_summary_characters.pl
    • Will find bad characters and their pgids for a given Postgres table
    • Will find bad data and their pgids for the same table
    • People can query their data tables for these characters
  • Chris & Wen will work on compiling a list of bad characters that tend to come up
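
The detect-and-replace idea above can be sketched in a few lines. This is not the Perl script on Mangolassi; it is a separate Python illustration that takes (pgid, text) pairs as plain input. The replacement table is a hypothetical starting point, since the real list of bad characters is still to be compiled.

```python
# Hypothetical replacement table (assumption): typographic characters that
# commonly arrive via copy-paste from PDFs, mapped to simple equivalents.
REPLACEMENTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}


def find_bad_characters(rows):
    """Map each non-ASCII character to the sorted pgids whose text contains it."""
    found = {}
    for pgid, text in rows:
        for ch in set(text):
            if ord(ch) > 127:
                found.setdefault(ch, []).append(pgid)
    return {ch: sorted(pgids) for ch, pgids in found.items()}


def clean_text(text):
    """Replace characters listed in REPLACEMENTS; leave everything else as is."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)
```

Characters without an agreed replacement (for example µ) are reported by `find_bad_characters` but left untouched by `clean_text`, so a curator can decide case by case.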

Citace upload

  • July 10th citace-to-Hinxton upload
  • Citace upload was planned for July 7th, but Wen will be on vacation, so curators will upload to Wen on Tuesday, June 30th


June 25, 2020

Caltech Summer Student

  • Paul has new summer student
    • Molecular lesion curation, maybe
    • Are early stops more or less likely to be null mutations?
    • Alleles are flagged as null in WB in the context of phenotypes
    • Would be good to query Postgres for null alleles and work from there
  • Fernando
    • Anatomy function
    • GO curation? Curating transcription factors?
      • Checking for consistent curation

Worm Community Diversity Meeting

  • Organized by Ahna Skop and Dana Miller
  • Invite posted on Facebook "C. elegans Researchers" group
  • Two meetings held: one Thursday (June 18th), one Friday (June 19th)
  • Chris attended last Friday (June 19th)
  • Worm Board looking to take input and ideas from this meeting and incorporate into meetings and events
  • One idea was to document and track outreach efforts and what people have learned from them and organize them in a central location, maybe WormBase or Worm Community Forum
  • Also, there was a suggestion to have a tool that could inform potential students of worm labs in their respective local area
    • Ask Todd; he used to have a map of researchers; Todd had asked Cecilia to curate lab location and institution
  • Person and Laboratory addresses in ACEDB have a different format, so looking to reconcile
  • Do we know how many labs are still viable? Check for a paper verified in the last 5 years, or requested strains from the CGC recently
    • Most labs were real
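
The viability check described above could be expressed as a simple rule, sketched here. The 5-year window for verified papers comes from the notes; the 2-year window for "recent" CGC requests is an assumption, as are the function and parameter names. Years are approximated as 365 days.

```python
from datetime import date


def _within_years(event_date, today, years):
    """True if event_date is known and falls within roughly `years` years of today."""
    if event_date is None:
        return False
    return 0 <= (today - event_date).days <= years * 365


def is_lab_active(last_verified_paper, last_cgc_request, today,
                  paper_years=5, cgc_years=2):
    """A lab counts as active if either signal is recent enough.

    paper_years=5 follows the notes; cgc_years=2 is an assumed reading
    of "recently" and should be adjusted once a cutoff is agreed.
    """
    return (_within_years(last_verified_paper, today, paper_years)
            or _within_years(last_cgc_request, today, cgc_years))
```

Either date may be `None` when no record exists, in which case that signal simply does not count.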

C_elegans Slack group

  • Called "C_elegans"
  • Chris made a "WormBase" channel for people to post questions, comments
  • Chris will look into inviting everyone and possibly integrating with help@wormbase.org email list

WormBase Outreach Webinars

  • While travel is still restricted, we should consider WormBase webinars
  • Scott working on a JBrowse webinar
  • Could have a different topic each month
  • Should collect topics to cover and assign speakers (maybe multiple speakers per topic; keep it lively)
  • Should set up a schedule
  • How should we advertise? Can post on blog, twitter, etc.

New transcripts expanding gene range

  • Will bring up at next week's site-wide call
  • Possibly due to incorporation of newer nanopore reads
  • Many genes coming in WS277 have expanded well beyond the gene limits as seen in WS276
    • Example genes: pes-2.2, pck-2, herc-1, atic-1
  • Has several repercussions:
    • WormBase does not submit alleles affecting more than one gene; with these gene expansions suddenly alleles once only affecting a single gene are now affecting two genes, and so are now omitted from loading into the Alliance (including any phenotype and/or disease annotations)
    • Some expanded genes are now being attributed with thousands of alleles/variants
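
To see why expanded gene spans cause alleles to drop out, here is a minimal sketch of an overlap test. Gene names and coordinates are made up, and the real pipeline may map alleles to genes differently; only the "no more than one gene" rule comes from the notes.

```python
def genes_overlapping(allele, genes):
    """Return sorted names of genes whose span overlaps the allele span.

    allele: (chromosome, start, end); genes: {name: (chromosome, start, end)}.
    Coordinates are treated as inclusive.
    """
    chrom, start, end = allele
    return sorted(
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start <= end and start <= g_end
    )


def is_submittable(allele, genes):
    """Alleles affecting more than one gene are not submitted."""
    return len(genes_overlapping(allele, genes)) <= 1


# Hypothetical example: geneA's span grows between releases and now
# reaches into geneB, so an allele inside geneB hits both genes.
old_release = {"geneA": ("I", 1000, 2000), "geneB": ("I", 3000, 4000)}
new_release = {"geneA": ("I", 1000, 3500), "geneB": ("I", 3000, 4000)}
allele = ("I", 3100, 3100)
```

Running `is_submittable(allele, old_release)` and `is_submittable(allele, new_release)` shows the same allele passing under the old spans and failing under the expanded ones.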

Citace upload

  • Upload files to Spica/Wen by Tuesday (June 30th) 10am
  • Wen will clean up folders in Spica (older files from WS277 not cleared out for some reason)