WormBase-Caltech Weekly Calls

From WormBaseWiki
Revision as of 22:49, 12 March 2011 by Cgrove

2009 Meetings


2011 Meetings

February


March 3, 2011

SUMMARY:

Every other month patch:

  • We will try to generate an ACEDB patch every other month, starting with the Paper class data
  • Need to coordinate with Juancarlos and Todd

Need to check .ACE patch files before upload:

  • Will need to be vigilant about checking for errors and inconsistencies before sending to Todd to put up on website
  • Curators need to check their own data for problems
  • Wen will check for consistency between the different data types

Think about connecting website to Postgres:

  • Think about showing Postgres data "immediately" on site


DETAILS:

Delayed release cycle:

  • Will require more work to prepare for more frequent release of certain data types
  • Aside from Kimberly's data, most data types are not urgent (e.g. Expression pattern)
  • What are the users feeling?
    • Having data faster will help users; they don't ask, because they don't see it
  • On-the-fly updating of website? Like Postgres?
  • Since we use ACEDB, we have to patch WS with .ACE file, or rebuild whole thing
  • Flat file Postgres database, replaced every night?
  • Website calls Postgres directly for certain data types?
  • Performing build without sequence is easy? Do everything without sequence?
  • How to integrate sequence data with other data once they're decoupled through the patching process?
  • We need .ACE patch files
  • Concise description separate from most else (but connected to papers)
  • Do papers first?
  • Website can show anything
  • If we have a lot of patches, we will not have a check for data inconsistencies/conflicts
  • Trial patch .ace files for papers first
  • Juancarlos: Scripts that check differences between data dumps; scripts are data type specific
    • Curators need to talk to Juancarlos about the importance of different data tags
  • Paper .ACE file: Would include bibliographic info, journals, authors, genes associated from abstract or added manually
  • One reason for more frequent releases: because we have first pass author forms; show them we add it quickly
    • what will be added through the forms: expression patterns? RNAi (difficult?)?
  • We should check patch before we send to Todd!!! Don't want to crash database
  • How frequently to patch? Weekly? Daily? Check with Todd, how often he can load them?
  • Cron job to create patch ACE files, send to curators to check for problems, then send to Todd
  • Interdependency of data types; curators rely on other curators?
  • Postgres directly to website? Todd would have to work it out
  • New information flag on website? Toggle visibility?
  • How do we know that the data do not conflict with each other?
  • What are common problems? Dumper script goes bad, makes broken lines, empty fields
  • Error catching mechanisms? More checks on postgres? Dump files?
  • Data merging problems? What are the cases that are conflicts? Prevent them? Know beforehand?
  • If we don't know, as long as it doesn't crash the database or fail to load, then OK
  • Don't do -D stuff, maybe? No deletions? Skip typos?
  • Always have to check ACE files anyway, but have to do every week (2 weeks?)
  • We can try a patch every other month
  • What can we do without the patch?
  • Did SAB talk about changing to relational databases?
    • Get website going as is first, and see if it matters?
    • If people don't want to change data models, we can switch over to relational
    • Separate panel on website directly from Postgres?
  • Wen can check the data integration every other month for patch
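
A minimal sketch of the kind of pre-upload check discussed above (broken lines, empty fields, missing object headers), assuming a simplified view of the .ACE syntax: objects are separated by blank lines, each object starts with a `Class : "Name"` header, and values are double-quoted. The function name `check_ace` and the sample Paper data are hypothetical, not an existing WormBase script; a real check would need to be data type specific, as noted for Juancarlos's scripts.

```python
import re

# Simplified assumption: an object header looks like  Class : "Name"
# (optionally prefixed with -D when a whole object is deleted).
HEADER = re.compile(r'^(-D\s+)?\w+\s*:\s*"[^"]+"\s*$')
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')
EMPTY_VALUE = re.compile(r'(?<!\\)""')


def check_ace(text):
    """Return a list of (line_number, message) problems found in .ace text."""
    problems = []
    at_object_start = True  # the next data line should be an object header
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            at_object_start = True  # a blank line ends the current object
            continue
        if stripped.startswith("//"):
            continue  # comment line, ignored
        if len(UNESCAPED_QUOTE.findall(stripped)) % 2:
            problems.append((number, "unbalanced quotes (possibly a broken line)"))
        elif at_object_start:
            if not HEADER.match(stripped):
                problems.append((number, "expected an object header"))
        elif EMPTY_VALUE.search(stripped):
            problems.append((number, "empty field"))
        at_object_start = False
    return problems


# Illustrative data only; the IDs and tags are made up for this example.
sample = '''Paper : "WBPaper00000001"
Title "An example title"
Journal ""

Paper : "WBPaper00000002"
Title "A title whose closing quote was lost

Title "No header here"
'''

for number, message in check_ace(sample):
    print(f"line {number}: {message}")
```

A check like this only catches files that are malformed; it says nothing about whether the data conflict with other data types, which is the integration check Wen would still need to do.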


March 10, 2011

Release schedule and patches

  • What is the appropriate frequency?
  • Scheme: do what we're already doing, Wen merges into citace
  • Excluding sequence related data
  • Need to include Mary Ann's data (strains etc.)
  • Daily update too frequent; maybe once per month/week
  • Submit .ACE file to Todd with simple syntax; easily parsable; old description removed and new information added
  • Make updates only in contrast to last WS, not previous patch/temporary upload
  • ACEDB diff step only relative to WS
  • Wen: Postgres can dump diff ace file; already have diff ace files for every data type at Caltech; integrate into citace
  • Raymond: integration is important; we need to talk about how much work needs to be done by each approach
  • Wen: consistency checks, backups, store each version?
  • Raymond: citace 224 to 225 (for example), display done on class level
    • Example: Gene page; only update information relevant to gene class to be displayed
  • Do once per month: faster than currently because it doesn't have to go through the dev site
  • If we update Citace to Citace, missing a lot of cross-references?
  • Mock citace with Mary Ann's data? Becomes diff base; Mary Ann submits (non-sequence related) data directly to WBCIT
  • Build low-connectivity ace at WBCIT? Add Mary Ann's data, remove RNAi
  • Todd: important consideration: things added won't be available for search until formal release; weird things about diff; new reference associations with genes; a lot of duplications?
    • Raymond: will look at it
  • Todd: WBGene00000846, example, see how fast it loads, go from there; would like a single ACE file (concatenation of all individual ace files); would not happen on development; would have to happen on production releases; would take production database off line, clone it, and upload it to all production nodes
  • Individual curators need to check their individual ace files for errors
  • Frequency: monthly
  • Todd: we should just run some tests first, to check feasibility
  • Raymond: WBGene1, example, concise description has typo, WS225 has typo fixed from WS224, diff file shows:
    • -D old_description
    • new_description
  • Load diff ace into original database?
  • parallel display; unrelated to resident WS?
  • Todd: producing two web pages for each object?
  • Raymond: No, only changing relevant tags, etc.
  • Todd: Two databases running at same time inefficient; include timestamps?
  • Raymond: No, cannot include timestamps
  • Wen: Send patch ace files to Todd
  • Raymond: In conflict with versioning; how to show new data
  • Wen: Call it "WS225.1"
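
A rough sketch of the diff step Raymond described, where a corrected description shows up as a `-D` line for the old value followed by the new value, always computed against the last WS release rather than a previous patch. Everything here is an assumption for illustration: the function name `ace_diff`, the in-memory dictionaries standing in for two releases, the `Concise_description` tag name, and the gene ID. It does not handle objects that were removed entirely, and it does not escape quotes inside values.

```python
def ace_diff(old, new):
    """Build .ace patch text that turns `old` into `new`.

    Both arguments map (class_name, object_name) to a {tag: value} dict.
    `old` should always be the last WS release, never an earlier patch.
    """
    blocks = []
    for key in sorted(new):
        before = old.get(key, {})
        after = new[key]
        lines = []
        for tag in sorted(set(before) | set(after)):
            if before.get(tag) == after.get(tag):
                continue  # unchanged tag, nothing to patch
            if tag in before:
                lines.append(f'-D {tag} "{before[tag]}"')  # remove old value
            if tag in after:
                lines.append(f'{tag} "{after[tag]}"')  # add new value
        if lines:
            class_name, object_name = key
            header = f'{class_name} : "{object_name}"'
            blocks.append("\n".join([header] + lines))
    # Objects are separated by blank lines in the output.
    return "\n\n".join(blocks) + ("\n" if blocks else "")


# Illustrative stand-ins for two releases (hypothetical content).
ws224 = {("Gene", "WBGene00000001"): {"Concise_description": "A descripton with a typo"}}
ws225 = {("Gene", "WBGene00000001"): {"Concise_description": "A description without the typo"}}

print(ace_diff(ws224, ws225))
```

Because only changed tags are emitted, loading such a patch would touch just the relevant tags on an object, which matches Raymond's point that this does not mean producing two pages per object.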


Human Disease Relevance tag in Concise Description

  • Ranjana: Sent out e-mail; human disease tag "Human Disease Relevance"; to clean up concise description form (old tags in form outdated); could be putting more information into concise description; OMIM human disease
  • Raymond: make not just text field, but make entity field pointing to object; meant to be human readable, this may break up the concise description into OMIM-related and OMIM-non-related info; why parse the data into a tag?
  • Paul S: OMIM descriptions as a separate tag
  • Raymond: Rewrite concise description?
  • Ranjana: No
  • Paul S: Would you mention human disease relevance in concise description? yes, but if it's just a link out to OMIM, then separate out; OMIM may have changed since Erich wrote original script; check OMIM for new information and tags that may be able to get pulled out
  • Ranjana: Michael Paulini can consolidate orthology information?


Karen: Transgene model

  • A lot of changes to propose
  • Deletions of tags; more coming
  • Other things in database connected to transgenes
  • Many things in transgene objects that may be able to be parsed into different tags (new job for someone?)
  • Strict nomenclature for transgene descriptor
  • Clones present need to be parsed into clone class?
  • Todd made Clone page
  • Start with vectors/backbones and then work on specific plasmids


Gene class-phenotype connections and descriptions?


GSA markup at FlyBase

  • FlyBase is not willing to fully QC all papers
  • Do we push FlyBase and/or provide a better tool to QC?
  • Are we worried about the GSA markup for flies not looking professional?
  • People need to be willing to pay for the QC/curation; depends on database priorities
  • CIT will spend more time on in-house development to make Fly GSA markup easier/more efficient


Putting SPELL on Amazon cloud?