Discussion

Additional community feedback & discussion for the nGASP project:

  1. Consider also evaluating the pipeline on a small genomic region of C. briggsae. Species-specific differences, e.g. in hexamer usage in CDS and in exon/intron length, may lower performance. - Michael Han
  2. The developers of each gene prediction software package could be invited to take part in an open competition, optimizing their software for use on nematodes. The tentative closing date is mid-November 2006. - Avril
  3. Closing the open competition by the end of November 2006 would get the new predictions into BrigAce and make them available for manual curation before the St. Louis SAB meeting. - Michael Han
  4. The announcement should be made via the WormBase website and by direct email. This competition differs from the Drosophila GASP and the human GENCODE in that the correct gene structures for C. elegans are already publicly available but are not to be used, so it relies on participants' honesty. - Avril Coghlan
  5. Choosing genes that changed between WS160 and dev, or adding regions from CB1, could keep things honest. - Michael Han
  6. Paul Flicek at EBI wrote the analysis software for the GENCODE (EGASP) project. It may be available for nGASP. - Avril
  7. Analysis Workshop at Hinxton/Wellcome Trust for EGASP - Tristan
  8. We should compare the test and training regions with genome-wide averages for GC content, gene length, and repeat density. - Lincoln
  9. For gene finders based on cross-species genome alignments, we decided to provide MLAGAN alignments, which were produced by Michael Han. They come from the ENSEMBL COMPARA databases (incl. a core database per species, orthologies, and MLAGAN 3-way genomic alignments of WS160 vs the latest briggsae vs remanei). Michael can reformat the alignments if needed. MLAGAN alignments for the training set and the test set are available in CLUSTALW format, and there is also a small ComparaTutorial. We had also discussed using the WABA alignments in WS160, or the UCSC three-way (elegans, briggsae, remanei) MultiZ alignment based on WS140. If WS140 differs significantly from WS160, new UCSC alignments could be generated. Furthermore, Anthony and Michael could easily produce new blastz alignments (they already have them for the old briggsae non-chromosomal assembly). - Avril
  10. We could use some unpublished EST data as extra held-out data for the test regions. - Avril
  11. I worry a little about inferring gene prediction accuracy in new nematodes based on an examination of 10% of C. elegans. We will not have good training sets in the new genomes, and the sequence will not be as high quality. It may be more realistic to start with a handful of ESTs and a pseudo-assembly. - Ian Korf
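
Points 1 and 8 above suggest comparing sequence statistics (GC content, gene length, hexamer usage in CDS) between regions or species. A minimal Python sketch of how such statistics might be computed is given below. It is not part of any nGASP tooling: the function names (gc_content, hexamer_frequencies, mean_gene_length) and the input conventions (plain sequence strings, genes as 1-based inclusive start/end pairs) are assumptions made for illustration only. Repeat density is left out because it depends on how repeats are annotated.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); N and gaps are ignored."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def hexamer_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each overlapping 6-mer in a coding sequence.

    Hexamers containing anything other than A, C, G, T are skipped.
    """
    cds = cds.upper()
    kmers = [cds[i:i + 6] for i in range(len(cds) - 5)]
    kmers = [k for k in kmers if set(k) <= set("ACGT")]
    total = len(kmers)
    if total == 0:
        return {}
    return {k: n / total for k, n in Counter(kmers).items()}


def mean_gene_length(genes: list[tuple[int, int]]) -> float:
    """Mean length of genes given as 1-based inclusive (start, end) pairs (assumed convention)."""
    if not genes:
        return 0.0
    return sum(end - start + 1 for start, end in genes) / len(genes)
```

Running the same functions on a test or training region and on the whole genome (or on C. elegans and C. briggsae CDS) gives numbers that can be compared directly, for example gc_content(region_seq) against gc_content(genome_seq), where region_seq and genome_seq are hypothetical sequence strings loaded by the caller.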