Difference between revisions of "International C. elegans Meetings - UCLA"

From WormBaseWiki

Revision as of 18:14, 9 September 2019

Contact Information and File Specifications

  • Contact Anne Marie Mahoney at the Genetics Society of America. The GSA has worked with us in the past to get International Meetings into WormBase.
  • They will send us the abstracts in a parseable format, with each abstract in a separate file named according to its program number.
  • Also ask for a participants list, containing names, addresses, email addresses, and institutions, for Cecilia's records.
  • Juancarlos can parse the abstracts into mangolassi where we can proofread them before entering them into the editor on tazendra.

File Format

  • The International Meeting Abstracts are sent to us in a specific file format, with tag:value attributes that are readily parsed into the paper tables (pap_).
  • Each abstract is sent as an individual file, named according to the abstract number, e.g. 826C.txt
  • Below is the list of tag names. Authors and Institutions are listed sequentially, with increasing numbers as needed (Author 1, Author 2, and so on).
    • AbstractNo :
    • Title :
    • Author 1 :
    • Presenting Author :
    • Study Group :
    • Author 1 Affiliation :
    • Institution 1 :
    • Body of Abstract :
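
The tag list above can be read with a few lines of code. The following is a minimal Python sketch, not the parse.pl script itself. It assumes that each field starts a line as "Tag : value" and that any line not starting with a known tag continues the previous field. The function name parse_abstract and the sample text are made up for illustration.

```python
import re

# Assumed layout: each field starts a line as "Tag : value"; lines that do not
# start with a known tag are treated as a continuation of the previous field.
TAG_RE = re.compile(
    r"^(AbstractNo|Title|Presenting Author|Study Group|"
    r"Author \d+ Affiliation|Author \d+|Institution \d+|Body of Abstract)"
    r"\s*:\s*(.*)$"
)

def parse_abstract(text):
    """Return a dict mapping each tag to its (possibly multi-line) value."""
    fields = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = TAG_RE.match(stripped)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None and stripped:
            # Continuation line: append it to the field being read.
            fields[current] = (fields[current] + " " + stripped).strip()
    return fields

# Invented sample in the assumed layout (not a real abstract).
sample = """AbstractNo : 826C
Title : An example title
Author 1 : A. Author
Presenting Author : A. Author
Author 1 Affiliation : 1
Institution 1 : Example University
Body of Abstract : First line of the body
continues on a second line."""

record = parse_abstract(sample)
print(record["AbstractNo"], "|", record["Title"])
```

Keying the result by tag name keeps numbered tags such as "Author 2" or "Institution 3" distinct without needing to know in advance how many there are.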

Parsing Scripts

  • First place meeting abstracts in a folder on mangolassi
    • For 2019 abstracts: /home/acedb/kimberly/meeting_abstracts/2019_iwm_meeting_abstracts/AbsFiles_2019/AbsFiles
  • In the postgres account, meeting abstracts are located in directories, named according to the meeting, here: /home/postgres/work/pgpopulation/pap_papers/abstracts
  • There are two scripts that we run for processing the abstracts (note: this documentation still needs to be updated with respect to species and additional parsing scripts):
    • unaccentHtml.pl - this script converts HTML characters and names to text wherever possible, and converts accented characters to unaccented characters, for example:
      • æ represented in HTML characters as & # 230 ; (note that these symbols are strung together in the actual code)
      • ω represented in HTML names as & omega ; (note that these symbols are strung together in the actual code)
      • This script searches the text for each of the above types of HTML encoding; we then manually check that we have a mapping for each one
      • This script also strips out HTML for things like paragraph, font, etc.
    • parse.pl - this script parses the abstract information into the corresponding paper tables
      • The parse.pl script has some values that need to be set for each specific meeting:
        • year
        • identifier prefix (e.g., wm2019)
  • The most recent versions of the scripts are here: /home/postgres/work/pgpopulation/pap_papers/abstracts/iwm2019
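
The conversion step described for unaccentHtml.pl can be illustrated with a short Python sketch. This is not the Perl script, only an approximation of the same idea using the standard library. The MANUAL_MAP table, the list of stripped tags, and the function name clean_text are assumptions made for the example; the real script's mappings may differ.

```python
import html
import re
import unicodedata

# Hypothetical hand-maintained mapping for characters that Unicode
# decomposition cannot reduce to ASCII; the real script's table may differ.
MANUAL_MAP = {"æ": "ae", "ω": "omega"}

# Assumed set of layout tags to strip (paragraph, font, and similar).
TAG_PATTERN = re.compile(r"</?(?:p|font|br|span)\b[^>]*>", re.IGNORECASE)

def clean_text(raw):
    """Decode HTML entities, strip layout tags, and unaccent characters.

    Returns the cleaned text plus a sorted list of characters that still
    have no mapping, so that they can be reviewed by hand.
    """
    text = html.unescape(raw)         # numeric (&#230;) and named (&omega;) forms
    text = TAG_PATTERN.sub("", text)  # drop paragraph, font, and similar tags
    for char, replacement in MANUAL_MAP.items():
        text = text.replace(char, replacement)
    # Split accented letters into base letter + combining mark, drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in decomposed if not unicodedata.combining(c))
    unmapped = sorted({c for c in text if ord(c) > 127})
    return text, unmapped

cleaned, unmapped = clean_text(
    "<p>Caf&#233; &#230; &omega; <font size=2>na&iuml;ve</font></p>"
)
print(cleaned, unmapped)
```

Returning the unmapped characters alongside the cleaned text mirrors the manual check described above: anything left in that list needs a new entry in the mapping table.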

Proofreading

Things to check for in parsed files:

  • Does the abstract number in the identifier match the abstract number in the program?
  • Are author names correct, i.e. are there any accented or other non-English characters that were not converted correctly?
  • Does the text of the abstract start and stop at the right places?
  • Are symbols represented properly, e.g. 3'UTR?
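
Some of these checks can be partly automated before reading each abstract by eye. The Python sketch below is one possible helper, not part of the existing pipeline. It flags leftover HTML entities, leftover tags, and non-ASCII characters in a parsed abstract; the function name proofreading_flags and the sample string are made up for illustration.

```python
import re

ENTITY_RE = re.compile(r"&#?\w+;")         # e.g. a leftover &#230; or &omega;
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")  # e.g. a leftover <p> or <font ...>

def proofreading_flags(text):
    """Return human-readable warnings for one parsed abstract."""
    flags = []
    for entity in sorted(set(ENTITY_RE.findall(text))):
        flags.append(f"unconverted HTML entity: {entity}")
    for tag in sorted(set(TAG_RE.findall(text))):
        flags.append(f"leftover HTML tag: {tag}")
    for char in sorted({c for c in text if ord(c) > 127}):
        flags.append(f"non-ASCII character: {char!r} (U+{ord(char):04X})")
    return flags

# Invented sample text containing one problem of each kind.
flags = proofreading_flags("The 3'UTR &omega; <p>Müller")
for flag in flags:
    print(flag)
```

An empty result does not mean the abstract is correct. Whether the text starts and stops in the right place, and whether the abstract number matches the program, still needs a manual check.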

