Difference between revisions of "Author first pass requests"

From WormBaseWiki
See the e-mail thread with subject line '''Re: textpresso grepped emails''':

"i inspected the results of the automated corresponding author identification system that Juancarlos has been using. It was 100% successful for 20 of 20 most recent papers. therefore, we can conclude that the system is working pretty well, except as Gary pointed out, in the case of BMC Journals, there may be a systematic problem. BMC Journals appear to have a consistent pattern of having a "* - " right in front of the corresponding author's e-mail. perhaps we could try incorporating this in the existing script to see if we can improve the outcome without major changes.

raymond"
 
  
[[Category:Curation]]
 

Latest revision as of 19:48, 14 January 2013



Textpresso scans each paper for an e-mail address by looking for the '@' character. The first e-mail address it finds is used to contact the author.

The first line containing an '@' from each paper is listed here: http://textpresso-dev.caltech.edu/azurebrd/grep_output

That output is parsed into e-mail addresses on tazendra here: /home/postgres/work/pgpopulation/afp_papers/textpresso_emails (the grep output is a bit junky, but it is the source)
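As a rough illustration of the step above, the following Python sketch finds the first line containing an '@' and pulls an address out of it. It is not the script that runs on tazendra: the function name `first_email` and the pattern `EMAIL_RE` are hypothetical, and the real matching rules may differ.

```python
import re

# Hypothetical address pattern (an assumption, not the production rule).
# The domain part must contain at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def first_email(text):
    """Return the first address on the first line containing '@', or None."""
    for line in text.splitlines():
        if "@" in line:
            match = EMAIL_RE.search(line)
            # Only the first '@' line is inspected, mirroring the grep step.
            return match.group(0) if match else None
    return None


paper = (
    "Title of the paper\n"
    "Jane Doe*, John Roe\n"
    "* Correspondence: jane.doe@example.edu, john@example.org\n"
    "Body text"
)
contact = first_email(paper)
```

Because only the first match is kept, an address that appears earlier in the paper than the corresponding author's will be chosen instead.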

These are the papers that have a Textpresso body: http://textpresso-dev.caltech.edu/azurebrd/textpresso_has_body


This approach does not work with all journals:

  • BMC journals, which may have a systematic problem: a "* - " consistently appears right in front of the corresponding author's e-mail.

6/20/09: The code was changed to kludge around this particular problem by looking for the "* -".
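One way such a kludge could look is sketched below in Python. This is an assumption about the approach, not the code that was actually changed: the names `EMAIL_RE`, `BMC_RE`, and `pick_contact` are hypothetical. The idea is to prefer an address that directly follows the "* -" marker and fall back to the first address otherwise.

```python
import re

# Hypothetical address pattern (an assumption, not the production rule).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# An address directly preceded by the BMC "* -" marker, with optional spaces.
BMC_RE = re.compile(r"\*\s*-\s*(" + EMAIL_RE.pattern + r")")


def pick_contact(text):
    """Prefer the address after a "* -" marker; else use the first address."""
    marked = BMC_RE.search(text)
    if marked:
        return marked.group(1)
    first = EMAIL_RE.search(text)
    return first.group(0) if first else None


bmc_page = "a.first@example.org\n* - c.author@example.org"
other_page = "x@example.org and y@example.org"
```

With these inputs, `pick_contact(bmc_page)` selects the marked address even though another address appears first, while `pick_contact(other_page)` keeps the first-address behaviour.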

Continuing problems, which result in no e-mail being sent to the author:

  • the e-mail address is incorrect because of tokenizing (e.g. in 00032111 and 00032933 the sentence is split in the middle of the e-mail address because of a dot (BMC journals))

  • the PDF is a provisional PDF
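The tokenizing problem can be reproduced with a small Python sketch. The splitter here is a deliberately naive stand-in that breaks at every dot; it is not Textpresso's tokenizer, and `EMAIL_RE` and `emails_per_sentence` are hypothetical names. It only shows how a sentence break inside an address leaves no complete address to extract.

```python
import re

# Hypothetical address pattern (an assumption, not the production rule).
# The domain part must contain at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def emails_per_sentence(text):
    """Search each naive 'sentence' separately for a complete address."""
    found = []
    for sentence in text.split("."):  # naive: every dot ends a sentence
        found.extend(m.group(0) for m in EMAIL_RE.finditer(sentence))
    return found


line = "Contact: author@lab.example.org"
whole = EMAIL_RE.findall(line)        # address is intact in the full line
pieces = emails_per_sentence(line)    # address is cut at the first dot
```

Here `whole` contains the full address, while `pieces` is empty because the fragment `author@lab` no longer has a dotted domain. A looser pattern would instead return the truncated fragment, which matches the "e-mail isn't correct" symptom.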


Articles whose authors will not be requested for first-passing include:

  • any article with no e-mail contact information
  • old articles
  • book chapters