Author first pass requests
Latest revision as of 19:48, 14 January 2013
Textpresso scans each paper for an e-mail address by looking for the '@' character. The first e-mail address it finds is used as the author contact.
The first line containing an '@' for each paper is listed here: http://textpresso-dev.caltech.edu/azurebrd/grep_output
That output is parsed into e-mail addresses on tazendra here: /home/postgres/work/pgpopulation/afp_papers/textpresso_emails (the grep output is a bit junky, but it is the source).
These are the papers that have a Textpresso body: http://textpresso-dev.caltech.edu/azurebrd/textpresso_has_body
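As a rough illustration of the "first '@' wins" rule described above, here is a minimal Python sketch. It is not the Textpresso code: the pattern `EMAIL_RE` and the function `first_email` are hypothetical names made up for this example, and the sketch assumes the paper is available as a list of text lines. Unlike a plain grep for '@', it skips lines where the '@' is not part of something that looks like an address.

```python
import re

# Hypothetical, permissive e-mail pattern (an assumption for this sketch,
# not the pattern Textpresso actually uses).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def first_email(lines):
    """Return the first e-mail-like string found in the lines, or None."""
    for line in lines:
        if "@" not in line:
            continue  # cheap pre-filter, similar in spirit to grepping for '@'
        match = EMAIL_RE.search(line)
        if match:
            return match.group(0)
    return None
```

With this rule, a paper that lists several addresses always yields the one that appears first in the text, whether or not it belongs to the corresponding author.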
This approach does not work for all journals:
- BMC journals, which appear to have a systematic pattern of placing "* - " directly in front of the corresponding author's e-mail.
6/20/09: The code was changed to work around this particular problem by looking for the "* -".
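The page does not record how the "* -" workaround was implemented. One plausible reading, sketched below in Python, is to prefer an address that directly follows the "* -" marker and fall back to the first address otherwise. Everything here (`EMAIL`, `MARKED_RE`, `ANY_RE`, `contact_email`) is a hypothetical illustration under that assumption, not the code that was changed.

```python
import re

# Hypothetical e-mail pattern, assumed for this sketch only.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

# An address directly preceded by the "* -" marker (whitespace is optional).
MARKED_RE = re.compile(r"\*\s*-\s*(" + EMAIL + r")")
ANY_RE = re.compile(EMAIL)


def contact_email(text):
    """Prefer the address after a "* -" marker; otherwise use the first address."""
    marked = MARKED_RE.search(text)
    if marked:
        return marked.group(1)
    any_match = ANY_RE.search(text)
    return any_match.group(0) if any_match else None
```

For text without the marker this behaves exactly like the "first e-mail found" rule, so it only changes the result for papers that use the "* -" convention.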
Continuing problems, which result in no author e-mail being sent:
- the e-mail is incorrect because of tokenizing (e.g. in 00032111 and 00032933 the sentence is split in the middle of the e-mail because of a dot (BMC journals))
- the PDF is a provisional PDF
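The tokenizing problem can be reproduced in a small Python sketch. The splitter below is a deliberately naive stand-in that treats every dot as a sentence boundary. It is not the Textpresso tokenizer, and `naive_sentences`, `emails_per_sentence` and `EMAIL_RE` are hypothetical names used only for this illustration.

```python
import re

# Hypothetical e-mail pattern, assumed for this sketch only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def naive_sentences(text):
    """Split on every dot, which also cuts through dots inside addresses."""
    return [piece.strip() for piece in text.split(".") if piece.strip()]


def emails_per_sentence(text):
    """Search each naive 'sentence' separately for e-mail-like strings."""
    found = []
    for sentence in naive_sentences(text):
        found.extend(EMAIL_RE.findall(sentence))
    return found
```

On a string such as "Contact: jane.doe@example.org. Background follows." the per-sentence search finds nothing, because the address has been cut into "jane", "doe@example" and "org", while searching the unsplit text still finds the whole address.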
Authors will not be asked to first-pass the following kinds of articles:
- any article with no e-mail contact information
- old articles
- book chapters