Difference between revisions of "New 2012 Curation Status"

From WormBaseWiki
 
(96 intermediate revisions by 7 users not shown)
Line 20: Line 20:
  
  
[[File:Curation_Status_Form_Main_Page2.png]]
+
[[File:Curation_Status_Form_Main_Page_11-8-2013.png|600px]]
  
  
 
Above is a screenshot of the main page of the Curation Status Form. The user/curator is requested to identify who they wish to login as, and to select one of four options to continue:
 
Above is a screenshot of the main page of the Curation Status Form. The user/curator is requested to identify who they wish to login as, and to select one of four options to continue:
  
1) [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] - This is where the curator can specify one or more specific papers they wish to view curation status results for (see below).
+
1) [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] - This is where the curator can specify one or more specific papers they wish to view curation status results for (see below). This page includes a Topic paper filter to search for papers related to a WormBase Biological Topic.
  
 
2) [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] - This is where the curator can add curation status results for one or more specific papers (see below).
 
2) [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] - This is where the curator can add curation status results for one or more specific papers (see below).
Line 40: Line 40:
  
  
[[File:Curation_Status_Form_Specific_Paper_Page2.png]]
+
[[File:Curation_Status_Form_Specific_Paper_Page_11-8-2013.png|600px]]
  
  
Above is a screenshot of the Specific Paper Page where a curator can specify which paper(s) they would like to view curation status results for. After typing/pasting in one or more WBPaper IDs in the paper entry field, the curator can specify which datatypes and flagging methods they would like to see results for. Note that selecting "all datatypes" will override any single datatype selections below. A curator can select what curation data sources they would like to see results for (i.e. Ontology Annotator and/or cur_curdata), flagging methods (SVM, AFP, CFP), the number of papers they would like to load at one time (default of 10), and whether they would like to see info (and links) for the PubMed ID (PMID), the PDF, and the paper's journal.
+
Above is a screenshot of the Specific Paper Page where a curator can specify which paper(s) they would like to view curation status results for. After typing/pasting in one or more WBPaper IDs in the paper entry field, the curator can specify which datatypes and flagging methods they would like to see results for. Note that selecting "all datatypes" will override any single datatype selections below. A curator can select what curation data sources they would like to see results for (i.e. Ontology Annotator and/or cur_curdata), flagging methods (SVM, AFP, CFP, STR[textpresso string matches]), the number of papers they would like to load at one time (default of 10), and whether they would like to see info (and links) for the PubMed ID (PMID), the PDF, and the paper's journal.
 
<br>
 
<br>
 
<br>
 
<br>
Line 49: Line 49:
  
 
Papers can be listed as WBPaper### or simply as numbers, separated by spaces, commas, pipes, new lines, anything that is not a number. Any number that is entered will be considered a valid paper ID.
 
Papers can be listed as WBPaper### or simply as numbers, separated by spaces, commas, pipes, new lines, anything that is not a number. Any number that is entered will be considered a valid paper ID.
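
The parsing rule above (every run of digits is a paper ID, whatever separates them) can be sketched as follows. This is a minimal illustration rather than the form's actual code: the sub name ''extractPaperIds'' is hypothetical, and dropping repeated IDs is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from the form's source): treat every run of
# digits in the input as a paper ID, regardless of the separators used.
sub extractPaperIds {
  my ($input) = @_;
  my %seen; my @ids;
  foreach my $number ($input =~ m/(\d+)/g) {
    next if $seen{$number}++;            # assumption: repeated IDs are listed once
    push @ids, $number;
  }
  return @ids;
} # sub extractPaperIds

# "WBPaper" prefixes, commas, pipes and newlines are all just separators here.
my @ids = extractPaperIds("WBPaper00001234, 00005678|00001234\n42");
print join(" ", @ids), "\n";
```

Because any number counts as a paper ID, stray digits pasted into the field (a year, a page number) would also be picked up under this rule.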
 +
 +
'''Recent addition as of November 2013''' CG 11-6-2013
 +
A Topic dropdown menu has been added to the "Specific Paper Page" so as to allow curators to view all papers related to a particular curation topic with respect to their data type. Note that selecting a Topic will look for overlap with any WBPaper IDs entered into the main paper entry field, only populating the form with papers associated with the Topic AND in the paper entry field. Topic papers will be pulled from the Topic Curation OA. If no papers are entered in the paper entry field AND no topic is selected, the form will return ALL papers (having undergone at least one flagging pipeline).
  
 
<br>
 
<br>
Line 57: Line 60:
  
  
[[File:Curation_Status_Form_Add_Results_Page2.png]]
+
[[File:Curation_Status_Form_Add_Results_Page2.png|600px]]
  
 
Above is a screenshot of the Add Results Page of the form, where a curator can add new curation status results for one or more papers that they specify. A curator '''must''' specify what datatype they wish to submit paper results for and '''must''' specify what the status is for the paper(s): curated and (hence) positive, validated positive (but not yet curated), validated negative, or (if they need to revert back to not validated, or blank, status) not validated. The curator '''must''' then also specify at least one paper for which to apply this curation status in the paper entry field. Multiple papers '''must''' be entered in WBPaper### format, each on a separate line.
 
Above is a screenshot of the Add Results Page of the form, where a curator can add new curation status results for one or more papers that they specify. A curator '''must''' specify what datatype they wish to submit paper results for and '''must''' specify what the status is for the paper(s): curated and (hence) positive, validated positive (but not yet curated), validated negative, or (if they need to revert back to not validated, or blank, status) not validated. The curator '''must''' then also specify at least one paper for which to apply this curation status in the paper entry field. Multiple papers '''must''' be entered in WBPaper### format, each on a separate line.
Line 65: Line 68:
  
  
[[File:Curation_Status_Form_Submission_Summary2.png]]
+
[[File:Curation_Status_Form_Submission_Summary2.png|600px]]
  
  
Line 71: Line 74:
  
  
[[File:Curation_Status_Form_Confirm_Overwrite.png]]
+
[[File:Curation_Status_Form_Confirm_Overwrite.png|600px]]
  
  
 
at which point the curator can confirm the overwrite of the previous results for the indicated paper and datatype, or simply go to the main page (or go back a page to make corrections/edits). Note that the fields for which data has changed are highlighted in yellow for easy viewing. If the curator confirms the overwrite by checking the confirmation check box and clicking on "Overwrite Selected Results", they will be directed to the '''Overwrite Confirmation Summary Page''':
 
at which point the curator can confirm the overwrite of the previous results for the indicated paper and datatype, or simply go to the main page (or go back a page to make corrections/edits). Note that the fields for which data has changed are highlighted in yellow for easy viewing. If the curator confirms the overwrite by checking the confirmation check box and clicking on "Overwrite Selected Results", they will be directed to the '''Overwrite Confirmation Summary Page''':
  
[[File:Curation_Status_Form_Overwrite_Conf_Summary.png]]
+
[[File:Curation_Status_Form_Overwrite_Conf_Summary.png|600px]]
  
 
A link is provided to go back to the main page of the form.
 
A link is provided to go back to the main page of the form.
Line 87: Line 90:
  
  
[[File:Curation_Status_Form_Curation_Statistics_Page.png]]
+
[[File:Curation_Status_Form_Curation_Statistics_Page.png|600px]]
  
  
Above is a screenshot of a portion of the entire Curation Statistics table that a curator would be directed to from the main page of the form if they had clicked on the Curation Statistics Page button. Displayed at the top of the table are general paper statistics for a given datatype (datatypes indicated at the top of each column). Below that are statistics for papers that have been flagged (positive or negative) for the indicated datatype by ANY (at least one) flagging method. Below the "Any" statistics are the "Intersection" statistics, indicating papers flagged by ALL flagging methods for the indicated datatype. It should be emphasized here that '''"flagged"''' means processed by the flagging method, not necessarily flagged positive. Although not visible in the above screenshot, statistics for SVM results, AFP results, and CFP results are also included in this table.
+
Above is a screenshot of a portion of the entire Curation Statistics table that a curator would be directed to from the main page of the form if they had clicked on the Curation Statistics Page button. Displayed at the top of the table are general paper statistics for a given datatype (datatypes indicated at the top of each column). Below that are statistics for papers that have been flagged (positive or negative) for the indicated datatype by ANY (at least one) flagging method. Below the "Any" statistics are the "Intersection" statistics, indicating papers flagged by ALL flagging methods for the indicated datatype. It should be emphasized here that '''"flagged"''' means processed by the flagging method, not necessarily flagged positive. Although not visible in the above screenshot, statistics for SVM results, AFP results, CFP results, and STR results are also included in this table.
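
The difference between the "Any" and "Intersection" rows can be sketched as a union and an intersection over the sets of papers each flagging method has processed. This is a minimal illustration with made-up paper IDs; the sub name ''anyAndIntersection'' and the data layout are assumptions, not the form's actual code.

```perl
use strict;
use warnings;

# Hypothetical data: for one datatype, the papers each flagging method has
# processed ("flagged" in the sense above, positive or negative).
my %flaggedBy = (
  SVM => [ '00001234', '00005678', '00009999' ],
  AFP => [ '00001234', '00005678' ],
  CFP => [ '00001234' ],
);

# Returns two array refs: papers flagged by ANY method, and papers flagged by
# ALL methods (the intersection).
sub anyAndIntersection {
  my ($flaggedRef) = @_;
  my %count;
  my $methods = scalar keys %$flaggedRef;
  foreach my $method (keys %$flaggedRef) {
    my %unique = map { $_ => 1 } @{ $flaggedRef->{$method} };   # count a paper once per method
    $count{$_}++ foreach keys %unique;
  }
  my @any = sort keys %count;
  my @all = grep { $count{$_} == $methods } @any;
  return (\@any, \@all);
} # sub anyAndIntersection

my ($anyRef, $allRef) = anyAndIntersection(\%flaggedBy);
print "Any: ", scalar(@$anyRef), " Intersection: ", scalar(@$allRef), "\n";
```

With the sample data, three papers fall under "Any" and one under "Intersection". A method that processes every curatable paper would pull the "Any" count up to the full paper set.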
 
<br>
 
<br>
 
<br>
 
<br>
Line 136: Line 139:
 
Each cell number (aside from the top three rows) is also a hyperlink to the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]], listing the paper IDs for each paper in the list, as well as providing options for the view of each of those papers in the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]].  
 
Each cell number (aside from the top three rows) is also a hyperlink to the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]], listing the paper IDs for each paper in the list, as well as providing options for the view of each of those papers in the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]].  
  
 +
NB: for STR data, all curatable papers are considered flagged, so the 'any flagged' number can be misleading. Keep this caveat in mind when pulling out statistics.
 
<br>
 
<br>
 
<br>
 
<br>
Line 153: Line 157:
  
  
[[File:Curation_Status_Form_Curation_Statistics_Options_Page.png]]
+
[[File:Curation_Status_Form_Curation_Statistics_Options_Page.png|400px]]
  
  
Line 165: Line 169:
  
  
[[File:Curation_Status_Form_Prepopulated_Specific_Papers_Page.png]]
+
[[File:Curation_Status_Form_Prepopulated_Specific_Papers_Page_11-8-2013.png|600px]]
  
  
 
Above is a screenshot of the Prepopulated Specific Papers Page. Curators are directed here from any [[New_2012_Curation_Status#Main_Curation_Statistics_Page|Curation Statistics table]] via the hyperlinked numbers in the statistics table. The entire list of paper IDs (that fit the criteria indicated in the table row/column of the statistics table) is listed as hyperlinks to the individual paper results (on the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]]), as well as in the search box. Note that this page is identical to the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]], except that paper IDs are already pre-populated into the form from the statistics table.
 
Above is a screenshot of the Prepopulated Specific Papers Page. Curators are directed here from any [[New_2012_Curation_Status#Main_Curation_Statistics_Page|Curation Statistics table]] via the hyperlinked numbers in the statistics table. The entire list of paper IDs (that fit the criteria indicated in the table row/column of the statistics table) is listed as hyperlinks to the individual paper results (on the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]]), as well as in the search box. Note that this page is identical to the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]], except that paper IDs are already pre-populated into the form from the statistics table.
 +
 +
An addition as of November 2013 is the Topic paper filtering drop down menu. The drop down menu provides a list of WormBase Biological Topics as read from the Topic Curation OA. If a topic is selected (e.g. 'Aging' in the example screenshot), the form will look for any papers that exist in BOTH the Topic paper list AND the list of papers entered into the paper entry field (in this case prepopulated from some prior filtering step). If there are no overlapping papers from both lists, the form will not return any papers. If a Topic is not selected, the form will simply return the papers as listed in the paper entry field. If no topic is selected AND there are no papers in the paper entry field, the form will return ALL papers (having undergone at least one flagging pipeline).
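
The filtering rules described above can be sketched as follows. This is a hedged illustration, not the form's actual code: the sub name ''filterPapersByTopic'' and its arguments are hypothetical, and the behaviour when a Topic is selected but the paper entry field is empty (the Topic is applied to all flagged papers) is an assumption, since the text does not state it.

```perl
use strict;
use warnings;

# Hypothetical sketch of the Topic filter rules.
#   $enteredRef : paper IDs from the paper entry field (may be empty)
#   $topicRef   : paper IDs associated with the selected Topic, undef if no Topic is selected
#   $allRef     : all papers that have undergone at least one flagging pipeline
sub filterPapersByTopic {
  my ($enteredRef, $topicRef, $allRef) = @_;
  # Empty paper entry field: start from ALL papers.
  my @candidates = @$enteredRef ? @$enteredRef : @$allRef;
  # No Topic selected: return the candidates unchanged.
  return @candidates unless defined $topicRef;
  # Topic selected: keep only papers present in BOTH lists.
  my %inTopic = map { $_ => 1 } @$topicRef;
  return grep { $inTopic{$_} } @candidates;
} # sub filterPapersByTopic

my @allPapers   = qw(00001234 00005678 00009999);
my @agingPapers = qw(00005678 00009999);            # made-up Topic paper list
print join(" ", filterPapersByTopic([ '00001234', '00005678' ], \@agingPapers, \@allPapers)), "\n";
```

In the usage line, only 00005678 appears in both the entered list and the Topic list, so it is the only paper returned.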
  
  
Line 178: Line 184:
  
  
[[File:Curation_Status_Form_Detailed_Results_of_Papers_Page.png]]
+
[[File:Curation_Status_Form_Detailed_Results_of_Papers_Page.png|800px]]
  
  
Line 195: Line 201:
 
the '''fifth column''' displays the datatype for that row (note that multiple datatypes may be displayed per paper, on separate rows);  
 
the '''fifth column''' displays the datatype for that row (note that multiple datatypes may be displayed per paper, on separate rows);  
  
the '''sixth, seventh, and eighth columns''' display the results of the flagging methods SVM, CFP, and AFP, respectively;  
+
the '''sixth, seventh, eighth, and ninth columns''' display the results of the flagging methods SVM, STR, CFP, and AFP, respectively;  
  
the '''ninth''' column indicates the status of the paper in the Ontology Annotator (OA) for that datatype indicating "oa_blank" if the paper does not exist in the respective OA, or "curated" if the paper does exist in the respective OA, indicating that it has been curated (or at least partially curated);  
+
the '''tenth''' column indicates the status of the paper in the Ontology Annotator (OA) for that datatype indicating "oa_blank" if the paper does not exist in the respective OA, or "curated" if the paper does exist in the respective OA, indicating that it has been curated (or at least partially curated); '''NOTE''' : A paper in the RNAi OA with every entry flagged with the "NO DUMP" toggle will appear as "oa_blank" in the Curation Status Form; a single row with a paper that does not have a "NO DUMP" toggle turned on will indicate a "curated" status in the Curation Status Form
  
the '''tenth column''' provides a drop-down menu to select a curator (selecting a curator is only necessary if overwriting/changing the existing curator; the form recognizes what curator is logged in and automatically populates this field with the correct (logged in) curator if this field is blank);  
+
the '''eleventh column''' provides a drop-down menu to select a curator (selecting a curator is only necessary if overwriting/changing the existing curator; the form recognizes what curator is logged in and automatically populates this field with the correct (logged in) curator if this field is blank);  
  
the '''eleventh''' column provides a drop-down menu to select the "new result" for the paper, indicating whether it is "curated and positive", "validated positive", "validated negative", or "not validated" ("not validated" only needs to be selected when reverting back from "curated and positive", "validated positive", or "validated negative" entries that may have been entered accidentally; selecting this option will result in a blank field once the change has been submitted through the form)
+
the '''twelfth''' column provides a drop-down menu to select the "new result" for the paper, indicating whether it is "curated and positive", "validated positive", "validated negative", or "not validated" ("not validated" only needs to be selected when reverting back from "curated and positive", "validated positive", or "validated negative" entries that may have been entered accidentally; selecting this option will result in a blank field once the change has been submitted through the form)
  
the '''twelfth column''' provides the drop-down menu of standard, premade comments
+
the '''thirteenth column''' provides the drop-down menu of standard, premade comments
  
the '''thirteenth (and final) column''' is a free-text area where a curator can write in any pertinent notes about the curation status of this paper-datatype pair
+
the '''fourteenth (and final) column''' is a free-text area where a curator can write in any pertinent notes about the curation status of this paper-datatype pair
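
The "NO DUMP" rule noted for the OA status column can be sketched as a check over a paper's RNAi OA entries. This is an illustration only: the sub name ''oaStatusForPaper'' and the ''nodump'' key are hypothetical and are not taken from the form's code.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is a hashref for one OA entry of the
# paper, with an assumed "nodump" key set to 1 when the NO DUMP toggle is on.
sub oaStatusForPaper {
  my (@rows) = @_;
  foreach my $row (@rows) {
    return 'curated' unless $row->{nodump};   # one entry without NO DUMP is enough
  }
  return 'oa_blank';                          # no entries, or every entry is NO DUMP
} # sub oaStatusForPaper

print oaStatusForPaper({ nodump => 1 }, { nodump => 1 }), "\n";
print oaStatusForPaper({ nodump => 1 }, { nodump => 0 }), "\n";
```

The first call reports "oa_blank" because every entry has the toggle on; the second reports "curated" because one entry does not.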
  
 
If any new results for a paper are entered (in columns 10-13), the curator must click on the "Submit New Results" button at the bottom of the screen, at which point they will either be directed to the New Results Summary page or to the Overwrite Confirmation Page, as shown above in the Wiki section describing the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]]. Note that in order for new paper result submissions to take effect, there '''must''' be a value in the "new results" column. Otherwise any comments (premade or free-text) will not be registered with the paper.
 
If any new results for a paper are entered (in columns 11-14), the curator must click on the "Submit New Results" button at the bottom of the screen, at which point they will either be directed to the New Results Summary page or to the Overwrite Confirmation Page, as shown above in the Wiki section describing the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]]. Note that in order for new paper result submissions to take effect, there '''must''' be a value in the "new results" column. Otherwise any comments (premade or free-text) will not be registered with the paper.
 
  
 
== Conflicts ==
 
== Conflicts ==
Line 217: Line 222:
  
 
A paper will not be considered in conflict if the OA status indicates "oa blank" and is flagged as "curated and positive". This will be the case, for example, for all of the large scale papers whose annotations do not reside in the OA/Postgres.
 
A paper will not be considered in conflict if the OA status indicates "oa blank" and is flagged as "curated and positive". This will be the case, for example, for all of the large scale papers whose annotations do not reside in the OA/Postgres.
 +
 +
 +
== Topic-Paper Filter ==
 +
 +
(In progress... CG 11-5-2013)
 +
 +
The Curation Status Form will now provide an option to filter a list of papers based on a WormBase Biological Topic. Official topics and affiliated papers are recognized from the "Topic" OA.
  
 
<br>
 
<br>
Line 229: Line 241:
  
  
== Add Results Page: Loading Page and Processing Input ==
+
== Specific Paper Page/ Prepopulated Specific Paper Page ==
 +
 
 +
The following code prints the "[[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]]":
 +
 
 +
<pre>
 +
sub printSpecificPaperPage {
 +
  &printFormOpen();
 +
  &printHiddenCurator();
 +
  &printTextareaSpecificPapers('');
 +
  &printSelectTopics();
 +
  &printCheckboxesDatatype('off');
 +
  &printCheckboxesCurationSources('all');
 +
  &printPaperOptions();
 +
  &printSubmitGetResults();
 +
  &printFormClose();
 +
} # sub printSpecificPaperPage
 +
</pre>
  
  
The following code is responsible for printing the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]]:  
+
The following code prints the "[[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]]":
  
 
<pre>
 
<pre>
sub printAddResultsPage {
+
sub listCurationStatisticsPapersPage {
   &printAddSection('', '', '', '', '', '', '');
+
   &printFormOpen();
} # sub printAddResultsPage
+
  &printHiddenCurator();
 +
  my ($papers) = &printListCurationStatisticsPapers();
 +
  &printTextareaSpecificPapers($papers);
 +
  &printSelectTopics();
 +
  &printSubmitGetResults();
 +
  ($oop, my $listDatatype) = &getHtmlVar($query, "listDatatype");
 +
  &printCheckboxesDatatype($listDatatype);
 +
  &printCheckboxesCurationSources('all');
 +
  &printPaperOptions();
 +
  &printSubmitGetResults();
 +
  &printFormClose();
 +
} # sub listCurationStatisticsPapersPage
 
</pre>
 
</pre>
  
This code defines the ''printAddResultsPage'' subroutine which in turn calls upon the ''printAddSection'' subroutine (see below), passing it empty strings, declaring empty/initialized values for the curator ($twonumForm), datatype ($datatypeForm), validation result ($donposnegForm), paper list ($paperResultsForm), premade comment ($selcommentForm), and free-text comment ($txtcommentForm). Once data is submitted, these variables acquire values and are reported in the event of an error.
 
  
 +
=== Paper-Topic Filter ===
 +
 +
On both the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] and the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]], we have the option to filter the papers listed in the WBPaper ID field based on their affiliation to a particular WormBase Biological Topic, as informed by the Topic Curation OA.
  
=== ''printAddSection'' Subroutine ===
 
  
The following code defines the ''printAddSection'' subroutine mentioned above, which adds the form components for entering datatype, validation status, paper IDs, premade comments, and free-text comments:
+
The following code is responsible for displaying the drop down menu of Topics:
  
 
<pre>
 
<pre>
sub printAddSection {
+
sub printSelectTopics {
  my ($twonumForm, $datatypeForm, $donposnegForm, $paperResultsForm, $selcommentForm, $txtcommentForm) = @_;
+
   print qq(Filter papers from list through a topic :<br/>);
  my $selected = '';
+
   print qq(<select name="select_topic">);
  &printFormOpen();
+
   print qq(<option value="none">no topic, use all papers from textarea above</option>\n);
  &printHiddenCurator();
+
   my %topicIDs; my %topicIdToName;
   print qq(Select your datatype :<br/>);
+
  $result = $dbh->prepare( "SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') ORDER BY pro_process.pro_process" );
   print qq(<select name="select_datatype">);
+
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
   print qq(<option value=""             ></option>\n);
+
   while (my @row = $result->fetchrow) { $topicIDs{$row[0]}++; }
   foreach my $datatype (keys %datatypes) {
+
   my $topicIDs = join"','", sort keys %topicIDs;               # for all the topicIDs, get the name from the prt_processname
    if ($datatype eq $datatypeForm) { $selected = qq(selected="selected"); } else { $selected = ''; }
+
   $result = $dbh->prepare( "SELECT prt_processid.prt_processid, prt_processname.prt_processname FROM prt_processid, prt_processname WHERE prt_processid.joinkey = prt_processname.joinkey AND prt_processid.prt_processid IN ('$topicIDs')" );
    print qq(<option value="$datatype" $selected>$datatype</option>\n); }
+
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
   print qq(</select><br/>);
+
  while (my @row = $result->fetchrow) { $topicIdToName{$row[0]} = $row[1]; }   # map wbprocess ids to their names for dropdown display
  print qq(Select if the data is positive or negative :<br/>);
+
  foreach my $topic (sort keys %topicIdToName) { print qq(<option value="$topic $topicIdToName{$topic}">$topic $topicIdToName{$topic}</option>); }
   my $select_size = scalar keys %donPosNegOptions;
 
   print qq(<select name="select_donposneg" size="$select_size">);
 
   foreach my $donposnegValue (keys %donPosNegOptions) {
 
    if ($donposnegForm eq $donposnegValue) { $selected = qq(selected="selected"); } else { $selected = ''; }
 
    print qq(<option value="$donposnegValue" $selected>$donPosNegOptions{$donposnegValue}</option>\n); }
 
 
   print qq(</select><br/>);
 
   print qq(</select><br/>);
  print qq(Enter paper data here in the format "WBPaper00001234" (paper as a whole) with separate papers in separate lines.<br/>);
+
} # sub printSelectTopics
  print qq(<textarea name="textarea_paper_results" rows="6" cols="80">$paperResultsForm</textarea><br/>\n);
 
  print qq(Select your comment (optional) :<br/>);
 
  print qq(<select name="select_comment">);
 
  print qq(<option value=""            ></option>\n);
 
  foreach my $comment (keys %premadeComments) {
 
    if ($comment eq $selcommentForm) { $selected = qq(selected="selected"); } else { $selected = ''; }
 
    print qq(<option value="$comment" $selected>$premadeComments{$comment}</option>\n); }
 
  print qq(</select><br/>);
 
  print qq(Enter a free text comment to associate with all papers above (optional) :<br/>);
 
  print qq(<textarea rows="4" cols="80" name="textarea_comment">$txtcommentForm</textarea><br/>);
 
  print qq(<input type="submit" name="action" value="Add Results"><br/>\n);
 
  &printFormClose();
 
} # sub printAddSection
 
 
</pre>
 
</pre>
  
 +
The above code incorporates two Postgres queries:
  
 +
First, get the WBProcess IDs that are in the Topic Curation OA, have a paper, and have the status 'relevant':
  
=== ''addResults'' Subroutine ===
+
<pre>
 +
SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') ORDER BY pro_process.pro_process
 +
</pre>
  
When a curator clicks on "Add Results" on the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]], the following code will process the curator's input, catching errors when they arise:
+
Second, the WBProcess IDs are joined into the variable $topicIDs, and for each of those WBProcess IDs we get the corresponding name from the process Term OA:
  
 
<pre>
 
<pre>
sub addResults {
+
SELECT prt_processid.prt_processid, prt_processname.prt_processname FROM prt_processid, prt_processname WHERE prt_processid.joinkey = prt_processname.joinkey AND prt_processid.prt_processid IN ('$topicIDs')
  &printFormOpen();
+
</pre>
  &printHiddenCurator();
+
 
  my $errorData = '';
+
The result is a dropdown of process IDs, ordered by ID, each with its human-readable process name next to it.
  my %papersToAdd;
+
 
  my $twonum = $curator;
+
<br>
  ($oop, my $datatype) = &getHtmlVar($query, "select_datatype");
+
 
  unless ($datatype) { $errorData .= "Error : Need to select a datatype.<br/>\n"; }
+
<br>
  ($oop, my $donposneg) = &getHtmlVar($query, "select_donposneg");
+
 
  unless ($donposneg) { $errorData .= "Error : Need to select whether result is curated, validated positive, or validated negative.<br/>\n"; }
+
<br>
  ($oop, my $paperResults) = &getHtmlVar($query, "textarea_paper_results");
+
 
  if ($paperResults) {
+
== Add Results Page: Loading Page and Processing Input ==
      my @lines = split/\r\n/, $paperResults;
+
 
      foreach my $line (@lines) {
+
 
        if ($line =~ m/^WBPaper(\S+)$/) { $papersToAdd{$1}++; }
+
The following code is responsible for printing the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]]:  
        else { $errorData .= qq(Error bad line : ${line}<br/>\n); }
+
 
      } } # foreach my $line (@lines)
+
<pre>
    else { $errorData .= "Error : Need to enter at least one paper.<br/>\n"; }
+
sub printAddResultsPage {
   ($oop, my $selcomment) = &getHtmlVar($query, "select_comment");
+
   &printAddSection('', '', '', '', '', '', '');
  ($oop, my $txtcomment) = &getHtmlVar($query, "textarea_comment");
+
} # sub printAddResultsPage
  if ($errorData) {                            # problem with data, do not allow creation of any data, show form again
+
</pre>
      print "$errorData<br />\n";
+
 
      printAddSection($twonum, $datatype, $donposneg, $paperResults, $selcomment, $txtcomment); }
+
This code defines the ''printAddResultsPage'' subroutine which in turn calls upon the ''printAddSection'' subroutine (see below), passing it empty strings, declaring empty/initialized values for the curator ($twonumForm), datatype ($datatypeForm), validation result ($donposnegForm), paper list ($paperResultsForm), premade comment ($selcommentForm), and free-text comment ($txtcommentForm). Once data is submitted, these variables acquire values and are reported in the event of an error.
    else {                                      # all data is okay, enter data.
 
      my $joinkeys = join"','", sort keys %papersToAdd;
 
      my ($pgDataRef) = &getPgDataForJoinkeys($joinkeys, $datatype);
 
      my %pgData = %$pgDataRef;
 
  
      my @data; my @duplicateData;
 
      foreach my $joinkey (sort keys %papersToAdd) {
 
          my @line;
 
          push @line, $joinkey;
 
          push @line, $datatype;
 
          push @line, $twonum;
 
          push @line, $donposneg;
 
          push @line, $selcomment;
 
          push @line, $txtcomment;
 
          if ($pgData{$joinkey}{$datatype}) { push @duplicateData, \@line; }
 
            else { push @data, \@line; }
 
      } # foreach my $joinkey (sort keys %papersToAdd)
 
      &processResultDataDuplicateData(\@data, \@duplicateData, \%pgData);
 
    } # else # if ($errorData)
 
  &printFormClose();
 
} # sub addResults
 
</pre>
 
  
 +
=== ''printAddSection'' Subroutine ===
  
This code (above) will check to ensure the following:
+
The following code defines the ''printAddSection'' subroutine mentioned above, which adds the form components for entering datatype, validation status, paper IDs, premade comments, and free-text comments:
 
 
1) A datatype has been selected
 
 
 
2) A validation status has been entered
 
 
 
3) At least one paper ID has been submitted
 
 
 
4) There is only one paper ID per line
 
 
 
5) The paper IDs entered are in the format 'WBPaper########'
 
 
 
Any exceptions to these will result in an error message printed to the screen, in addition to reprinting the screen with the submitted values in their respective fields.
 
 
 
If no errors are found, the script will continue by calling on the ''getPgDataForJoinkeys'' subroutine (to query Postgres for the cur_curdata associated with each paper-datatype pair; see below) and writing the new input values to the appropriate paper-datatype pairs.
 
 
 
  
 
<pre>
 
<pre>
sub getPgDataForJoinkeys {
+
sub printAddSection {
   my ($joinkeys, $datatype) = @_;
+
   my ($twonumForm, $datatypeForm, $donposnegForm, $paperResultsForm, $selcommentForm, $txtcommentForm) = @_;
   my %pgData;
+
   my $selected = '';
   $result = $dbh->prepare( "SELECT * FROM cur_curdata WHERE cur_datatype = '$datatype' AND cur_paper IN ('$joinkeys')" );
+
  &printFormOpen();
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
+
  &printHiddenCurator();
   while (my @row = $result->fetchrow) {
+
  print qq(Select your datatype :<br/>);
     $pgData{$row[0]}{$row[1]}{curator}    = $row[2];
+
   print qq(<select name="select_datatype">);
    $pgData{$row[0]}{$row[1]}{donposneg}  = $row[3];
+
  print qq(<option value=""             ></option>\n);
     $pgData{$row[0]}{$row[1]}{selcomment}  = $row[4];
+
  foreach my $datatype (keys %datatypes) {
    $pgData{$row[0]}{$row[1]}{txtcomment}  = $row[5];
+
    if ($datatype eq $datatypeForm) { $selected = qq(selected="selected"); } else { $selected = ''; }
     $pgData{$row[0]}{$row[1]}{timestamp}  = $row[6]; }
+
    print qq(<option value="$datatype" $selected>$datatype</option>\n); }
   return \%pgData;
+
  print qq(</select><br/>);
} # sub getPgDataForJoinkeys
+
  print qq(Select if the data is positive or negative :<br/>);
</pre>
+
  my $select_size = scalar keys %donPosNegOptions;
 
+
   print qq(<select name="select_donposneg" size="$select_size">);
 
+
  foreach my $donposnegValue (keys %donPosNegOptions) {
=== ''processResultDataDuplicateData'' Subroutine ===
+
     if ($donposnegForm eq $donposnegValue) { $selected = qq(selected="selected"); } else { $selected = ''; }
 +
     print qq(<option value="$donposnegValue" $selected>$donPosNegOptions{$donposnegValue}</option>\n); }
 +
  print qq(</select><br/>);
 +
   print qq(Enter paper data here in the format "WBPaper00001234" (paper as a whole) with separate papers in separate lines.<br/>);
 +
  print qq(<textarea name="textarea_paper_results" rows="6" cols="80">$paperResultsForm</textarea><br/>\n);
 +
  print qq(Select your comment (optional) :<br/>);
 +
  print qq(<select name="select_comment">);
 +
  print qq(<option value=""            ></option>\n);
 +
  foreach my $comment (keys %premadeComments) {
 +
     if ($comment eq $selcommentForm) { $selected = qq(selected="selected"); } else { $selected = ''; }
 +
     print qq(<option value="$comment" $selected>$premadeComments{$comment}</option>\n); }
 +
   print qq(</select><br/>);
 +
  print qq(Enter a free text comment to associate with all papers above (optional) :<br/>);
 +
  print qq(<textarea rows="4" cols="80" name="textarea_comment">$txtcommentForm</textarea><br/>);
 +
   print qq(<input type="submit" name="action" value="Add Results"><br/>\n);
 +
  &printFormClose();
 +
} # sub printAddSection
 +
</pre>
 +
 
 +
 
  
The code will then run the ''processResultDataDuplicateData'' subroutine (see below) to print the results to the screen as the '''New Results Summary Page''' (for new data) and/or handle the overwrite confirmation when data needs to be overwritten, generating the '''Overwrite Confirmation Page''' (see [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] section above).
+
=== ''addResults'' Subroutine ===
  
 +
When a curator clicks on "Add Results" on the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]], the following code will process the curator's input, catching errors when they arise:
  
 
<pre>
 
<pre>
sub processResultDataDuplicateData {
+
sub addResults {
   my ($dataRef, $duplicateDataRef, $pgDataRef) = @_;
+
   &printFormOpen();
   my @data          = @$dataRef;
+
  &printHiddenCurator();
   my @duplicateData = @$duplicateDataRef;
+
   my $errorData = '';
   my %pgData        = %$pgDataRef;
+
   my %papersToAdd;
   print qq(<table border="1">\n);
+
   my $twonum = $curator;
   print qq(<tr>${thDot}paperId</td>${thDot}datatype</td>${thDot}curator</td>${thDot}value</td>${thDot}selcomment</td>${thDot}textcomment</td></tr>\n);
+
   ($oop, my $datatype) = &getHtmlVar($query, "select_datatype");
   foreach my $lineRef (@data) {
+
   unless ($datatype) { $errorData .= "Error : Need to select a datatype.<br/>\n"; }
    my @line = @$lineRef;
+
   ($oop, my $donposneg) = &getHtmlVar($query, "select_donposneg");
    foreach (@line) { unless ($_) { $_ = ''; } }        # initialize values if none are there
+
  unless ($donposneg) { $errorData .= "Error : Need to select whether result is curated, validated positive, or validated negative.<br/>\n"; }
    my $pgvalues = join"','", @line;
+
  ($oop, my $paperResults) = &getHtmlVar($query, "textarea_paper_results");
    my @pgcommands = ();
+
  if ($paperResults) {
    my $pgcommand = "INSERT INTO cur_curdata VALUES ('$pgvalues');";
+
      my @lines = split/\r\n/, $paperResults;
    push @pgcommands, $pgcommand;
+
      foreach my $line (@lines) {
    $pgcommand = "INSERT INTO cur_curdata_hst VALUES ('$pgvalues');";
+
        if ($line =~ m/^WBPaper(\S+)$/) { $papersToAdd{$1}++; }
    push @pgcommands, $pgcommand;
+
        else { $errorData .= qq(Error bad line : ${line}<br/>\n); }
    foreach my $pgcommand (@pgcommands) {
+
      } } # foreach my $line (@lines)
      print qq($pgcommand<br/>\n);
+
    else { $errorData .= "Error : Need to enter at least one paper.<br/>\n"; }
# UNCOMMENT TO POPULATE
+
  ($oop, my $selcomment) = &getHtmlVar($query, "select_comment");
      $dbh->do( $pgcommand );
+
  ($oop, my $txtcomment) = &getHtmlVar($query, "textarea_comment");
    }
+
   if ($errorData) {                             # problem with data, do not allow creation of any data, show form again
    my $trData = join"</td>$tdDot", @line;
+
      print "$errorData<br />\n";
    print qq(<tr>${tdDot}$trData</td></tr>\n);
+
      printAddSection($twonum, $datatype, $donposneg, $paperResults, $selcomment, $txtcomment); }
  } # foreach my $lineRef (@data)
+
    else {                                      # all data is okay, enter data.
  print qq(</table>\n);
+
      my $joinkeys = join"','", sort keys %papersToAdd;
   if (scalar @data > 0) { print "results added<br />\n"; }
+
      my ($pgDataRef) = &getPgDataForJoinkeys($joinkeys, $datatype);
</pre>
+
      my %pgData = %$pgDataRef;
  
 +
      my @data; my @duplicateData;
 +
      foreach my $joinkey (sort keys %papersToAdd) {
 +
          my @line;
 +
          push @line, $joinkey;
 +
          push @line, $datatype;
 +
          push @line, $twonum;
 +
          push @line, $donposneg;
 +
          push @line, $selcomment;
 +
          push @line, $txtcomment;
 +
          if ($pgData{$joinkey}{$datatype}) { push @duplicateData, \@line; }
 +
            else { push @data, \@line; }
 +
      } # foreach my $joinkey (sort keys %papersToAdd)
 +
      &processResultDataDuplicateData(\@data, \@duplicateData, \%pgData);
 +
    } # else # if ($errorData)
 +
  &printFormClose();
 +
} # sub addResults
 +
</pre>
 +
 +
 +
This code (above) will check to ensure the following:
 +
 +
1) A datatype has been selected
 +
 +
2) A validation status has been entered
 +
 +
3) At least one paper ID has been submitted
 +
 +
4) There is only one paper ID per line
 +
 +
5) The paper IDs entered are in the format 'WBPaper########'
 +
 +
Any exceptions to these will result in an error message printed to the screen, and the form is redisplayed with the submitted values in their respective fields.
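
The five checks listed above can be illustrated with a minimal sketch. This is not the form's actual code: ''validate_paper_lines'' is a hypothetical helper, and two details are assumptions that differ from ''addResults'': the pattern requires exactly eight digits (the real code accepts any non-whitespace after "WBPaper"), and lines are split on either "\r\n" or "\n".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the curation status script): validate the
# textarea contents and return the paper joinkeys plus any error messages.
sub validate_paper_lines {
    my ($paperResults) = @_;
    my (%papersToAdd, @errors);
    unless (defined $paperResults && $paperResults =~ /\S/) {
        push @errors, 'Error : Need to enter at least one paper.';
        return (\%papersToAdd, \@errors);
    }
    foreach my $line (split /\r?\n/, $paperResults) {
        next unless $line =~ /\S/;                       # assumption: skip blank lines
        if ($line =~ m/^WBPaper(\d{8})$/) { $papersToAdd{$1}++; }   # one ID per line, 8 digits
        else { push @errors, "Error bad line : $line"; }
    }
    return (\%papersToAdd, \@errors);
}

# Example input: one valid line and one line with trailing text.
my ($papers, $errors) = validate_paper_lines("WBPaper00001234\r\nWBPaper00005678 extra\r\n");
print 'papers: ', join(',', sort keys %$papers), "\n";
print "$_\n" foreach @$errors;
```

Returning the errors as a list, rather than printing them immediately, would let the caller decide whether to redisplay the form or continue to the Postgres lookup.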
 +
 +
If no errors are found, the script will continue by calling on the ''getPgDataForJoinkeys'' subroutine (to query Postgres for the cur_curdata associated with each paper-datatype pair; see below) and writing the new input values to the appropriate paper-datatype pairs.
 +
 +
 +
<pre>
 +
sub getPgDataForJoinkeys {
 +
  my ($joinkeys, $datatype) = @_;
 +
  my %pgData;
 +
  $result = $dbh->prepare( "SELECT * FROM cur_curdata WHERE cur_datatype = '$datatype' AND cur_paper IN ('$joinkeys')" );
 +
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 +
  while (my @row = $result->fetchrow) {
 +
    $pgData{$row[0]}{$row[1]}{curator}    = $row[2];
 +
    $pgData{$row[0]}{$row[1]}{donposneg}  = $row[3];
 +
    $pgData{$row[0]}{$row[1]}{selcomment}  = $row[4];
 +
    $pgData{$row[0]}{$row[1]}{txtcomment}  = $row[5];
 +
    $pgData{$row[0]}{$row[1]}{timestamp}  = $row[6]; }
 +
  return \%pgData;
 +
} # sub getPgDataForJoinkeys
 +
</pre>
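
The query above interpolates the datatype and paper IDs directly into the SQL string. As an alternative sketch only (not what the script does), the same SELECT could be built with DBI placeholders so that the driver binds the values. ''build_curdata_query'' is a hypothetical helper, and the datatype and paper IDs in the example call are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: build the SELECT with one "?" per paper so that values
# are bound by the database driver instead of being pasted into the SQL text.
# Assumes the caller passes at least one joinkey (addResults already checks this).
sub build_curdata_query {
    my ($datatype, @joinkeys) = @_;
    my $placeholders = join ', ', ('?') x @joinkeys;
    my $sql = "SELECT * FROM cur_curdata WHERE cur_datatype = ? AND cur_paper IN ($placeholders)";
    return ($sql, $datatype, @joinkeys);
}

my ($sql, @bind) = build_curdata_query('rnai', '00001234', '00005678');
print "$sql\n";
print 'bind values: ', join(', ', @bind), "\n";
# With a connected DBI handle $dbh, this would be run as:
#   my $sth = $dbh->prepare($sql);
#   $sth->execute(@bind);
```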
 +
 +
 +
=== ''processResultDataDuplicateData'' Subroutine ===
  
The first section of the subroutine (above) processes all data submitted through the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] that is not overwriting any existing data. The new data is submitted to Postgres and a table is printed to the screen as the '''New Results Summary Page'''.
+
The code will then run the ''processResultDataDuplicateData'' subroutine (see below) to print the results to the screen as the '''New Results Summary Page''' (for new data) and/or handle the overwrite confirmation when data needs to be overwritten, generating the '''Overwrite Confirmation Page''' (see [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] section above).
 
 
The next sections of the subroutine handle new results that will overwrite existing Postgres values for the paper-datatype pairs. First, the code determines what data already exists in Postgres for a given paper-datatype pair and stores these in $xxxxxPg variables. Then, the code determines what values for curator, validation status, premade comment, and free-text comment were submitted through the form and stores these in $xxxxxFm variables. Next, the code generates and displays a table for each paper-datatype pair that has values being overwritten (if the corresponding $xxxxxPg and $xxxxxFm variables are not equal) with each set of data (old and new) highlighted in yellow to draw attention to these for overwrite confirmation. A confirmation checkbox is displayed for each paper-datatype pair undergoing an overwrite.
 
 
 
<pre>
 
  my $overwriteCount = 0;
 
  foreach my $lineRef (@duplicateData) {                # for data already in postgres, add option to overwrite
 
    my @line = @$lineRef;
 
    foreach (@line) { unless ($_) { $_ = ''; } }        # initialize values if none are there
 
    my ( $joinkey, $datatype, $twonum, $donposneg, $selcomment, $txtcomment ) = @line;
 
    my ( $curatorPg, $curatorPgName, $donposnegPg, $selcommentPg, $selcommentPgText, $txtcommentPg, $timestampPg ) = ( '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;' );
 
    my ( $curatorFm, $curatorFmName, $donposnegFm, $selcommentFm, $selcommentFmText, $txtcommentFm, $timestampFm ) = ( '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '<td>&nbsp;</td>' );
 
    if ( $pgData{$joinkey}{$datatype}{curator}    ) { $curatorPg    = $pgData{$joinkey}{$datatype}{curator};    $curatorPgName = $curators{$curatorPg}; }
 
    if ( $pgData{$joinkey}{$datatype}{donposneg}  ) { $donposnegPg  = $pgData{$joinkey}{$datatype}{donposneg};  }
 
    if ( $pgData{$joinkey}{$datatype}{selcomment} ) { $selcommentPg = $pgData{$joinkey}{$datatype}{selcomment}; $selcommentPgText = $premadeComments{$selcommentPg}; }
 
    if ( $pgData{$joinkey}{$datatype}{txtcomment} ) { $txtcommentPg = $pgData{$joinkey}{$datatype}{txtcomment}; }
 
    if ( $pgData{$joinkey}{$datatype}{timestamp}  ) { $timestampPg  = "<td>$pgData{$joinkey}{$datatype}{timestamp};</td>"  }
 
    if ( $twonum ) { $curatorFm = $twonum;
 
      if ( $curators{$curatorFm} ) { $curatorFmName  = $curators{$curatorFm}; } }
 
    if ( $donposneg )        { $donposnegFm = $donposneg; }
 
    if ( $selcomment ) { $selcommentFm = $selcomment;
 
      if ( $premadeComments{$selcommentFm} ) { $selcommentFmText = $premadeComments{$selcommentFm}; } }
 
    if ( $txtcomment ) { $txtcommentFm = $txtcomment; }
 
    my $isDifferent = 0;                                # if any of the non-key values has changed, show option to overwrite
 
    if ($curatorFmName    ne $curatorPgName) {
 
        $isDifferent++;
 
        $curatorFmName = '<td style="background-color:yellow">' . $curatorFmName . '</td>';
 
        $curatorPgName = '<td style="background-color:yellow">' . $curatorPgName . '</td>'; }
 
      else {
 
        $curatorFmName = '<td>' . $curatorFmName . '</td>';
 
        $curatorPgName = '<td>' . $curatorPgName . '</td>'; }
 
    if ($donposnegFm  ne $donposnegPg) {
 
        $isDifferent++;
 
        $donposnegFm = '<td style="background-color:yellow">' . $donposnegFm . '</td>';
 
        $donposnegPg = '<td style="background-color:yellow">' . $donposnegPg . '</td>'; }
 
      else {
 
        $donposnegFm = '<td>' . $donposnegFm . '</td>';
 
        $donposnegPg = '<td>' . $donposnegPg . '</td>'; }
 
    if ($selcommentFmText ne $selcommentPgText) {
 
        $isDifferent++;
 
        $selcommentFmText = '<td style="background-color:yellow">' . $selcommentFmText . '</td>';
 
        $selcommentPgText = '<td style="background-color:yellow">' . $selcommentPgText . '</td>'; }
 
      else {
 
        $selcommentFmText = '<td>' . $selcommentFmText . '</td>';
 
        $selcommentPgText = '<td>' . $selcommentPgText . '</td>'; }
 
    if ($txtcommentFm ne $txtcommentPg) {
 
        $isDifferent++;
 
        $txtcommentFm = '<td style="background-color:yellow">' . $txtcommentFm . '</td>';
 
        $txtcommentPg = '<td style="background-color:yellow">' . $txtcommentPg . '</td>'; }
 
      else {
 
        $txtcommentFm = '<td>' . $txtcommentFm . '</td>';
 
        $txtcommentPg = '<td>' . $txtcommentPg . '</td>'; }
 
    next unless ($isDifferent > 0);
 
    $overwriteCount++;
 
    print qq(<input type="hidden" name="joinkey_$overwriteCount"      value="$joinkey"  >);
 
    print qq(<input type="hidden" name="datatype_$overwriteCount"      value="$datatype" >);
 
    print qq(<input type="hidden" name="twonum_$overwriteCount"        value="$twonum"  >);
 
    print qq(<input type="hidden" name="donposneg_$overwriteCount"    value="$donposneg"  >);
 
    print qq(<input type="hidden" name="selcomment_$overwriteCount"    value="$selcomment"  >);
 
    print qq(<input type="hidden" name="txtcomment_$overwriteCount"    value="$txtcomment"  >);
 
    print qq(WBPaper$joinkey $datatype : <br/>\n);
 
    print qq(<table border="1">\n);
 
    print qq(<tr><th>&nbsp;</th><th>curator</th><th>value</th><th>selcomment</th><th>txtcomment</th><th>timestamp</th></tr>);
 
    print qq(<tr><td>old</td>${curatorPgName}${donposnegPg}${selcommentPgText}${txtcommentPg}${timestampPg}</tr>\n);
 
    print qq(<tr><td>new</td>${curatorFmName}${donposnegFm}${selcommentFmText}${txtcommentFm}${timestampFm}</tr>\n);
 
    print qq(</table>\n);
 
    print qq(Confirm change <input type="checkbox" name="checkbox_$overwriteCount" value="overwrite"><br/><br/>\n);
 
  } # foreach my $lineRef (@data)
 
  if ($overwriteCount > 0) {
 
    print qq(<input type="hidden" name="overwrite_count" value="$overwriteCount">);
 
    print qq(<input type="submit" name="action" value="Overwrite Selected Results"><br/>\n); }
 
} # sub processResultDataDuplicateData
 
</pre>
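
The four comparison blocks above (curator, validation status, premade comment, free-text comment) all follow the same pattern. A minimal sketch of that pattern as a single helper is shown below. ''compare_cells'' is a hypothetical name and the example values are illustrative; the script itself repeats the if/else block for each field.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap an old/new pair of values in <td> cells and
# highlight both in yellow when they differ.
# Returns (changed flag, old cell, new cell).
sub compare_cells {
    my ($old, $new) = @_;
    my $changed = ($old ne $new) ? 1 : 0;
    my $open    = $changed ? '<td style="background-color:yellow">' : '<td>';
    return ($changed, $open . $old . '</td>', $open . $new . '</td>');
}

my $isDifferent = 0;
my @pairs = ( [ 'Curator A', 'Curator B' ], [ 'positive', 'positive' ] );   # illustrative values
foreach my $pair (@pairs) {
    my ($changed, $oldCell, $newCell) = compare_cells(@$pair);
    $isDifferent += $changed;          # any non-zero total means an overwrite prompt is needed
    print "$oldCell$newCell\n";
}
print "isDifferent = $isDifferent\n";
```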
 
 
 
  
Once the curator has ticked the confirmation checkbox for each relevant result and clicked the "Overwrite Selected Results" button, the ''overwriteSelectedResults'' subroutine (below) runs to overwrite the data in the Postgres cur_curdata table and to record the new values in cur_curdata_hst.
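
For each confirmed overwrite, the subroutine issues one DELETE and two INSERT statements. The sketch below shows that sequence with bound parameters in place of interpolated strings. It is an illustration under assumptions, not the script's code: ''overwrite_statements'' is a hypothetical helper and the values in the example call are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the three statements for one confirmed overwrite,
# each as an array reference of [ sql, bind values... ].
sub overwrite_statements {
    my ($joinkey, $datatype, $twonum, $donposneg, $selcomment, $txtcomment) = @_;
    foreach my $value ($donposneg, $selcomment, $txtcomment) {
        $value = '' unless defined $value;       # foreach aliases, so this sets the variables
    }
    my @row = ($joinkey, $datatype, $twonum, $donposneg, $selcomment, $txtcomment);
    return (
        [ 'DELETE FROM cur_curdata WHERE cur_paper = ? AND cur_datatype = ? AND cur_curator = ?',
          $joinkey, $datatype, $twonum ],
        [ 'INSERT INTO cur_curdata VALUES (?, ?, ?, ?, ?, ?)',     @row ],
        [ 'INSERT INTO cur_curdata_hst VALUES (?, ?, ?, ?, ?, ?)', @row ],
    );
}

# Illustrative values only.
foreach my $stmt (overwrite_statements('00001234', 'rnai', 'two1', 'positive', undef, 'checked figure 2')) {
    my ($sql, @bind) = @$stmt;
    print "$sql  [", join(', ', @bind), "]\n";
    # With a connected DBI handle $dbh: $dbh->do($sql, undef, @bind);
}
```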
 
  
 
<pre>
 
<pre>
sub overwriteSelectedResults {
+
sub processResultDataDuplicateData {
   ($oop, my $overwriteCount) = &getHtmlVar($query, "overwrite_count");
+
   my ($dataRef, $duplicateDataRef, $pgDataRef) = @_;
   my @pgcommands;
+
   my @data          = @$dataRef;
   for my $i (1 .. $overwriteCount) {
+
   my @duplicateData = @$duplicateDataRef;
    ($oop, my $overwrite) = &getHtmlVar($query, "checkbox_$i");
+
  my %pgData        = %$pgDataRef;
    next unless ($overwrite eq 'overwrite');
+
  print qq(<table border="1">\n);
    ($oop, my $joinkey    ) = &getHtmlVar($query, "joinkey_$i"    );
+
  print qq(<tr>${thDot}paperId</td>${thDot}datatype</td>${thDot}curator</td>${thDot}value</td>${thDot}selcomment</td>${thDot}textcomment</td></tr>\n);
    ($oop, my $datatype  ) = &getHtmlVar($query, "datatype_$i"   );
+
  foreach my $lineRef (@data) {
    ($oop, my $twonum    ) = &getHtmlVar($query, "twonum_$i"    );
+
     my @line = @$lineRef;
    ($oop, my $donposneg  ) = &getHtmlVar($query, "donposneg_$i"  );
+
     foreach (@line) { unless ($_) { $_ = ''; } }        # initialize values if none are there
    ($oop, my $selcomment ) = &getHtmlVar($query, "selcomment_$i" );
+
    my $pgvalues = join"','", @line;
     ($oop, my $txtcomment ) = &getHtmlVar($query, "txtcomment_$i" );
+
     my @pgcommands = ();
     unless ($donposneg) { $donposneg = ''; } unless ($selcomment) { $selcomment = ''; } unless ($txtcomment) { $txtcomment = ''; }
+
     my $pgcommand = "INSERT INTO cur_curdata VALUES ('$pgvalues');";
     push @pgcommands, qq(DELETE FROM cur_curdata WHERE cur_paper = '$joinkey' AND cur_datatype = '$datatype' AND cur_curator = '$twonum');
+
     push @pgcommands, $pgcommand;
     push @pgcommands, qq(INSERT INTO cur_curdata VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));
+
    $pgcommand = "INSERT INTO cur_curdata_hst VALUES ('$pgvalues');";
     push @pgcommands, qq(INSERT INTO cur_curdata_hst VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));
+
    push @pgcommands, $pgcommand;
  } # for my $i (1 .. $overwriteCount)
+
    foreach my $pgcommand (@pgcommands) {
  foreach my $pgcommand (@pgcommands) {
+
      print qq($pgcommand<br/>\n);
    print "$pgcommand<br />\n";
 
 
# UNCOMMENT TO POPULATE
 
# UNCOMMENT TO POPULATE
    $dbh->do( $pgcommand );
+
      $dbh->do( $pgcommand );
   } # foreach my $pgcommand (@pgcommands)
+
    }
} # sub overwriteSelectedResults
+
    my $trData = join"</td>$tdDot", @line;
 
+
    print qq(<tr>${tdDot}$trData</td></tr>\n);
 +
   } # foreach my $lineRef (@data)
 +
  print qq(</table>\n);
 +
  if (scalar @data > 0) { print "results added<br />\n"; }
 
</pre>
 
</pre>
  
  
 +
The first section of the subroutine (above) processes all data submitted through the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] that is not overwriting any existing data. The new data is submitted to Postgres and a table is printed to the screen as the '''New Results Summary Page'''.
  
== Detailed Results of Papers Page: Loading Data (the ''getResults'' Subroutine) ==
+
The next sections of the subroutine handle new results that will overwrite existing Postgres values for the paper-datatype pairs. First, the code determines what data already exists in Postgres for a given paper-datatype pair and stores these in $xxxxxPg variables. Then, the code determines what values for curator, validation status, premade comment, and free-text comment were submitted through the form and stores these in $xxxxxFm variables. Next, the code generates and displays a table for each paper-datatype pair that has values being overwritten (if the corresponding $xxxxxPg and $xxxxxFm variables are not equal) with each set of data (old and new) highlighted in yellow to draw attention to these for overwrite confirmation. A confirmation checkbox is displayed for each paper-datatype pair undergoing an overwrite.  
 
 
The following code is for the ''getResults'' subroutine which displays the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]] after receiving input from the curator about what to display (from a [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] or [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]] submission).
 
 
 
The first section of code collects all curatable papers, processes the checkbox input for the various datatypes, and processes the paper IDs that were submitted in the paper field.
 
  
 
<pre>
 
<pre>
sub getResults {
+
   my $overwriteCount = 0;
   &printFormOpen();
+
   foreach my $lineRef (@duplicateData) {                # for data already in postgres, add option to overwrite
   &printHiddenCurator();
+
    my @line = @$lineRef;
  &populateCuratablePapers();                  # assume for now that we only care about curatable papers
+
    foreach (@line) { unless ($_) { $_ = ''; } }        # initialize values if none are there
 
+
    my ( $joinkey, $datatype, $twonum, $donposneg, $selcomment, $txtcomment ) = @line;
  ($oop, my $all_datatypes_checkbox) = &getHtmlVar($query, "checkbox_all_datatypes");
+
     my ( $curatorPg, $curatorPgName, $donposnegPg, $selcommentPg, $selcommentPgText, $txtcommentPg, $timestampPg ) = ( '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;' );
  unless ($all_datatypes_checkbox) { $all_datatypes_checkbox = ''; }
+
    my ( $curatorFm, $curatorFmName, $donposnegFm, $selcommentFm, $selcommentFmText, $txtcommentFm, $timestampFm ) = ( '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '&nbsp;', '<td>&nbsp;</td>' );
  foreach my $datatype (keys %datatypes) {
+
     if ( $pgData{$joinkey}{$datatype}{curator}    ) { $curatorPg    = $pgData{$joinkey}{$datatype}{curator};    $curatorPgName = $curators{$curatorPg}; }
     ($oop, my $chosen) = &getHtmlVar($query, "checkbox_$datatype");
+
     if ( $pgData{$joinkey}{$datatype}{donposneg}  ) { $donposnegPg  = $pgData{$joinkey}{$datatype}{donposneg}; }
     unless ($chosen) { $chosen = ''; }
+
     if ( $pgData{$joinkey}{$datatype}{selcomment} ) { $selcommentPg = $pgData{$joinkey}{$datatype}{selcomment}; $selcommentPgText = $premadeComments{$selcommentPg}; }
     if ($all_datatypes_checkbox eq 'all') { $chosen = $datatype; }     # if all datatypes checkbox was selected, set that datatype's chosen to that datatype
+
     if ( $pgData{$joinkey}{$datatype}{txtcomment} ) { $txtcommentPg = $pgData{$joinkey}{$datatype}{txtcomment}; }
     print qq(<input type="hidden" name="checkbox_$datatype" value="$chosen">\n);
+
    if ( $pgData{$joinkey}{$datatype}{timestamp}  ) { $timestampPg  = "<td>$pgData{$joinkey}{$datatype}{timestamp};</td>"  }
     if ($chosen) { $chosenDatatypes{$chosen}++; }
+
    if ( $twonum ) { $curatorFm = $twonum;
  } # foreach my $datatype (keys %datatypes)
+
      if ( $curators{$curatorFm} ) { $curatorFmName  = $curators{$curatorFm}; } }
 
+
    if ( $donposneg )         { $donposnegFm = $donposneg; }
  ($oop, my $specificPapers) = &getHtmlVar($query, "specific_papers");
+
    if ( $selcomment ) { $selcommentFm = $selcomment;
  if ($specificPapers) { my (@joinkeys) = $specificPapers =~ m/(\d+)/g; foreach (@joinkeys) { $chosenPapers{$_}++; } }
+
      if ( $premadeComments{$selcommentFm} ) { $selcommentFmText = $premadeComments{$selcommentFm}; } }
     else { $chosenPapers{'all'}++; }
+
     if ( $txtcomment ) { $txtcommentFm = $txtcomment; }
  print qq(<input type="hidden" name="specific_papers" value="$specificPapers">\n);
+
    my $isDifferent = 0;                                # if any of the non-key values has changed, show option to overwrite
</pre>
+
    if ($curatorFmName    ne $curatorPgName) {
 
+
        $isDifferent++;
The next section of code populates cur_curdata for the respective paper-datatype pairs given the input from above. The code takes into account whether to show OA data and which flagging methods to display for each paper-datatype pair. It also reads how many papers to display per page (default 10), sets the selected page to page "0" when none has been chosen (relevant when results span multiple pages), and determines whether the curator wishes to see the journal, PMID, and/or PDF links for each paper.
+
        $curatorFmName = '<td style="background-color:yellow">' . $curatorFmName . '</td>';
 
+
        $curatorPgName = '<td style="background-color:yellow">' . $curatorPgName . '</td>'; }
<pre>
+
      else {
  &populateCurCurData();                               # always show curator values since they have to be editable
+
        $curatorFmName = '<td>' . $curatorFmName . '</td>';
 
+
        $curatorPgName = '<td>' . $curatorPgName . '</td>'; }
  ($oop, my $displayOa) = &getHtmlVar($query, "checkbox_oa");   unless ($displayOa) { $displayOa = ''; }
+
    if ($donposnegFm  ne $donposnegPg) {
  ($oop, my $displayCfp) = &getHtmlVar($query, "checkbox_cfp");   unless ($displayCfp) { $displayCfp = ''; }
+
        $isDifferent++;
  ($oop, my $displayAfp) = &getHtmlVar($query, "checkbox_afp");   unless ($displayAfp) { $displayAfp = ''; }
+
        $donposnegFm = '<td style="background-color:yellow">' . $donposnegFm . '</td>';
  ($oop, my $displaySvm) = &getHtmlVar($query, "checkbox_svm");   unless ($displaySvm) { $displaySvm = ''; }
+
        $donposnegPg = '<td style="background-color:yellow">' . $donposnegPg . '</td>'; }
  print qq(<input type="hidden" name="checkbox_oa" value="$displayOa" >\n);
+
      else {
  print qq(<input type="hidden" name="checkbox_cfp" value="$displayCfp">\n);
+
        $donposnegFm = '<td>' . $donposnegFm . '</td>';
  print qq(<input type="hidden" name="checkbox_afp" value="$displayAfp">\n);
+
        $donposnegPg = '<td>' . $donposnegPg . '</td>'; }
  print qq(<input type="hidden" name="checkbox_svm" value="$displaySvm">\n);
+
    if ($selcommentFmText ne $selcommentPgText) {
  if ($displayOa) {  &populateOaData();  }
+
        $isDifferent++;
  if ($displayCfp) { &populateCfpData(); }
+
        $selcommentFmText = '<td style="background-color:yellow">' . $selcommentFmText . '</td>';
  if ($displayAfp) { &populateAfpData(); }
+
        $selcommentPgText = '<td style="background-color:yellow">' . $selcommentPgText . '</td>'; }
  if ($displaySvm) { &populateSvmData(); }
+
      else {
 
+
        $selcommentFmText = '<td>' . $selcommentFmText . '</td>';
  ($oop, my $showJournal) = &getHtmlVar($query, "checkbox_journal");  unless ($showJournal) { $showJournal = ''; }
+
        $selcommentPgText = '<td>' . $selcommentPgText . '</td>'; }
  ($oop, my $showPmid)    = &getHtmlVar($query, "checkbox_pmid");     unless ($showPmid) {    $showPmid = '';    }
+
    if ($txtcommentFm ne $txtcommentPg) {
  ($oop, my $showPdf)    = &getHtmlVar($query, "checkbox_pdf");      unless ($showPdf) {    $showPdf = '';     }
+
        $isDifferent++;
  print qq(<input type="hidden" name="checkbox_journal" value="$showJournal">\n);
+
        $txtcommentFm = '<td style="background-color:yellow">' . $txtcommentFm . '</td>';
  print qq(<input type="hidden" name="checkbox_pmid"    value="$showPmid">\n);
+
        $txtcommentPg = '<td style="background-color:yellow">' . $txtcommentPg . '</td>'; }
  print qq(<input type="hidden" name="checkbox_pdf"    value="$showPdf">\n);
+
      else {
 
+
        $txtcommentFm = '<td>' . $txtcommentFm . '</td>';
  ($oop, my $papersPerPage) = &getHtmlVar($query, "papers_per_page");
+
        $txtcommentPg = '<td>' . $txtcommentPg . '</td>'; }
  ($oop, my $pageSelected)  = &getHtmlVar($query, "select_page");
+
    next unless ($isDifferent > 0);
  unless ($papersPerPage) { $papersPerPage = 10; }
+
    $overwriteCount++;
  unless ($pageSelected) {  $pageSelected  = 0; }
+
    print qq(<input type="hidden" name="joinkey_$overwriteCount"       value="$joinkey" >);
  print qq(<input type="hidden" name="papers_per_page" value="$papersPerPage">\n);
+
    print qq(<input type="hidden" name="datatype_$overwriteCount"     value="$datatype" >);
 
+
    print qq(<input type="hidden" name="twonum_$overwriteCount"       value="$twonum"   >);
   my @headerRow = qw( paperID );
+
    print qq(<input type="hidden" name="donposneg_$overwriteCount"     value="$donposneg"   >);
   if ($showJournal) { push @headerRow, "journal"; &populateJournal(); }
+
     print qq(<input type="hidden" name="selcomment_$overwriteCount"   value="$selcomment" >);
  if ($showPmid)    { push @headerRow, "pmid";    &populatePmid();   }
+
    print qq(<input type="hidden" name="txtcomment_$overwriteCount"    value="$txtcomment" >);
  if ($showPdf)    { push @headerRow, "pdf";    &populatePdf();    }
+
    print qq(WBPaper$joinkey $datatype : <br/>\n);
 +
    print qq(<table border="1">\n);
 +
    print qq(<tr><th>&nbsp;</th><th>curator</th><th>value</th><th>selcomment</th><th>txtcomment</th><th>timestamp</th></tr>);
 +
    print qq(<tr><td>old</td>${curatorPgName}${donposnegPg}${selcommentPgText}${txtcommentPg}${timestampPg}</tr>\n);
 +
    print qq(<tr><td>new</td>${curatorFmName}${donposnegFm}${selcommentFmText}${txtcommentFm}${timestampFm}</tr>\n);
 +
    print qq(</table>\n);
 +
    print qq(Confirm change <input type="checkbox" name="checkbox_$overwriteCount" value="overwrite"><br/><br/>\n);
 +
   } # foreach my $lineRef (@data)
 +
   if ($overwriteCount > 0) {
 +
    print qq(<input type="hidden" name="overwrite_count" value="$overwriteCount">);
 +
    print qq(<input type="submit" name="action" value="Overwrite Selected Results"><br/>\n); }
 +
} # sub processResultDataDuplicateData
 
</pre>
 
</pre>
  
  
The code now generates a hash (%trs) of result rows (not necessarily in the order submitted). For each paper submitted and each datatype requested, the code records in the ''%allPaperData'' hash which paper-datatype pairs have flagging results, cur_curdata, or OA (curation) data, and any data found for the relevant datatype and flagging method is loaded into the appropriate column of that pair's row. The first column for any paper displays the WBPaper ID. If the PMID, journal, and/or PDF link were requested, they are displayed in the next columns for that paper, followed by the datatype column. In the next column (if SVM data was requested), the SVM results for each paper-datatype pair are shown, with "high", "medium", and "low" results highlighted in decreasing intensities of red, respectively. The next columns are populated with CFP, AFP, and OA data (if requested) for each paper-datatype pair. Note that an empty CFP or AFP result is simply blank ("") whereas an empty OA result is represented by "oa_blank". Otherwise, the CFP and AFP fields are populated with the free-text entries from the CFP and AFP results, and the OA field indicates "curated" if data was found in the OA for the paper-datatype pair. The remaining columns display the curator drop-down menu, the validation status drop-down menu, the premade comment drop-down menu, and the free-text comment field. Any new results submitted via this page for a paper-datatype pair are automatically attributed to the curator who is logged in, unless there is already a curator listed in the curator field or the curator explicitly selects a curator from the curator drop-down list.
In the free-text comment field, if more than 20 characters are stored, only the first 20 are displayed, followed by an ellipsis ("..."). Clicking inside the field opens the full text, which stays fully visible while editing. Clicking outside the field reverts to the truncated view to conserve screen space. Each cell in the table is outlined with a dotted line.
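
The 20-character truncation rule described above can be sketched as follows. ''truncate_comment'' is a hypothetical helper written for illustration. How the form actually implements the truncation, and the click-to-expand behavior, is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: shorten a comment for the collapsed view.
# Text at or under the limit is returned unchanged; longer text is cut to
# the first $limit characters and followed by "...".
sub truncate_comment {
    my ($text, $limit) = @_;
    $limit = 20 unless defined $limit;
    return $text if length($text) <= $limit;
    return substr($text, 0, $limit) . '...';
}

print truncate_comment('short comment'), "\n";
print truncate_comment('this comment is longer than twenty characters'), "\n";
```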
+
Once the curator has ticked the confirmation checkbox for each relevant result and clicked the "Overwrite Selected Results" button, the ''overwriteSelectedResults'' subroutine (below) runs to overwrite the data in the Postgres cur_curdata table and to record the new values in cur_curdata_hst.
 
 
  
 
<pre>
 
<pre>
   my %trs;                             # td data for each table row
+
sub overwriteSelectedResults {
   my %paperPosNegOkay;                 # papers that have positive-negative data okay, so show all svm results for that paper even if a given row isn't positive-negative okay
+
   ($oop, my $overwriteCount) = &getHtmlVar($query, "overwrite_count");
   my %paperInfo;                       # for a joinkey, all the paper information about it to show in a big rowspan for that table row
+
   my @pgcommands;
 +
  for my $i (1 .. $overwriteCount) {
 +
    ($oop, my $overwrite) = &getHtmlVar($query, "checkbox_$i");
 +
    next unless ($overwrite eq 'overwrite');
 +
    ($oop, my $joinkey    ) = &getHtmlVar($query, "joinkey_$i"    );
 +
    ($oop, my $datatype  ) = &getHtmlVar($query, "datatype_$i"   );
 +
    ($oop, my $twonum    ) = &getHtmlVar($query, "twonum_$i"    );
 +
    ($oop, my $donposneg  ) = &getHtmlVar($query, "donposneg_$i"  );
 +
    ($oop, my $selcomment ) = &getHtmlVar($query, "selcomment_$i" );
 +
    ($oop, my $txtcomment ) = &getHtmlVar($query, "txtcomment_$i" );
 +
    unless ($donposneg) { $donposneg = ''; } unless ($selcomment) { $selcomment = ''; } unless ($txtcomment) { $txtcomment = ''; }
 +
    push @pgcommands, qq(DELETE FROM cur_curdata WHERE cur_paper = '$joinkey' AND cur_datatype = '$datatype' AND cur_curator = '$twonum');
 +
    push @pgcommands, qq(INSERT INTO cur_curdata VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));
 +
    push @pgcommands, qq(INSERT INTO cur_curdata_hst VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));
 +
  } # for my $i (1 .. $overwriteCount)
 +
  foreach my $pgcommand (@pgcommands) {
 +
    print "$pgcommand<br />\n";
 +
# UNCOMMENT TO POPULATE
 +
    $dbh->do( $pgcommand );
 +
  } # foreach my $pgcommand (@pgcommands)
 +
} # sub overwriteSelectedResults
  
  my %allPaperData;                    # hash of datatype - joinkey  for all posible queried data structures, to key off from this when there are no svm results for a data structure with data.
+
</pre>
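The statements built in ''overwriteSelectedResults'' interpolate form values directly into SQL strings, so a comment containing a single quote would break the statement. The following is a minimal sketch, not the form's actual code, of the same three statements built with placeholders. The helper name ''build_overwrite_statements'' and the sample values are hypothetical; the only real API assumed is DBI's <code>$dbh->do($statement, \%attr, @bind_values)</code>, which is shown in a comment so that the sketch runs without a database.

<pre>
use strict;
use warnings;

# Hypothetical helper: return the DELETE and the two INSERTs for one row
# as [ $sql, @bind_values ] pairs instead of interpolated strings.
sub build_overwrite_statements {
  my ($joinkey, $datatype, $twonum, $donposneg, $selcomment, $txtcomment) = @_;
  my @values = ($joinkey, $datatype, $twonum, $donposneg // '', $selcomment // '', $txtcomment // '');
  return (
    [ 'DELETE FROM cur_curdata WHERE cur_paper = ? AND cur_datatype = ? AND cur_curator = ?', @values[0 .. 2] ],
    [ 'INSERT INTO cur_curdata VALUES (?, ?, ?, ?, ?, ?)', @values ],
    [ 'INSERT INTO cur_curdata_hst VALUES (?, ?, ?, ?, ?, ?)', @values ],
  );
}

# Sample values are made up for illustration.
my @statements = build_overwrite_statements('00012345', 'rnai', 'two1823', 'positive', '', "curator's note");
foreach my $statement (@statements) {
  my ($sql, @binds) = @$statement;
  print "$sql [", join(', ', map { "'$_'" } @binds), "]\n";
  # With a live DBI handle this would be: $dbh->do($sql, undef, @binds);
}
</pre>

Because the values travel separately from the SQL text, the apostrophe in the sample comment needs no escaping.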
  foreach my $datatype (keys %svmData) { foreach my $joinkey (keys %{ $svmData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
 
  foreach my $datatype (keys %curData) { foreach my $joinkey (keys %{ $curData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
 
  foreach my $datatype (keys %oaData)  { foreach my $joinkey (keys %{  $oaData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
 
  foreach my $datatype (keys %cfpData) { foreach my $joinkey (keys %{ $cfpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
 
  foreach my $datatype (keys %afpData) { foreach my $joinkey (keys %{ $afpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
 
  
  my $trCounter = 0;
 
  foreach my $joinkey (sort keys %curatablePapers) {                    # TODO curatablePapers or allPaperData that have some flag ?
 
    next unless ($chosenPapers{$joinkey} || $chosenPapers{all});
 
  
    push @{ $paperInfo{$joinkey} }, $joinkey;
 
    my $journal = ''; my $pmid = ''; my $pdf = ''; my $primaryData = '';
 
    if ($showJournal) {
 
      if ($journal{$joinkey}) { $journal = $journal{$joinkey}; }
 
      push @{ $paperInfo{$joinkey} }, $journal; }
 
    if ($showPmid) {
 
      if ($pmid{$joinkey}) { $pmid = $pmid{$joinkey}; }
 
      push @{ $paperInfo{$joinkey} }, $pmid; }
 
    if ($showPdf) {
 
      if ($pdf{$joinkey}) { $pdf = $pdf{$joinkey}; }
 
      push @{ $paperInfo{$joinkey} }, $pdf; }
 
  
    foreach my $datatype (sort keys %{ $allPaperData{$joinkey} }) {
+
== Detailed Results of Papers Page: Loading Data (the ''getResults'' Subroutine) ==
      next unless ($chosenDatatypes{$datatype});                        # show only results for selected datatype
+
 
      my @dataRow = ( "$datatype" );
+
The following code is for the ''getResults'' subroutine which displays the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]] after receiving input from the curator about what to display (from a [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]] or [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]] submission).
      $trCounter++;
 
      if ($displaySvm) {
 
        my $svmResult = '';
 
        if ($svmData{$datatype}{$joinkey}) { $svmResult = $svmData{$datatype}{$joinkey}; }
 
        my $bgcolor = 'white';
 
        if ($svmResult eq 'high')     { $bgcolor = '#FFA0A0'; }
 
        elsif ($svmResult eq 'medium') { $bgcolor = '#FFC8C8'; }
 
        elsif ($svmResult eq 'low')    { $bgcolor = '#FFE0E0'; }
 
        $svmResult = qq(<span style="background-color: $bgcolor">$svmResult</span>);
 
        push @dataRow, $svmResult;
 
      } # if ($displaySvm)
 
  
      if ($displayCfp) {
+
The first section of code collects all curatable papers, processes the checkbox input for the various datatypes, and processes the paper IDs that were submitted in the paper field.
        my $cfpResult = '';
 
        if ($cfpData{$datatype}{$joinkey}) { $cfpResult = $cfpData{$datatype}{$joinkey}; }
 
        push @dataRow, $cfpResult;
 
      }
 
  
      if ($displayAfp) {
+
<pre>
        my $afpResult = '';
+
sub getResults {
        if ($afpData{$datatype}{$joinkey}) { $afpResult = $afpData{$datatype}{$joinkey}; }
+
  &printFormOpen();
        push @dataRow, $afpResult;
+
  &printHiddenCurator();
      }
+
  &populateCuratablePapers();                   # assume for now that we only care about curatable papers
  
      if ($displayOa) {
+
  ($oop, my $all_datatypes_checkbox) = &getHtmlVar($query, "checkbox_all_datatypes");
        my $oaResult = 'oa_blank';
+
  unless ($all_datatypes_checkbox) { $all_datatypes_checkbox = ''; }
        if ($oaData{$datatype}{$joinkey}) { $oaResult = $oaData{$datatype}{$joinkey}; }
+
  foreach my $datatype (keys %datatypes) {
        push @dataRow, $oaResult;
+
    ($oop, my $chosen) = &getHtmlVar($query, "checkbox_$datatype");
      }
+
    unless ($chosen) { $chosen = ''; }
 +
    if ($all_datatypes_checkbox eq 'all') { $chosen = $datatype; }     # if all datatypes checkbox was selected, set that datatype's chosen to that datatype
 +
    print qq(<input type="hidden" name="checkbox_$datatype" value="$chosen">\n);
 +
    if ($chosen) { $chosenDatatypes{$chosen}++; }
 +
  } # foreach my $datatype (keys %datatypes)
  
      my $thisCurator = '';                                                     # curator in cur_curdata for this paper-datatype if it has a value
+
  ($oop, my $specificPapers) = &getHtmlVar($query, "specific_papers");
      if ( $curData{$datatype}{$joinkey}{curator} ) { $thisCurator = $curData{$datatype}{$joinkey}{curator}; }
+
  my %filterPapers; my %specificPapers; my %topicPapers;
      my $curatorSelectCurator = qq(<select name="select_curator_curator_$trCounter" size="1">\n<option value=""></option>\n);
+
  if ($specificPapers) { my (@joinkeys) = $specificPapers =~ m/(\d+)/g; foreach (@joinkeys) { $specificPapers{$_}++; } }
      foreach my $curator_two (keys %curators) {       # display curators in alphabetical (tied hash) order; preselect the curator stored in cur_curdata, if any
+
  ($oop, my $topic)         = &getHtmlVar($query, "select_topic");    # if there's a selected topic replace specific papers with those from topic
        if ($thisCurator eq $curator_two) { $curatorSelectCurator .= qq(<option value="$curator_two" selected="selected">$curators{$curator_two}</option>\n); }
+
  unless ($topic) { $topic = 'none'; }
          else {                           $curatorSelectCurator .= qq(<option value="$curator_two">$curators{$curator_two}</option>\n); } }
+
  if ($topic ne 'none') {
      $curatorSelectCurator .= qq(</select>);
+
    print "using topic $topic<br/>\n";
 +
    my ($topicID) = $topic =~ m/(WBbiopr:\d+)/;                        # get the WBProcessID from the topic which includes the name
 +
    print qq(<input type="hidden" name="select_topic" value="$topic">\n);
 +
    $result = $dbh->prepare( "SELECT DISTINCT(pro_paper.pro_paper) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') AND pro_process.pro_process = '$topicID'" );
 +
    $result->execute() or die "Cannot execute statement: $DBI::errstr\n";
 +
    while (my @row = $result->fetchrow) { $row[0] =~ s/WBPaper//; $topicPapers{$row[0]}++; }
 +
  } # if ($topic ne 'none')
 +
  if ($specificPapers && ($topic ne 'none')) {
 +
      foreach (sort keys %specificPapers) { if ($topicPapers{$_}) { $chosenPapers{$_}++; } } }
 +
    elsif ($specificPapers) {
 +
      foreach (sort keys %specificPapers) { $chosenPapers{$_}++; } }
 +
    elsif ($topic ne 'none') {
 +
      foreach (sort keys %topicPapers) { $chosenPapers{$_}++; } }
 +
    else { $chosenPapers{'all'}++; } 
 +
  print qq(<input type="hidden" name="specific_papers" value="$specificPapers">\n);
 +
</pre>
 +
 
 +
The code above queries the Topic Curation OA tables for rows where the process matches the selected topic and the topic-paper status is 'relevant', and collects the associated WBPaper IDs.
  
      $curatorSelectCurator .= qq(<input type="hidden" name="joinkey_$trCounter"  value="$joinkey" >);  # these are required, arbitrarily added here
+
How filtering works:
      $curatorSelectCurator .= qq(<input type="hidden" name="datatype_$trCounter" value="$datatype">);  # these are required, arbitrarily added here
+
There are two lists of papers:
      push @dataRow, $curatorSelectCurator;
+
(1) papers in the text area box, (2) papers for the topic (from the Topic Curation OA)
 +
If papers are entered in the text area and a topic is selected, the filtering code keeps only the papers that exist in both lists and generates results for that filtered list. If no papers are entered in the text area, the resulting list is the papers affiliated with the selected topic. If no topic is selected, the resulting list is the papers entered into the paper text area. If no papers are entered and no topic is selected, the form returns all papers relevant to this form. Note that when a topic is selected but has no affiliated papers, no papers are displayed.
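The four filtering cases can be sketched as a small standalone function. This is an illustration rather than the form's code: the helper name ''choose_papers'' and the sample joinkeys are hypothetical, and the sketch tests whether any paper IDs were parsed, whereas ''getResults'' tests whether the raw text area value is non-empty.

<pre>
use strict;
use warnings;

# Hypothetical helper mirroring the four branches in getResults.
# $specificRef and $topicRef are hash refs keyed by joinkey.
sub choose_papers {
  my ($specificRef, $topic, $topicRef) = @_;
  my %chosen;
  my $hasSpecific = scalar keys %$specificRef;
  if ($hasSpecific && $topic ne 'none') {              # both lists: keep the intersection
    foreach (keys %$specificRef) { $chosen{$_}++ if $topicRef->{$_}; } }
  elsif ($hasSpecific)     { $chosen{$_}++ foreach keys %$specificRef; }
  elsif ($topic ne 'none') { $chosen{$_}++ foreach keys %$topicRef; }
  else                     { $chosen{all}++; }         # nothing entered: all papers
  return \%chosen;
}

my %specific    = map { $_ => 1 } qw(00000001 00000002);
my %topicPapers = map { $_ => 1 } qw(00000002 00000003);
my $chosen = choose_papers(\%specific, 'some topic', \%topicPapers);
print join(',', sort keys %$chosen), "\n";             # prints 00000002
</pre>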
  
      my $thisDonPosNeg = ''; if ( $curData{$datatype}{$joinkey}{donposneg} ) { $thisDonPosNeg = $curData{$datatype}{$joinkey}{donposneg}; }
 
      my $curatorSelectDonposneg = qq(<select name="select_curator_donposneg_$trCounter">);
 
      foreach my $donposneg (keys %donPosNegOptions) {        # display the done / positive / negative options; preselect the value stored in cur_curdata, if any
 
        if ($thisDonPosNeg eq $donposneg) { $curatorSelectDonposneg .= qq(<option value="$donposneg" selected="selected">$donPosNegOptions{$donposneg}</option>\n); }
 
          else {                            $curatorSelectDonposneg .= qq(<option value="$donposneg"                    >$donPosNegOptions{$donposneg}</option>\n); } }
 
      $curatorSelectDonposneg .= qq(</select>);
 
      push @dataRow, $curatorSelectDonposneg;
 
  
      my $thisSelComment = ''; if ( $curData{$datatype}{$joinkey}{selcomment} ) { $thisSelComment = $curData{$datatype}{$joinkey}{selcomment}; }
+
The next section of code loads cur_curdata for the respective paper-datatype pairs given the input from above; these curator values are always shown because they have to be editable. The code then reads whether to show OA data and which flagging methods (SVM, STR, CFP, AFP) to display for each paper-datatype pair, and loads only the requested sources to prepare them for display. Additionally, the code reads how many papers to display per page (default 10), sets the selected page to "0" if no page has been chosen yet, and determines whether or not the curator wishes to see the journal, PMID, and/or PDF links for each paper.
      my $curatorSelectComment = qq(<select name="select_curator_comment_$trCounter">);
 
      $curatorSelectComment .= qq(<option value=""            ></option>\n);
 
      foreach my $comment (keys %premadeComments) {
 
        if ($thisSelComment eq $comment) { $curatorSelectComment .= qq(<option value="$comment" selected="selected">$premadeComments{$comment}</option>\n); }
 
          else {                          $curatorSelectComment .= qq(<option value="$comment"                    >$premadeComments{$comment}</option>\n); } }
 
      $curatorSelectComment .= qq(</select>);
 
      push @dataRow, $curatorSelectComment;
 
  
      my $txtcomment = ''; if ( $curData{$datatype}{$joinkey}{txtcomment} ) { $txtcomment = $curData{$datatype}{$joinkey}{txtcomment}; }
+
<pre>
      my $shortTxtComment = $txtcomment;  unless ($shortTxtComment) { $shortTxtComment = '&nbsp;'; }
+
  &populateCurCurData();                               # always show curator values since they have to be editable
      if ($txtcomment =~ m/^(.{20})/) { $shortTxtComment = $1; $shortTxtComment .= '...'; }
 
      my $curatorTextareaComment = qq(<div id="div_curator_comment_$trCounter" onclick="document.getElementById('div_curator_comment_$trCounter').style.display = 'none'; document.getElementById('textarea_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').focus();" >$shortTxtComment</div>\n);
 
      $curatorTextareaComment .= qq(<textarea rows="4" cols="80" id="textarea_curator_comment_$trCounter" name="textarea_curator_comment_$trCounter" style="display:none" onblur="document.getElementById('div_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').style.display = 'none'; var divValue = document.getElementById('textarea_curator_comment_$trCounter').value; if (divValue === '') { divValue = '&nbsp;'; } document.getElementById('div_curator_comment_$trCounter').innerHTML = divValue; ">$txtcomment</textarea>\n);
 
#      $curatorTextareaComment .= qq(<textarea rows="4" cols="80" id="textarea_curator_comment_$trCounter" name="textarea_curator_comment_$trCounter" style="display:none" onblur="document.getElementById('div_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').style.display = 'none'; document.getElementById('div_curator_comment_$trCounter').innerHTML = document.getElementById('textarea_curator_comment_$trCounter').value.substring(0,20)">$txtcomment</textarea>\n);                # to get the first 20 characters without adding ...
 
      push @dataRow, $curatorTextareaComment;
 
  
      $paperPosNegOkay{$joinkey}++;                             # all papers always okay for pos/neg since we no longer have pos/neg filtering 2012 11 08
+
  ($oop, my $displayOa)  = &getHtmlVar($query, "checkbox_oa");    unless ($displayOa) {  $displayOa  = ''; }
 +
  ($oop, my $displayCfp) = &getHtmlVar($query, "checkbox_cfp");  unless ($displayCfp) { $displayCfp = ''; }
 +
  ($oop, my $displayAfp) = &getHtmlVar($query, "checkbox_afp");  unless ($displayAfp) { $displayAfp = ''; }
 +
  ($oop, my $displaySvm) = &getHtmlVar($query, "checkbox_svm");   unless ($displaySvm) { $displaySvm = ''; }
 +
  ($oop, my $displayStr) = &getHtmlVar($query, "checkbox_str");  unless ($displayStr) { $displayStr = ''; }
 +
  print qq(<input type="hidden" name="checkbox_oa" value="$displayOa" >\n);
 +
  print qq(<input type="hidden" name="checkbox_cfp" value="$displayCfp">\n);
 +
  print qq(<input type="hidden" name="checkbox_afp" value="$displayAfp">\n);
 +
  print qq(<input type="hidden" name="checkbox_svm" value="$displaySvm">\n);
 +
  print qq(<input type="hidden" name="checkbox_str" value="$displayStr">\n);
 +
  if ($displayOa) {  &populateOaData();  }
 +
  if ($displayCfp) { &populateCfpData(); }
 +
  if ($displayAfp) { &populateAfpData(); }
 +
  if ($displaySvm) { &populateSvmData(); }
 +
  if ($displayStr) { &populateStrData(); }
  
      my $trData = join"</td>$tdDot", @dataRow;
+
  ($oop, my $showJournal) = &getHtmlVar($query, "checkbox_journal");  unless ($showJournal) { $showJournal = ''; }
      push @{ $trs{$joinkey} }, qq(${tdDot}$trData</td></tr>\n);
+
  ($oop, my $showPmid)    = &getHtmlVar($query, "checkbox_pmid");     unless ($showPmid) {   $showPmid = '';    }
    } # foreach my $datatype (sort keys %{ $allPaperData{$joinkey} })
+
  ($oop, my $showPdf)    = &getHtmlVar($query, "checkbox_pdf");      unless ($showPdf) {     $showPdf = '';    }
   } # foreach my $joinkey (sort keys %curatablePapers)
+
  print qq(<input type="hidden" name="checkbox_journal" value="$showJournal">\n);
</pre>
+
  print qq(<input type="hidden" name="checkbox_pmid"    value="$showPmid">\n);
 
+
  print qq(<input type="hidden" name="checkbox_pdf"    value="$showPdf">\n);
 
+
 
The following code prints the rows of the table, displaying the requested information for each paper and paper-datatype pair. The code collects all relevant, valid papers and then calculates the number of pages that the results will be distributed across: the total number of papers resulting from the query is divided by the number of papers per page requested, and the result is rounded up to an integer value. A drop-down menu is then generated and displayed, providing the curator with a means to access the paper results not available from the current page. A "Get Results" button is displayed which allows the curator to request a new page of results once a page has been selected from the page number drop-down menu. The form displays the total number of papers in the query result. The HTML table is then printed, adding column headers. Columns and their respective headers for cur_curdata (paper ID, curator, validation status, premade comment, and free-text comment) are always displayed, whereas all other columns and headers are optional. Cell borders are displayed with dotted lines.
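As a small worked example of the page-count calculation, the sketch below wraps the same <code>ceil</code> call (from the core POSIX module) in a hypothetical helper named ''pages_needed''; the paper counts are made up.

<pre>
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper: number of result pages for a given number of papers.
sub pages_needed {
  my ($joinkeysAmount, $papersPerPage) = @_;
  return ceil($joinkeysAmount / $papersPerPage);
}

print pages_needed(23, 10), "\n";   # 23 papers at 10 per page need 3 pages
print pages_needed(20, 10), "\n";   # an exact multiple needs no extra page: 2
</pre>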
+
  ($oop, my $papersPerPage) = &getHtmlVar($query, "papers_per_page");
 
+
  ($oop, my $pageSelected)  = &getHtmlVar($query, "select_page");
To determine which papers to display on a page, the code determines what page number (n) is currently to be viewed and skips the first (n-1)*(papers per page) papers. The following paper and subsequent papers, up to the number requested per page, are then displayed. The code is written so that for each paper, the paper ID, PMID, journal, and PDF link are all only displayed once per paper, regardless of how many datatypes are being viewed. A "Submit New Results" button is then printed at the bottom of the complete table and the form is closed.
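The skip-then-take logic can be sketched with an array slice instead of the two counters used in ''getResults''. The helper name ''papers_on_page'' and the generated joinkeys are hypothetical; the sketch also assumes, as the counters imply, that the default page "0" behaves like page 1 because nothing is skipped.

<pre>
use strict;
use warnings;

# Hypothetical helper: return the joinkeys shown on one page of results.
sub papers_on_page {
  my ($joinkeysRef, $pageSelected, $papersPerPage) = @_;
  my $toSkip = ($pageSelected - 1) * $papersPerPage;   # papers on earlier pages
  $toSkip = 0 if $toSkip < 0;                          # page 0 skips nothing
  my @sorted = sort @$joinkeysRef;
  return () if $toSkip > $#sorted;                     # page is past the end
  my $last = $toSkip + $papersPerPage - 1;
  $last = $#sorted if $last > $#sorted;                # final page may be short
  return @sorted[$toSkip .. $last];
}

my @joinkeys = map { sprintf('%08d', $_) } 1 .. 25;    # made-up paper IDs
my @page3 = papers_on_page(\@joinkeys, 3, 10);
print scalar(@page3), " papers, first is $page3[0]\n"; # 5 papers, first is 00000021
</pre>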
+
  unless ($papersPerPage) { $papersPerPage = 10; }
 +
  unless ($pageSelected) { $pageSelected  = 0;  }
 +
  print qq(<input type="hidden" name="papers_per_page" value="$papersPerPage">\n);
 +
 
 +
  my @headerRow = qw( paperID );
 +
   if ($showJournal) { push @headerRow, "journal"; &populateJournal(); }
 +
  if ($showPmid)    { push @headerRow, "pmid";    &populatePmid();    }
 +
  if ($showPdf)    { push @headerRow, "pdf";    &populatePdf();    }
 +
</pre>
 +
 
 +
 
 +
The code now generates a hash (%trs) of rows of results (not necessarily in the order submitted). For each paper submitted and each datatype requested, the code records in the ''%allPaperData'' hash table which paper-datatype pairs have flagging results, cur_curdata, or OA (curation) data, and then loads the relevant results for each pair into the appropriate column of its row. The first column for any paper displays the WBPaper ID. If the journal, PMID, and/or PDF link were requested, they are displayed in the next columns for that paper. Next is the datatype column. In the next column (if SVM data was requested), the SVM result for each paper-datatype pair is shown, with "high", "medium", and "low" SVM results highlighted in decreasing intensities of red, respectively. The next columns are populated with STR, CFP, AFP, and OA data (if requested) for each paper-datatype pair. Note that an empty result for STR, CFP, or AFP is simply blank ("") whereas an empty result for OA data is represented by "oa_blank". Otherwise, the STR, CFP, and AFP fields are populated with free-text entries from the STR, CFP, and AFP results, and the OA field indicates "curated" if data was found in the OA for the paper-datatype pair. The next columns display the curator drop-down menu, the validation status drop-down menu, the premade comment drop-down menu, and the free-text comment field. Any new results submitted via this page for a paper-datatype pair will automatically be attributed to the curator that is logged in, unless there is already a curator listed in the curator field or the curator explicitly selects a curator from the curator drop-down list. 
In the free-text comment field, if 20 or more characters are stored, only the first 20 characters are displayed, followed by an ellipsis ("..."). Clicking inside the free-text field opens the full view of the text, which remains in full view while editing. Clicking outside of the text field afterwards reverts to the compact view to conserve screen space. Each cell in the table is outlined with a dotted line.
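The two display rules described above (SVM highlight colour and comment truncation) can be isolated into small functions. This is a sketch only; the helper names ''svm_bgcolor'' and ''short_comment'' are hypothetical, while the colour values and the 20-character pattern are taken from the code shown on this page.

<pre>
use strict;
use warnings;

# Hypothetical helper: background colour for an SVM result, white if none.
sub svm_bgcolor {
  my ($svmResult) = @_;
  my %colors = ( high => '#FFA0A0', medium => '#FFC8C8', low => '#FFE0E0' );
  return $colors{$svmResult} // 'white';
}

# Hypothetical helper: compact text shown in the comment cell.
sub short_comment {
  my ($txtcomment) = @_;
  return '&nbsp;' unless $txtcomment;                    # keep an empty cell clickable
  return "$1..." if $txtcomment =~ m/^(.{20})/;          # 20 or more characters
  return $txtcomment;
}

print svm_bgcolor('medium'), "\n";                       # prints #FFC8C8
print short_comment('no phenotype data in this paper'), "\n";
</pre>

A comment of exactly 20 characters also receives the ellipsis, because the pattern only requires that 20 characters are present.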
  
  
 
<pre>
 
<pre>
   print qq(<input type="hidden" name="trCounter" value="$trCounter">);
+
   my %trs;                             # td data for each table row
 +
  my %paperPosNegOkay;                  # papers that have positive-negative data okay, so show all svm results for that paper even if a given row isn't positive-negative okay
 +
  my %paperInfo;                        # for a joinkey, all the paper information about it to show in a big rowspan for that table row
  
   my $joinkeysAmount = scalar(keys %paperPosNegOkay);
+
   my %allPaperData;                    # hash of joinkey - datatype for all possible queried data structures, to key off from this when there are no svm results for a data structure with data.
  my $pagesAmount = ceil($joinkeysAmount / $papersPerPage);
+
  foreach my $datatype (keys %svmData) { foreach my $joinkey (keys %{ $svmData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
   print qq(Page number <select name="select_page">);
+
   foreach my $datatype (keys %strData) { foreach my $joinkey (keys %{ $strData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
   for my $i (1 .. $pagesAmount) {
+
   foreach my $datatype (keys %curData) { foreach my $joinkey (keys %{ $curData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
    if ($i == $pageSelected) { print qq(<option selected="selected">$i</option>\n); }
+
   foreach my $datatype (keys %oaData)  { foreach my $joinkey (keys %{  $oaData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
      else { print qq(<option>$i</option>\n); }
+
   foreach my $datatype (keys %cfpData) { foreach my $joinkey (keys %{ $cfpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
   } # for my $i (1 .. $pagesAmount)
+
   foreach my $datatype (keys %afpData) { foreach my $joinkey (keys %{ $afpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
  print qq(</select>);
 
   print qq(<input type="submit" name="action" value="Get Results">\n);
 
  print qq(amount of papers $joinkeysAmount<br/>\n);
 
   print qq(<br />\n);
 
  
   print qq(<table border="1">\n);
+
   my $trCounter = 0;
   push @headerRow, "datatype";
+
   foreach my $joinkey (sort keys %curatablePapers) {                   # TODO curatablePapers or allPaperData that have some flag ?
  if ($displaySvm)   { push @headerRow, "SVM Prediction"; }
+
    next unless ($chosenPapers{$joinkey} || $chosenPapers{all});
  if ($displayCfp)    { push @headerRow, "cfp value"; }
 
  if ($displayAfp)    { push @headerRow, "afp value"; }
 
  if ($displayOa)    { push @headerRow, "oa value";  }
 
  push @headerRow, "curator"; push @headerRow, "new result"; push @headerRow, "select comment"; push @headerRow, "textarea comment";
 
  my $headerRow = join"</th>$thDot", @headerRow;
 
  $headerRow = qq(<tr>$thDot) . $headerRow . qq(</th></tr>);
 
  print qq($headerRow\n);
 
  
  my $papCount = 0;
+
    push @{ $paperInfo{$joinkey} }, $joinkey;
  my $papCountToSkip = 0; my $papToSkip = ($pageSelected - 1 ) * $papersPerPage;
+
    my $journal = ''; my $pmid = ''; my $pdf = ''; my $primaryData = '';
  foreach my $joinkey (sort keys %paperPosNegOkay) {                    # from all papers that have good positive-negative values, show all TRs
+
     if ($showJournal) {
     $papCountToSkip++; next if ($papCountToSkip <= $papToSkip);        # skip entries until at the proper page
+
      if ($journal{$joinkey}) { $journal = $journal{$joinkey}; }
    $papCount++;
+
      push @{ $paperInfo{$joinkey} }, $journal; }
    last if ($papCount > $papersPerPage);
+
     if ($showPmid) {
    my $trsInPaperAmount = scalar @{ $trs{$joinkey} };                  # amount of rows for a joinkey, make that the rowspan
+
      if ($pmid{$joinkey}) { $pmid = $pmid{$joinkey}; }
    my $firstTr = shift @{ $trs{$joinkey} };                           # the first table row needs the paper info and rowspan
+
      push @{ $paperInfo{$joinkey} }, $pmid; }
     my $tdMultiRow = $tdDot; $tdMultiRow =~ s/>$/ rowspan="$trsInPaperAmount">/;       # add the rowspan to the td style
+
     if ($showPdf) {
    my $paperInfoTds = join"</td>$tdMultiRow", @{ $paperInfo{$joinkey} };               # make paper info tds from %paperInfo
+
      if ($pdf{$joinkey}) { $pdf = $pdf{$joinkey}; }
     print qq(<tr>${tdMultiRow}$paperInfoTds</td>$firstTr\n);           # print the first row which has paper info
+
      push @{ $paperInfo{$joinkey} }, $pdf; }
    foreach my $tr (@{ $trs{$joinkey} }) { print qq(<tr>$tr\n); } }    # print other table rows without paper info
 
  print qq(</table>\n);
 
  
  print qq(<input type="submit" name="action" value="Submit New Results"><br/>\n);
+
    foreach my $datatype (sort keys %{ $allPaperData{$joinkey} }) {
 
+
      next unless ($chosenDatatypes{$datatype});                        # show only results for selected datatype
  &printFormClose();
+
      my @dataRow = ( "$datatype" );
} # sub getResults
+
      $trCounter++;
 
+
      if ($displaySvm) {
</pre>
+
        my $svmResult = '';
 +
        if ($svmData{$datatype}{$joinkey}) { $svmResult = $svmData{$datatype}{$joinkey}; }
 +
        my $bgcolor = 'white';
 +
        if ($svmResult eq 'high')      { $bgcolor = '#FFA0A0'; }
 +
        elsif ($svmResult eq 'medium') { $bgcolor = '#FFC8C8'; }
 +
        elsif ($svmResult eq 'low')    { $bgcolor = '#FFE0E0'; }
 +
        $svmResult = qq(<span style="background-color: $bgcolor">$svmResult</span>);
 +
        push @dataRow, $svmResult;
 +
      } # if ($displaySvm)
 +
     
 +
      if ($displayStr) {
 +
        my $strResult = '';
 +
        if ($strData{$datatype}{$joinkey}) { $strResult = $strData{$datatype}{$joinkey}; }
 +
        push @dataRow, $strResult;
 +
      }
  
<br>
+
      if ($displayCfp) {
<br>
+
        my $cfpResult = '';
 +
        if ($cfpData{$datatype}{$joinkey}) { $cfpResult = $cfpData{$datatype}{$joinkey}; }
 +
        push @dataRow, $cfpResult;
 +
      }
  
== Detailed Results of Papers Page: Processing Input (the ''submitNewResults'' Subroutine) ==
+
      if ($displayAfp) {
 +
        my $afpResult = '';
 +
        if ($afpData{$datatype}{$joinkey}) { $afpResult = $afpData{$datatype}{$joinkey}; }
 +
        push @dataRow, $afpResult;
 +
      }
  
When a curator clicks on the "Submit New Results" button at the bottom of the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]], the ''submitNewResults'' subroutine (below) is run. First, the subroutine opens the form and prints the hidden curator, capturing the curator currently logged in. The code then looks at all paper-datatype pair data loaded by the ''getResults'' subroutine which generated the data to display on the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]], skipping over papers that do not have a "new result" (validation status) entry. For papers that have a "new result" entry, if there is a curator already listed in the curator field, that curator is kept as the curator for the given paper-datatype pair. If no curator is present in the field at the time of submission, the curator logged in is entered as the curator for the new data. Alternatively, if a curator has been manually selected from the curator drop-down menu, this selected curator is entered. The data from each row of the form with a "new result" entry is then compared to its respective data in Postgres. If the data is different it is prepared for display in the '''New Results Summary Page''' or the '''Overwrite Confirmation Page''' as described in the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] and the [[New_2012_Curation_Status#Add_Results_Page:_Loading_Page_and_Processing_Input|Add Results Page: Loading Page and Processing Input]] sections. Once complete, the form is closed.
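The curator choice and the split between new and already-stored rows can be sketched as follows. The helper names ''active_curator'' and ''split_new_and_duplicate'', and all sample values, are hypothetical; the data layout (joinkey, then datatype, then value type) follows the ''%curatorData'' and ''%pgData'' hashes described above.

<pre>
use strict;
use warnings;

# Hypothetical helper: a curator chosen in the drop-down wins,
# otherwise the logged-in curator is used.
sub active_curator {
  my ($loggedInCurator, $dropdownCurator) = @_;
  return $dropdownCurator ? $dropdownCurator : $loggedInCurator;
}

# Hypothetical helper: rows already present in Postgres go to the
# duplicate list (overwrite confirmation), the rest are new data.
sub split_new_and_duplicate {
  my ($curatorDataRef, $pgDataRef) = @_;
  my (@data, @duplicateData);
  foreach my $joinkey (sort keys %$curatorDataRef) {
    foreach my $datatype (sort keys %{ $curatorDataRef->{$joinkey} }) {
      my $row  = $curatorDataRef->{$joinkey}{$datatype};
      my @line = ($joinkey, $datatype, @{$row}{qw(curator donposneg selcomment txtcomment)});
      if ($pgDataRef->{$joinkey}{$datatype}) { push @duplicateData, \@line; }
        else { push @data, \@line; }
    }
  }
  return (\@data, \@duplicateData);
}

# Made-up sample rows: one new, one already stored.
my %curatorData = (
  '00000001' => { rnai => { curator => active_curator('two1', ''),     donposneg => 'positive', selcomment => '', txtcomment => '' } },
  '00000002' => { rnai => { curator => active_curator('two1', 'two2'), donposneg => 'negative', selcomment => '', txtcomment => '' } },
);
my %pgData = ( '00000002' => { rnai => { donposneg => 'positive' } } );
my ($dataRef, $duplicateRef) = split_new_and_duplicate(\%curatorData, \%pgData);
print scalar(@$dataRef), " new, ", scalar(@$duplicateRef), " to confirm\n";   # 1 new, 1 to confirm
</pre>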
+
      if ($displayOa) {
 +
        my $oaResult = 'oa_blank';
 +
        if ($oaData{$datatype}{$joinkey}) { $oaResult = $oaData{$datatype}{$joinkey}; }
 +
        push @dataRow, $oaResult;
 +
      }
  
<pre>
+
      my $thisCurator = '';                                                     # curator in cur_curdata for this paper-datatype if it has a value
sub submitNewResults {
+
      if ( $curData{$datatype}{$joinkey}{curator} ) { $thisCurator = $curData{$datatype}{$joinkey}{curator}; }
  &printFormOpen();
+
      my $curatorSelectCurator = qq(<select name="select_curator_curator_$trCounter" size="1">\n<option value=""></option>\n);
  &printHiddenCurator();
+
      foreach my $curator_two (keys %curators) {       # display curators in alphabetical (tied hash) order; preselect the curator stored in cur_curdata, if any
  ($oop, my $trAmount) = &getHtmlVar($query, "trCounter");
+
        if ($thisCurator eq $curator_two) { $curatorSelectCurator .= qq(<option value="$curator_two" selected="selected">$curators{$curator_two}</option>\n); }
  my %papersToAdd;
+
          else {                            $curatorSelectCurator .= qq(<option value="$curator_two">$curators{$curator_two}</option>\n); } }
  my %curatorData;
+
      $curatorSelectCurator .= qq(</select>);
  for my $i (1 .. $trAmount) {
+
 
    ($oop, my $curatorDonposneg) = &getHtmlVar($query, "select_curator_donposneg_$i");
+
      $curatorSelectCurator .= qq(<input type="hidden" name="joinkey_$trCounter"  value="$joinkey" >); # these are required, arbitrarily added here
    next unless $curatorDonposneg;                     # skip entries without a curator result for done / positive / negative
+
      $curatorSelectCurator .= qq(<input type="hidden" name="datatype_$trCounter" value="$datatype">);  # these are required, arbitrarily added here
    ($oop, my $dropdownCurator)  = &getHtmlVar($query, "select_curator_curator_$i");
+
      push @dataRow, $curatorSelectCurator;
    my $activeCurator = $curator; if ($dropdownCurator) { $activeCurator = $dropdownCurator; }  # if a curator was chosen use that, otherwise use logged in curator
 
    ($oop, my $curatorSelComment) = &getHtmlVar($query, "select_curator_comment_$i");
 
    ($oop, my $curatorTxtComment) = &getHtmlVar($query, "textarea_curator_comment_$i");
 
    ($oop, my $joinkey)          = &getHtmlVar($query, "joinkey_$i");
 
    ($oop, my $datatype)          = &getHtmlVar($query, "datatype_$i");
 
  
    $papersToAdd{$datatype}{$joinkey}++;
+
      my $thisDonPosNeg = ''; if ( $curData{$datatype}{$joinkey}{donposneg} ) { $thisDonPosNeg = $curData{$datatype}{$joinkey}{donposneg}; }
    $curatorData{$joinkey}{$datatype}{curator}    = $activeCurator;
+
      my $curatorSelectDonposneg = qq(<select name="select_curator_donposneg_$trCounter">);
    $curatorData{$joinkey}{$datatype}{donposneg}  = $curatorDonposneg;
+
      foreach my $donposneg (keys %donPosNegOptions) {       # display the done / positive / negative options; preselect the value stored in cur_curdata, if any
    $curatorData{$joinkey}{$datatype}{selcomment} = $curatorSelComment;
+
        if ($thisDonPosNeg eq $donposneg) { $curatorSelectDonposneg .= qq(<option value="$donposneg" selected="selected">$donPosNegOptions{$donposneg}</option>\n); }
    $curatorData{$joinkey}{$datatype}{txtcomment} = $curatorTxtComment;
+
           else {                            $curatorSelectDonposneg .= qq(<option value="$donposneg"                    >$donPosNegOptions{$donposneg}</option>\n); } }
  } # for my $i (1 .. $trAmount)
+
      $curatorSelectDonposneg .= qq(</select>);
  my %pgData;
+
      push @dataRow, $curatorSelectDonposneg;
  foreach my $datatype (sort keys %papersToAdd) {
 
    my $joinkeys = join"','", sort keys %{ $papersToAdd{$datatype} };
 
    my ($pgDatatypeDataRef) = &getPgDataForJoinkeys($joinkeys, $datatype);
 
    my %pgDatatypeData = %$pgDatatypeDataRef;
 
    foreach my $joinkey (keys %pgDatatypeData) {
 
      foreach my $datatype (keys %{ $pgDatatypeData{$joinkey} }) {
 
           foreach my $valuetype (keys %{ $pgDatatypeData{$joinkey}{$datatype} }) {
 
            $pgData{$joinkey}{$datatype}{$valuetype} = $pgDatatypeData{$joinkey}{$datatype}{$valuetype}; } } } }
 
  
  my @data; my @duplicateData;
+
      my $thisSelComment = ''; if ( $curData{$datatype}{$joinkey}{selcomment} ) { $thisSelComment = $curData{$datatype}{$joinkey}{selcomment}; }
  foreach my $joinkey (sort keys %curatorData) {
+
      my $curatorSelectComment = qq(<select name="select_curator_comment_$trCounter">);
    foreach my $datatype (keys %{ $curatorData{$joinkey} }) {
+
      $curatorSelectComment .= qq(<option value=""            ></option>\n);
        my $thisCurator  = $curatorData{$joinkey}{$datatype}{curator};
+
      foreach my $comment (keys %premadeComments) {
        my $donposneg    = $curatorData{$joinkey}{$datatype}{donposneg};
+
         if ($thisSelComment eq $comment) { $curatorSelectComment .= qq(<option value="$comment" selected="selected">$premadeComments{$comment}</option>\n); }
        my $selcomment  = $curatorData{$joinkey}{$datatype}{selcomment};
+
           else {                           $curatorSelectComment .= qq(<option value="$comment"                    >$premadeComments{$comment}</option>\n); } }
        my $txtcomment  = $curatorData{$joinkey}{$datatype}{txtcomment};
+
      $curatorSelectComment .= qq(</select>);
        my @line;
+
      push @dataRow, $curatorSelectComment;
        push @line, $joinkey;
 
         push @line, $datatype;
 
        push @line, $thisCurator;
 
        push @line, $donposneg;
 
        push @line, $selcomment;
 
        push @line, $txtcomment;
 
        if ($pgData{$joinkey}{$datatype}) { push @duplicateData, \@line; }
 
           else { push @data, \@line; }
 
  } }
 
  &processResultDataDuplicateData(\@data, \@duplicateData, \%pgData);
 
  &printFormClose();
 
} # sub submitNewResults
 
  
</pre>
+
      my $txtcomment = ''; if ( $curData{$datatype}{$joinkey}{txtcomment} ) { $txtcomment = $curData{$datatype}{$joinkey}{txtcomment}; }
 +
      my $shortTxtComment = $txtcomment;  unless ($shortTxtComment) { $shortTxtComment = '&nbsp;'; }
 +
      if ($txtcomment =~ m/^(.{20})/) { $shortTxtComment = $1; $shortTxtComment .= '...'; }
 +
      my $curatorTextareaComment = qq(<div id="div_curator_comment_$trCounter" onclick="document.getElementById('div_curator_comment_$trCounter').style.display = 'none'; document.getElementById('textarea_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').focus();" >$shortTxtComment</div>\n);
 +
      $curatorTextareaComment .= qq(<textarea rows="4" cols="80" id="textarea_curator_comment_$trCounter" name="textarea_curator_comment_$trCounter" style="display:none" onblur="document.getElementById('div_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').style.display = 'none'; var divValue = document.getElementById('textarea_curator_comment_$trCounter').value; if (divValue === '') { divValue = '&nbsp;'; } document.getElementById('div_curator_comment_$trCounter').innerHTML = divValue; ">$txtcomment</textarea>\n);
 +
#      $curatorTextareaComment .= qq(<textarea rows="4" cols="80" id="textarea_curator_comment_$trCounter" name="textarea_curator_comment_$trCounter" style="display:none" onblur="document.getElementById('div_curator_comment_$trCounter').style.display = ''; document.getElementById('textarea_curator_comment_$trCounter').style.display = 'none'; document.getElementById('div_curator_comment_$trCounter').innerHTML = document.getElementById('textarea_curator_comment_$trCounter').value.substring(0,20)">$txtcomment</textarea>\n);                # to get the first 20 characters without adding ...
 +
      push @dataRow, $curatorTextareaComment;
  
<br>
+
      $paperPosNegOkay{$joinkey}++;                            # all papers always okay for pos/neg since we no longer have pos/neg filtering  2012 11 08
<br>
 
  
== Printing Curation Statistics Table ==
+
      my $trData = join"</td>$tdDot", @dataRow;
 +
      push @{ $trs{$joinkey} }, qq(${tdDot}$trData</td></tr>\n);
 +
    } # foreach my $datatype (sort keys %{ $allPaperData{$joinkey} })
 +
  } # foreach my $joinkey (sort keys %allPaperData)
 +
</pre>
  
  
The beginning lines (and some later lines) of the ''printCurationStatisticsTable'' subroutine provide code to display the loading times of the Curation Statistics table as a whole, and the loading/processing times of each portion of the code. The lines are ignored when the variable ''$showTimes'' is set to zero. By default, all papers and datatypes are loaded for performing calculations and display. If only some flagging methods or datatypes are selected from the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], only papers flagged by the selected methods will load and only numbers for the selected datatypes will be displayed.
+
The following code will print the rows of the table, displaying the requested information for each paper and paper-datatype pair. The code collects all relevant, valid papers and then calculates the number of pages that the results will be distributed across. The number of pages is calculated by dividing the total number of papers returned by the query by the number of papers requested per page, and rounding up to an integer value. A drop-down menu is then generated and displayed, providing the curator with a means to access the paper results not available from the current page. A "Get Results" button is displayed which allows the curator to request a new page of results once a page has been selected from the page number drop-down menu. The form displays the total number of papers in the query result. The HTML table is then printed, adding column headers. Columns and their respective headers for cur_curdata (paper ID, curator, validation status, premade comment, and free-text comment) are always displayed, whereas all other columns and headers are optional. Cell borders are displayed with dotted lines.
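
The page-count calculation described above can be sketched as follows. This is a minimal illustration, not the form's actual code: the helper name ''pages_amount'' and the example numbers are assumptions, and only the use of ''ceil'' (from the POSIX module) mirrors the subroutine shown on this page.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical helper (not part of the form code): number of result pages for a query.
sub pages_amount {
  my ($papersAmount, $papersPerPage) = @_;
  return 0 unless $papersPerPage > 0;              # guard against division by zero
  return ceil($papersAmount / $papersPerPage);     # round up so a partial last page still counts
}

print pages_amount(23, 10), "\n";                  # 23 papers at 10 per page need 3 pages
```

Rounding up rather than truncating is what guarantees that the last, partially filled page still gets an entry in the page number drop-down menu.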
  
Row header columns are set to 600 pixels in width. Columns for individual datatypes are set to 120 pixels in width. If more than 6 datatypes are selected for viewing (via the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]]; ALL datatypes are selected by default), the row header column will appear at the right-hand side of the table, in addition to the left-hand side (default). The overall table width is calculated accordingly. Curatable papers and curated papers are populated into the table. Flagging methods are then loaded. ALL flagging methods (SVM, AFP, CFP) are loaded by default, but specific flagging methods may be selected via the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]]. Column headers for each datatype (those requested from the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], or ALL datatypes when viewing the main Curation Statistics Page) are then printed. The total number of curatable papers is printed once (for the entire table) followed by the number of objects curated (per datatype) and the average number of objects curated per paper (per datatype). Note that for the "rnai" datatype, curation statistics from 8 large-scale papers have been manually added to the code as that data does not exist in the RNAi OA. Similarly, 2084 objects coming from Chronograms have been manually added to the code for otherexpr as that data does not exist in the expression OA but only in Citace Minus, and 19052 objects coming from expression images from Itai Yanai's study have been manually added to the code for picture as that data does not exist in the picture OA but only in Citace Minus.
+
To determine which papers to display on a page, the code determines what page number (n) is currently to be viewed and skips the first (n-1)*(papers per page) papers. The following paper and subsequent papers, up to the number requested per page, are then displayed. The code is written so that for each paper, the paper ID, PMID, journal, and PDF link are all only displayed once per paper, regardless of how many datatypes are being viewed. A "Submit New Results" button is then printed at the bottom of the complete table and the form is closed.
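
The page selection described above can be sketched in isolation as follows. The helper name ''papers_for_page'' and the paper IDs are made up for illustration; the skip-then-stop loop follows the same idea as the ''$papCountToSkip'' / ''$papCount'' counters in the subroutine, but this is a simplified sketch rather than the form's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the form code): return the papers shown on page $pageSelected.
sub papers_for_page {
  my ($papersRef, $pageSelected, $papersPerPage) = @_;
  my $toSkip = ($pageSelected - 1) * $papersPerPage;   # papers belonging to earlier pages
  my @page;
  my $seen = 0;
  foreach my $joinkey (sort @$papersRef) {
    $seen++;
    next if $seen <= $toSkip;                          # skip entries until at the proper page
    last if @page >= $papersPerPage;                   # stop once the page is full
    push @page, $joinkey;
  }
  return @page;
}

my @papers = map { sprintf "%08d", $_ } 1 .. 25;       # made-up paper IDs
my @page3  = papers_for_page(\@papers, 3, 10);
print scalar(@page3), " papers on page 3\n";           # the last page holds the remaining 5 papers
```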
 
 
Next are stats for all curated papers, then all validated papers, "Any" flagged papers and "Intersection" flagged papers. The statistics are calculated and then displayed for each of these sections, as determined by any options selected (if applicable).
 
  
  
 
<pre>
 
<pre>
sub printCurationStatisticsTable {
+
   print qq(<input type="hidden" name="trCounter" value="$trCounter">);
   my ($showTimes, $startprintCurationStatisticsTable, $start, $end, $diff) = (0, '', '', '', '');
 
  if ($showTimes) { $startprintCurationStatisticsTable = time; $start = $startprintCurationStatisticsTable; }
 
  
   $chosenPapers{all}++;
+
   my $joinkeysAmount = scalar(keys %paperPosNegOkay);
   my @datatypesToShow;
+
  my $pagesAmount = ceil($joinkeysAmount / $papersPerPage);
 +
  print qq(Page number <select name="select_page">);
 +
  for my $i (1 .. $pagesAmount) {
 +
    if ($i == $pageSelected) { print qq(<option selected="selected">$i</option>\n); }
 +
      else { print qq(<option>$i</option>\n); }
 +
   } # for my $i (1 .. $pagesAmount)
 +
  print qq(</select>);
 +
  print qq(<input type="submit" name="action" value="Get Results">\n);
 +
  print qq(amount of papers $joinkeysAmount<br/>\n);
 +
  print qq(<br />\n);
  
   &populateStatisticsHashToLabel();
+
   print qq(<table border="1">\n);
 
+
  push @headerRow, "datatype";
   ($oop, my $all_datatypes_checkbox) = &getHtmlVar($query, "checkbox_all_datatypes");
+
   if ($displaySvm)   { push @headerRow, "SVM Prediction"; }
   unless ($all_datatypes_checkbox) { $all_datatypes_checkbox = ''; }
+
   if ($displayStr)   { push @headerRow, "String Match"; }
   foreach my $datatype (sort keys %datatypes) {         # don't tie %datatypes
+
   if ($displayCfp)   { push @headerRow, "cfp value"; }
    ($oop, my $chosen) = &getHtmlVar($query, "checkbox_$datatype");
+
  if ($displayAfp)   { push @headerRow, "afp value"; }
    if ($all_datatypes_checkbox eq 'all') { $chosen = $datatype; }     # if all datatypes checkbox was selected, set that datatype's chosen to that datatype
+
  if ($displayOa)     { push @headerRow, "oa value";  }
    if ($chosen) { $chosenDatatypes{$chosen}++; push @datatypesToShow, $datatype; }
+
  push @headerRow, "curator"; push @headerRow, "new result"; push @headerRow, "select comment"; push @headerRow, "textarea comment";
   } # foreach my $datatype (sort %datatypes)
+
  my $headerRow = join"</th>$thDot", @headerRow;
 +
   $headerRow = qq(<tr>$thDot) . $headerRow . qq(</th></tr>);
 +
  print qq($headerRow\n);
  
   my $datatypesToShowAmount = scalar @datatypesToShow;
+
   my $papCount = 0;
   my $rowNameTdWidth = '600'; my $datatypeTdWidth = '120';
+
   my $papCountToSkip = 0; my $papToSkip = ($pageSelected - 1 ) * $papersPerPage;
   my $labelRightFlag = 0; if ($datatypesToShowAmount > 6) { $labelRightFlag++; }
+
   foreach my $joinkey (sort keys %paperPosNegOkay) {                   # from all papers that have good positive-negative values, show all TRs
  my $tableWidth = $rowNameTdWidth + $datatypesToShowAmount * $datatypeTdWidth;
+
    $papCountToSkip++; next if ($papCountToSkip <= $papToSkip);        # skip entries until at the proper page
  if ($labelRightFlag) { $tableWidth = 2*$rowNameTdWidth + $datatypesToShowAmount * $datatypeTdWidth; }
+
    $papCount++;
  print qq(<table width="$tableWidth" class="bordered" border="1">\n);
+
    last if ($papCount > $papersPerPage);
 
+
    my $trsInPaperAmount = scalar @{ $trs{$joinkey} };                 # amount of rows for a joinkey, make that the rowspan
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "beforePopulateCuratablePapers  $diff<br>"; }
+
    my $firstTr = shift @{ $trs{$joinkey} };                           # the first table row needs the paper info and rowspan
  &populateCuratablePapers();
+
    my $tdMultiRow = $tdDot; $tdMultiRow =~ s/>$/ rowspan="$trsInPaperAmount">/;       # add the rowspan to the td style
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCuratablePapers  $diff<br>"; }
+
    my $paperInfoTds = join"</td>$tdMultiRow", @{ $paperInfo{$joinkey} };               # make paper info tds from %paperInfo
   &populateCuratedPapers();
+
    print qq(<tr>${tdMultiRow}$paperInfoTds</td>$firstTr\n);           # print the first row which has paper info
   if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCuratedPapers  $diff<br>"; }
+
    foreach my $tr (@{ $trs{$joinkey} }) { print qq(<tr>$tr\n); } }    # print other table rows without paper info
 +
   print qq(</table>\n);
 +
 
 +
   print qq(<input type="submit" name="action" value="Submit New Results"><br/>\n);
  
   my @flaggingMethods;                                          # when calculating any or all methods, only want papers for these flagging methods
+
   &printFormClose();
  ($oop, my $displayAll) = &getHtmlVar($query, "checkbox_all_flagging_methods");
+
} # sub getResults
  ($oop, my $displayCfp) = &getHtmlVar($query, "checkbox_cfp");
 
  if ( ($displayAll) || ($displayCfp) ) { &populateCfpData();     push @flaggingMethods, 'checkbox_cfp'; }
 
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCfpData $diff<br>"; }
 
  ($oop, my $displayAfp) = &getHtmlVar($query, "checkbox_afp");
 
  if ( ($displayAll) || ($displayAfp) ) { &populateAfpData();      push @flaggingMethods, 'checkbox_afp'; }
 
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateAfpData $diff<br>"; }
 
  ($oop, my $displaySvm) = &getHtmlVar($query, "checkbox_svm");
 
  if ( ($displayAll) || ($displaySvm) ) { &populateSvmData(); push @flaggingMethods, 'checkbox_svm'; }
 
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateSvmData  $diff<br>"; }
 
  my $flaggingMethods = join"=on&", @flaggingMethods; $flaggingMethods .= '=on';
 
  
  &printCurationStatisticsDatatypes(                \@datatypesToShow, $rowNameTdWidth, $datatypeTdWidth, $labelRightFlag);
+
</pre>
  &printCurationStatisticsPapersCuratable(          \@datatypesToShow, $labelRightFlag);
 
  &printCurationStatisticsObjectsCurated(          \@datatypesToShow, $labelRightFlag);
 
  &printCurationStatisticsObjectsPerPaperCurated(  \@datatypesToShow, $labelRightFlag);
 
  
  $curStats{'dividerallval'}{'allSame'}{'countPap'} = 'blank';
+
<br>
  tie %{ $curStats{'allcur'} }, "Tie::IxHash";                  # make all section appear in this order by tying it
+
<br>
  tie %{ $curStats{'allval'} }, "Tie::IxHash";                  # make all section appear in this order by tying it
 
  $curStats{'dividerany'}{'allSame'}{'countPap'} = 'blank';
 
#  tie %{ $curStats{'any'} }, "Tie::IxHash";                  # not needed, 'any' only has 'pos'
 
  tie %{ $curStats{'any'}{'pos'} }, "Tie::IxHash";              # make any section appear in this order by tying it, though it will get populated last
 
  $curStats{'dividerint'}{'allSame'}{'countPap'} = 'blank';
 
  tie %{ $curStats{'int'}{'pos'} }, "Tie::IxHash";              # make int section appear in this order by tying it, though it will get populated last
 
  
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "printCurationStatistics Datatypes / Objects  $diff<br>"; }
+
== Detailed Results of Papers Page: Processing Input (the ''submitNewResults'' Subroutine) ==
  
  &getCurationStatisticsAllCurated(         \@datatypesToShow );
+
When a curator clicks on the "Submit New Results" button at the bottom of the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]], the ''submitNewResults'' subroutine (below) is run. First, the subroutine opens the form and prints the hidden curator, capturing the curator currently logged in. The code then looks at all paper-datatype pair data loaded by the ''getResults'' subroutine which generated the data to display on the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers Page]], skipping over papers that do not have a "new result" (validation status) entry. For papers that have a "new result" entry, if there is a curator already listed in the curator field, that curator is kept as the curator for the given paper-datatype pair. If no curator is present in the field at the time of submission, the curator logged in is entered as the curator for the new data. Alternatively, if a curator has been manually selected from the curator drop-down menu, this selected curator is entered. The data from each row of the form with a "new result" entry is then compared to its respective data in Postgres. If the data is different it is prepared for display in the '''New Results Summary Page''' or the '''Overwrite Confirmation Page''' as described in the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] and the [[New_2012_Curation_Status#Add_Results_Page:_Loading_Page_and_Processing_Input|Add Results Page: Loading Page and Processing Input]] sections. Once complete, the form is closed.
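
The curator fallback and the separation of rows that already have data in Postgres can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names ''active_curator'' and ''split_rows'', the curator names, and the sample data are hypothetical, and the real subroutine reads its values from the submitted form and from Postgres rather than from in-memory hashes.

```perl
use strict;
use warnings;

# Hypothetical helper: a curator chosen in the drop-down wins, otherwise use the logged-in curator.
sub active_curator {
  my ($loggedInCurator, $dropdownCurator) = @_;
  return $dropdownCurator ? $dropdownCurator : $loggedInCurator;
}

# Hypothetical helper: split submitted rows into rows with no Postgres data and rows that already have some.
sub split_rows {
  my ($curatorDataRef, $pgDataRef) = @_;
  my (@data, @duplicateData);
  foreach my $joinkey (sort keys %$curatorDataRef) {
    foreach my $datatype (sort keys %{ $curatorDataRef->{$joinkey} }) {
      my $row  = $curatorDataRef->{$joinkey}{$datatype};
      my @line = ($joinkey, $datatype, @{$row}{qw(curator donposneg selcomment txtcomment)});
      if (exists $pgDataRef->{$joinkey} && $pgDataRef->{$joinkey}{$datatype}) { push @duplicateData, \@line; }
        else { push @data, \@line; }
    }
  }
  return (\@data, \@duplicateData);
}

# Made-up example data: two submitted rows, one of which already has a result in Postgres.
my %submitted = (
  '00000001' => { rnai => { curator => active_curator('curatorA', ''),         donposneg => 'positive', selcomment => '',  txtcomment => '' } },
  '00000002' => { rnai => { curator => active_curator('curatorA', 'curatorB'), donposneg => 'negative', selcomment => '1', txtcomment => '' } },
);
my %inPostgres = ( '00000002' => { rnai => { donposneg => 'positive' } } );
my ($new, $dup) = split_rows(\%submitted, \%inPostgres);
print scalar(@$new), " new, ", scalar(@$dup), " already in Postgres\n";
```

Keeping the two lists separate is what lets rows without existing data go straight to a summary while rows with existing data are held for an overwrite confirmation.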
  &getCurationStatisticsAllVal(             \@datatypesToShow );
 
  &getCurationStatisticsAllValPos(          \@datatypesToShow );
 
  &getCurationStatisticsAllValNeg(          \@datatypesToShow );
 
  &getCurationStatisticsAllValConf(        \@datatypesToShow );
 
  
   if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getCurationStatisticsAll   $diff<br>"; }
+
<pre>
   if ( ($displayAll) || ($displaySvm) ) {
+
sub submitNewResults {
     &getCurationStatisticsSvmNd(           \@datatypesToShow );
+
   &printFormOpen();
     &getCurationStatisticsSvm(             \@datatypesToShow ); }
+
  &printHiddenCurator();
   if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsSvm  $diff<br>"; }
+
  ($oop, my $trAmount) = &getHtmlVar($query, "trCounter");
 
+
  my %papersToAdd;
  if ( ($displayAll) || ($displayAfp) ) {
+
   my %curatorData;
     &getCurationStatisticsAfpEmailed(       \@datatypesToShow );
+
   for my $i (1 .. $trAmount) {
    &getCurationStatisticsAfpFlagged(       \@datatypesToShow );
+
     ($oop, my $curatorDonposneg) = &getHtmlVar($query, "select_curator_donposneg_$i");
     &getCurationStatisticsAfpPos(          \@datatypesToShow );
+
     next unless $curatorDonposneg;                      # skip entries without a curator result for done / positive / negative
    &getCurationStatisticsAfpNeg(           \@datatypesToShow ); }
+
    ($oop, my $dropdownCurator) = &getHtmlVar($query, "select_curator_curator_$i");
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsAfp  $diff<br>"; }
+
    my $activeCurator = $curator; if ($dropdownCurator) { $activeCurator = $dropdownCurator; } # if a curator was chosen use that, otherwise use logged in curator
 +
    ($oop, my $curatorSelComment) = &getHtmlVar($query, "select_curator_comment_$i");
 +
     ($oop, my $curatorTxtComment) = &getHtmlVar($query, "textarea_curator_comment_$i");
 +
     ($oop, my $joinkey)           = &getHtmlVar($query, "joinkey_$i");
 +
    ($oop, my $datatype)         = &getHtmlVar($query, "datatype_$i");
  
  if ( ($displayAll) || ($displayCfp) ) {
+
    $papersToAdd{$datatype}{$joinkey}++;
     &getCurationStatisticsCfpFlagged(      \@datatypesToShow );
+
    $curatorData{$joinkey}{$datatype}{curator}    = $activeCurator;
     &getCurationStatisticsCfpPos(          \@datatypesToShow );
+
     $curatorData{$joinkey}{$datatype}{donposneg}  = $curatorDonposneg;
     &getCurationStatisticsCfpNeg(           \@datatypesToShow ); }
+
     $curatorData{$joinkey}{$datatype}{selcomment} = $curatorSelComment;
   if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsCfp  $diff<br>"; }
+
     $curatorData{$joinkey}{$datatype}{txtcomment} = $curatorTxtComment;
 
+
  } # for my $i (1 .. $trAmount)
  &getCurationStatisticsAny(               \@datatypesToShow, \@flaggingMethods );
+
  my %pgData;
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsAny  $diff<br>"; }
+
   foreach my $datatype (sort keys %papersToAdd) {
 +
    my $joinkeys = join"','", sort keys %{ $papersToAdd{$datatype} };
 +
    my ($pgDatatypeDataRef) = &getPgDataForJoinkeys($joinkeys, $datatype);
 +
    my %pgDatatypeData = %$pgDatatypeDataRef;
 +
    foreach my $joinkey (keys %pgDatatypeData) {
 +
      foreach my $datatype (keys %{ $pgDatatypeData{$joinkey} }) {
 +
          foreach my $valuetype (keys %{ $pgDatatypeData{$joinkey}{$datatype} }) {
 +
            $pgData{$joinkey}{$datatype}{$valuetype} = $pgDatatypeData{$joinkey}{$datatype}{$valuetype}; } } } }
  
   my @labelKeys;                                       # labelKeys will be created here
+
   my @data; my @duplicateData;
   my $depth = 0;                                        # recursion depth into hash
+
   foreach my $joinkey (sort keys %curatorData) {
  &recurseCurStats(\%curStats, \@labelKeys, $depth, \@datatypesToShow, $flaggingMethods, $labelRightFlag );
+
    foreach my $datatype (keys %{ $curatorData{$joinkey} }) {
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "recurseCurStats  $diff<br>"; }
+
        my $thisCurator  = $curatorData{$joinkey}{$datatype}{curator};
 
+
        my $donposneg    = $curatorData{$joinkey}{$datatype}{donposneg};
  &printCurationStatisticsDatatypes(                \@datatypesToShow, $rowNameTdWidth, $datatypeTdWidth, $labelRightFlag);
+
        my $selcomment  = $curatorData{$joinkey}{$datatype}{selcomment};
  print "</table>\n";
+
        my $txtcomment  = $curatorData{$joinkey}{$datatype}{txtcomment};
  if ($showTimes) { $end = time; $diff = $end - $startprintCurationStatisticsTable; print "printCurationStatisticsTable $diff<br>"; }
+
        my @line;
} # sub printCurationStatisticsTable
+
        push @line, $joinkey;
 
+
        push @line, $datatype;
</pre>
+
        push @line, $thisCurator;
 +
        push @line, $donposneg;
 +
        push @line, $selcomment;
 +
        push @line, $txtcomment;
 +
        if ($pgData{$joinkey}{$datatype}) { push @duplicateData, \@line; }
 +
          else { push @data, \@line; }
 +
  } }
 +
  &processResultDataDuplicateData(\@data, \@duplicateData, \%pgData);
 +
  &printFormClose();
 +
} # sub submitNewResults
 +
 
 +
</pre>
  
 
<br>
 
<br>
 
<br>
 
<br>
  
== Premade Comments ==
+
== Printing Curation Statistics Table ==
  
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page, curators have the option to select a comment from a drop down list of comments to apply to this paper in the context of the relevant datatype.
 
  
In the code, the comments are stored in a hash table called ''%premadeComments''. The keys of these comments (which are what is stored in Postgres) are plain numbers, so the descriptions/titles can be changed or updated and still apply retroactively.
+
The beginning lines (and some later lines) of the ''printCurationStatisticsTable'' subroutine provide code to display the loading times of the Curation Statistics table as a whole, and the loading/processing times of each portion of the code. The lines are ignored when the variable ''$showTimes'' is set to zero. By default, all papers and datatypes are loaded for performing calculations and display. If only some flagging methods or datatypes are selected from the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], only papers flagged by the selected methods will load and only numbers for the selected datatypes will be displayed.
  
Code:
+
Row header columns are set to 600 pixels in width. Columns for individual datatypes are set to 120 pixels in width. If more than 6 datatypes are selected for viewing (via the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]]; ALL datatypes are selected by default), the row header column will appear at the right-hand side of the table, in addition to the left-hand side (default). The overall table width is calculated accordingly. Curatable papers and curated papers are populated into the table. Flagging methods are then loaded. ALL flagging methods (SVM, STR, AFP, CFP) are loaded by default, but specific flagging methods may be selected via the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]]. Column headers for each datatype (those requested from the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], or ALL datatypes when viewing the main Curation Statistics Page) are then printed. The total number of curatable papers is printed once (for the entire table) followed by the number of objects curated (per datatype) and the average number of objects curated per paper (per datatype). Note that for the "rnai" datatype, curation statistics from 8 large-scale papers have been manually added to the code as that data does not exist in the RNAi OA. Similarly, 2084 objects coming from Chronograms have been manually added to the code for otherexpr as that data does not exist in the expression OA but only in Citace Minus, and 19052 objects coming from expression images from Itai Yanai's study have been manually added to the code for picture as that data does not exist in the picture OA but only in Citace Minus.
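
As a worked example of the width calculation described above, the sketch below computes the table width for 6 and for 7 selected datatypes. The helper name ''table_width'' is hypothetical; the 600 and 120 pixel widths and the "more than 6 datatypes" threshold are taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the form code): overall table width in pixels.
sub table_width {
  my ($datatypesAmount) = @_;
  my ($rowNameTdWidth, $datatypeTdWidth) = (600, 120);
  my $labelColumns = $datatypesAmount > 6 ? 2 : 1;     # repeat the row header on the right for wide tables
  return $labelColumns * $rowNameTdWidth + $datatypesAmount * $datatypeTdWidth;
}

print table_width(6), "\n";    # 600 + 6*120 = 1320
print table_width(7), "\n";    # 2*600 + 7*120 = 2040
```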
  
<pre>
+
Next are stats for all curated papers, then all validated papers, "Any" flagged papers and "Intersection" flagged papers. The statistics are calculated and then displayed for each of these sections, as determined by any options selected (if applicable).
sub populatePremadeComments {
 
  $premadeComments{"1"} = "SVM Positive, Curation Negative";
 
  $premadeComments{"2"} = "C. elegans as heterologous expression system";
 
  $premadeComments{"3"} = "pre-made comment #3";}
 
</pre>
 
  
So, as of now:
 
  
 
<pre>
 
<pre>
 +
sub printCurationStatisticsTable {
 +
  my ($showTimes, $startprintCurationStatisticsTable, $start, $end, $diff) = (0, '', '', '', '');
 +
  if ($showTimes) { $startprintCurationStatisticsTable = time; $start = $startprintCurationStatisticsTable; }
  
| Key |            Comment                                    |
+
  $chosenPapers{all}++;
|  1  | "SVM Positive, Curation Negative"                    |
+
  my @datatypesToShow;
|  2  | "C. elegans as heterologous expression system"        |
 
|  3  | "pre-made comment #3"                                |
 
  
</pre>
+
  &populateStatisticsHashToLabel();
  
 +
  ($oop, my $all_datatypes_checkbox) = &getHtmlVar($query, "checkbox_all_datatypes");
 +
  unless ($all_datatypes_checkbox) { $all_datatypes_checkbox = ''; }
 +
  foreach my $datatype (sort keys %datatypes) {        # don't tie %datatypes
 +
    ($oop, my $chosen) = &getHtmlVar($query, "checkbox_$datatype");
 +
    if ($all_datatypes_checkbox eq 'all') { $chosen = $datatype; }      # if all datatypes checkbox was selected, set that datatype's chosen to that datatype
 +
    if ($chosen) { $chosenDatatypes{$chosen}++; push @datatypesToShow, $datatype; }
 +
  } # foreach my $datatype (sort %datatypes)
  
Hence, if a completely new comment is desired, a new key will need to be created and thereafter associated with that new comment. Also, old keys should never be recycled, and documentation describing what each key refers to should be maintained in this Wiki.
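
A minimal sketch of this convention follows. The helper ''add_premade_comment'' and the example comment text are hypothetical and not part of the form code. The sketch assumes that retired keys are left in the hash, so that taking the highest existing key plus one never reuses an old key.

```perl
use strict;
use warnings;

my %premadeComments = (
  "1" => "SVM Positive, Curation Negative",
  "2" => "C. elegans as heterologous expression system",
  "3" => "pre-made comment #3",
);

# Hypothetical helper: store a new comment under the next unused numeric key and return that key.
sub add_premade_comment {
  my ($commentsRef, $text) = @_;
  my ($max) = sort { $b <=> $a } keys %$commentsRef;   # highest key used so far
  my $newKey = ($max // 0) + 1;                        # never reuse an existing key
  $commentsRef->{$newKey} = $text;
  return $newKey;
}

my $key = add_premade_comment(\%premadeComments, "example comment (hypothetical)");
print "new comment stored under key $key\n";

# Changing a title only changes the display text; rows stored under key "3" follow along.
$premadeComments{"3"} = "updated title for comment #3 (hypothetical)";
```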
+
  my $datatypesToShowAmount = scalar @datatypesToShow;
 +
  my $rowNameTdWidth = '600'; my $datatypeTdWidth = '120';
 +
  my $labelRightFlag = 0; if ($datatypesToShowAmount > 6) { $labelRightFlag++; }
 +
  my $tableWidth = $rowNameTdWidth + $datatypesToShowAmount * $datatypeTdWidth;
 +
  if ($labelRightFlag) { $tableWidth = 2*$rowNameTdWidth + $datatypesToShowAmount * $datatypeTdWidth; }
 +
  print qq(<table width="$tableWidth" class="bordered" border="1">\n);
  
<br>
+
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "beforePopulateCuratablePapers  $diff<br>"; }
<br>
+
  &populateCuratablePapers();
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCuratablePapers  $diff<br>"; }
 +
  &populateCuratedPapers();
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCuratedPapers  $diff<br>"; }
  
== New Result ==
+
  my @flaggingMethods;                                          # when calculating any or all methods, only want papers for these flagging methods
 +
  ($oop, my $displayAll) = &getHtmlVar($query, "checkbox_all_flagging_methods");
 +
  ($oop, my $displayCfp) = &getHtmlVar($query, "checkbox_cfp");
 +
  if ( ($displayAll) || ($displayCfp) ) { &populateCfpData();      push @flaggingMethods, 'checkbox_cfp'; }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateCfpData $diff<br>"; }
 +
  ($oop, my $displayAfp) = &getHtmlVar($query, "checkbox_afp");
 +
  if ( ($displayAll) || ($displayAfp) ) { &populateAfpData();      push @flaggingMethods, 'checkbox_afp'; }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateAfpData $diff<br>"; }
 +
  ($oop, my $displayStr) = &getHtmlVar($query, "checkbox_str");
 +
  if ($displayStr) { &populateStrData(); push @flaggingMethods, 'checkbox_str'; }
 +
  ($oop, my $displaySvm) = &getHtmlVar($query, "checkbox_svm");
 +
  if ( ($displayAll) || ($displaySvm) ) { &populateSvmData(); push @flaggingMethods, 'checkbox_svm'; }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "populateSvmData  $diff<br>"; }
 +
  my $flaggingMethods = join"=on&", @flaggingMethods; $flaggingMethods .= '=on';
 +
 
 +
  &printCurationStatisticsDatatypes(                \@datatypesToShow, $rowNameTdWidth, $datatypeTdWidth, $labelRightFlag);
 +
  &printCurationStatisticsPapersCuratable(          \@datatypesToShow, $labelRightFlag);
 +
  &printCurationStatisticsObjectsCurated(          \@datatypesToShow, $labelRightFlag);
 +
  &printCurationStatisticsObjectsPerPaperCurated(  \@datatypesToShow, $labelRightFlag);
  
Each paper-datatype pair can be assigned a "New Result" indicating its status as curated (or not) or validated (or not), and if validated, positive or negative for the particular paper-datatype pair. These results can be entered via the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] or directly in the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page via the "new result" column. The code is below:
+
  $curStats{'dividerallval'}{'allSame'}{'countPap'} = 'blank';
 +
  tie %{ $curStats{'allcur'} }, "Tie::IxHash";                  # make all section appear in this order by tying it
 +
  tie %{ $curStats{'allval'} }, "Tie::IxHash";                  # make all section appear in this order by tying it
 +
  $curStats{'dividerany'}{'allSame'}{'countPap'} = 'blank';
 +
#  tie %{ $curStats{'any'} }, "Tie::IxHash";                  # not needed, 'any' only has 'pos'
 +
  tie %{ $curStats{'any'}{'pos'} }, "Tie::IxHash";              # make any section appear in this order by tying it, though it will get populated last
 +
  $curStats{'dividerint'}{'allSame'}{'countPap'} = 'blank';
 +
  tie %{ $curStats{'int'}{'pos'} }, "Tie::IxHash";              # make int section appear in this order by tying it, though it will get populated last
  
Code:
+
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "printCurationStatistics Datatypes / Objects  $diff<br>"; }
  
<pre>
+
   &getCurationStatisticsAllCurated(        \@datatypesToShow );
sub populateDonPosNegOptions {
+
   &getCurationStatisticsAllVal(            \@datatypesToShow );
   $donPosNegOptions{""}            = "";
+
   &getCurationStatisticsAllValPos(          \@datatypesToShow );
   $donPosNegOptions{"curated"}      = "curated and positive";
+
   &getCurationStatisticsAllValNeg(          \@datatypesToShow );
   $donPosNegOptions{"positive"}    = "validated positive";
+
   &getCurationStatisticsAllValConf(        \@datatypesToShow );
   $donPosNegOptions{"negative"}    = "validated negative";
 
   $donPosNegOptions{"notvalidated"} = "not validated";}
 
</pre>
 
  
where "curated", "positive", "negative", and "notvalidated" are the keys (for the %donPosNegOptions hash table in the form code) that will be stored in postgres and the corresponding values (e.g. "curated and positive") are what will be displayed on the form.
+
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getCurationStatisticsAll  $diff<br>"; }
 +
  if ( ($displayAll) || ($displaySvm) ) {
 +
    &getCurationStatisticsSvmNd(            \@datatypesToShow );
 +
    &getCurationStatisticsSvm(             \@datatypesToShow ); }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsSvm  $diff<br>"; }
  
Note that "" and "not validated" represent no data for that paper-datatype pair, but "not validated" is present as an option to overwrite accidental validations (it is impossible to go back to a blank "" field via the form).
+
  if ( ($displayAll) || ($displayStr) ) {
 +
    &getCurationStatisticsStrFlagged(      \@datatypesToShow );
 +
    &getCurationStatisticsStrPos(          \@datatypesToShow );
 +
    &getCurationStatisticsStrNeg(          \@datatypesToShow ); }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsStr  $diff<br>"; }
  
<br>
+
  if ( ($displayAll) || ($displayAfp) ) {
<br>
+
    &getCurationStatisticsAfpEmailed(      \@datatypesToShow );
 +
    &getCurationStatisticsAfpFlagged(      \@datatypesToShow );
 +
    &getCurationStatisticsAfpPos(          \@datatypesToShow );
 +
    &getCurationStatisticsAfpNeg(          \@datatypesToShow ); }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsAfp  $diff<br>"; }
  
== Datatypes ==
+
  if ( ($displayAll) || ($displayCfp) ) {
 +
    &getCurationStatisticsCfpFlagged(      \@datatypesToShow );
 +
    &getCurationStatisticsCfpPos(          \@datatypesToShow );
 +
    &getCurationStatisticsCfpNeg(          \@datatypesToShow ); }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsCfp  $diff<br>"; }
  
The form determines which datatypes exist via a 'populateDatatypes' subroutine in the form code. As of 12-5-2012, the form first collects all datatypes used in SVM from the 'cur_svmdata' postgres table (all of which, as of 12-5-2012, are identically named in the Author First Pass (AFP) and Curator First Pass (CFP) tables). It then supplements these with datatypes that are not in SVM but are in AFP and CFP (as of 12-5-2012, all anatomy curation related datatypes), plus one additional datatype ("geneticablation") that is not in SVM, AFP, or CFP.
+
  &getCurationStatisticsAny(               \@datatypesToShow, \@flaggingMethods );
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "getStatsAny  $diff<br>"; }
 +
 
 +
  my @labelKeys;                                        # labelKeys will be created here
 +
  my $depth = 0;                                        # recursion depth into hash
 +
  &recurseCurStats(\%curStats, \@labelKeys, $depth, \@datatypesToShow, $flaggingMethods, $labelRightFlag );
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "recurseCurStats  $diff<br>"; }
  
Here is the code:
+
  &printCurationStatisticsDatatypes(                \@datatypesToShow, $rowNameTdWidth, $datatypeTdWidth, $labelRightFlag);
 +
  print "</table>\n";
 +
  if ($showTimes) { $end = time; $diff = $end - $startprintCurationStatisticsTable; print "printCurationStatisticsTable $diff<br>"; }
 +
} # sub printCurationStatisticsTable
  
<pre>
 
sub populateDatatypes {
 
  $result = $dbh->prepare( "SELECT DISTINCT(cur_datatype) FROM cur_svmdata " );
 
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
  while (my @row = $result->fetchrow) { $datatypesAfpCfp{$row[0]} = $row[0]; }
 
  $datatypesAfpCfp{'chemicals'} = 'chemicals';      # added for Karen 2013 10 02
 
  $datatypesAfpCfp{'blastomere'}    = 'cellfunc';
 
  $datatypesAfpCfp{'exprmosaic'}    = 'siteaction';
 
  $datatypesAfpCfp{'geneticmosaic'} = 'mosaic';
 
  $datatypesAfpCfp{'laserablation'} = 'ablationdata';
 
  foreach my $datatype (keys %datatypesAfpCfp) { $datatypes{$datatype}++; }
 
  $datatypes{'geneticablation'}++;
 
} # sub populateDatatypes
 
 
</pre>
 
</pre>
  
 +
<br>
 +
<br>
  
As for the datatypes currently (12-5-2012) NOT in SVM but IN AFP and CFP, the datatype name is different between the Curation Status form and the AFP and CFP forms. So, the datatypes named "cellfunc", "siteaction", "mosaic", and "ablationdata" in the AFP and CFP tables are respectively named "blastomere", "exprmosaic", "geneticmosaic", "laserablation" in the Curation Status form.
+
== Premade Comments ==
  
The IMPORTANT thing here is: if, at some point, the datatypes are changed (added, renamed, etc.), and the code is not updated in kind, the form will likely break.  Curators should tell Juancarlos/Chris/Daniela to update the code.
+
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page, curators have the option to select a comment from a drop down list of comments to apply to this paper in the context of the relevant datatype.
 +
 
 +
In the code, the comments are stored in a hash table called ''%premadeComments''. The keys (stored in postgres) of these comments are only numbers, so the descriptions/titles can change or be updated and still apply retroactively.
 +
 
 +
Code:
  
New datatypes should be accounted for in this code:<br>
+
<pre>
* - no svm, no afp/cfp : add to %datatypes hash like 'geneticablation'.
+
sub populatePremadeComments {
* - no svm, yes afp/cfp : add to %datatypesAfpCfp + %datatypes hashes like 'blastomere'
+
  $premadeComments{"1"} = "SVM Positive, Curation Negative";
* - yes svm, yes afp/cfp : add to code to populate cur_svmdata, which will populate in the SELECT query
+
  $premadeComments{"2"} = "C. elegans as heterologous expression system";
* - yes svm, no afp/cfp : add to code to populate cur_svmdata, which will populate in the SELECT query, but also subsequently delete from %datatypesAfpCfp (to prevent a postgres query to a non-existing table which will crash the form)
+
  $premadeComments{"3"} = "pre-made comment #3";}
 +
</pre>
  
<br>
+
So, as of now:
<br>
 
  
== Creating PDF links to papers ==
+
<pre>
  
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page, each paper ID is linked to its corresponding PDF document using the code below:
+
| Key |            Comment                                    |
 +
|  1  | "SVM Positive, Curation Negative"                    |
 +
|  2  | "C. elegans as heterologous expression system"        |
 +
|  3  | "pre-made comment #3"                                |
  
Code:
+
</pre>
  
<pre>
 
sub populatePdf {
 
  $result = $dbh->prepare( "SELECT * FROM pap_electronic_path WHERE pap_electronic_path IS NOT NULL");
 
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
  my %temp;
 
  while (my @row = $result->fetchrow) {
 
    my ($data, $isPdf) = &makePdfLinkFromPath($row[1]);
 
    $temp{$row[0]}{$isPdf}{$data}++; }
 
  foreach my $joinkey (sort keys %temp) {
 
    my @pdfs;
 
    foreach my $isPdf (reverse sort keys %{ $temp{$joinkey} }) {
 
      foreach my $pdfLink (sort keys %{ $temp{$joinkey}{$isPdf} }) {
 
        push @pdfs, $pdfLink; } }
 
    my ($pdfs) = join"<br/>", @pdfs;
 
    $pdf{$joinkey} = $pdfs;
 
  } # foreach my $joinkey (sort keys %temp)
 
} # sub populatePdf
 
  
sub makePdfLinkFromPath {
+
Hence, if a completely new comment is desired, a new key will need to be created and thereafter associated with that new comment. Also, old keys should never be recycled, and documentation describing what each key refers to should be maintained in this Wiki.
  my ($path) = shift;
 
  my ($pdf) = $path =~ m/\/([^\/]*)$/;
 
  my $isPdf = 0; if ($pdf =~ m/\.pdf$/) { $isPdf++; }          # kimberly wants .pdf files on top, so need to flag to sort
 
  my $link = 'http://tazendra.caltech.edu/~acedb/daniel/' . $pdf;
 
  my $data = "<a href=\"$link\" target=\"new\">$pdf</a>"; return ($data, $isPdf); }
 
</pre>
 
  
 +
<br>
 +
<br>
  
Note the table name ("pap_electronic_path"), the URL path ("http://tazendra.caltech.edu/~acedb/daniel/"), and (because of the code 'target=\"new\"') that the link will open a new window or tab. Also note that opening another link on the original page (e.g. [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page) will open that link in that same new window/tab, clearing out what you had opened previously.
+
== New Result ==
  
 
+
Each paper-datatype pair can be assigned a "New Result" indicating its status as curated (or not) or validated (or not), and if validated, positive or negative for the particular paper-datatype pair. These results can be entered via the [[New_2012_Curation_Status#Add_Results_Page|Add Results Page]] or directly in the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page via the "New Results" column. The code is below:
== Creating hyperlinks to PubMed paper pages ==
 
 
 
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page each PubMed ID is linked to its corresponding PubMed webpage using the code below:
 
  
 
Code:
 
Code:
  
 
<pre>
 
<pre>
sub populatePmid {
+
sub populateDonPosNegOptions {
   $result = $dbh->prepare( "SELECT * FROM pap_identifier WHERE pap_identifier ~ 'pmid'" );
+
   $donPosNegOptions{""}            = "";
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
+
   $donPosNegOptions{"curated"}      = "curated and positive";
   my %temp;
+
   $donPosNegOptions{"positive"}     = "validated positive";
  while (my @row = $result->fetchrow) { if ($row[0]) {
+
  $donPosNegOptions{"negative"}    = "validated negative";
     my ($data) = &makeNcbiLinkFromPmid($row[1]);
+
  $donPosNegOptions{"notvalidated"} = "not validated";}
    $temp{$row[0]}{$data}++; } }
 
  foreach my $joinkey (sort keys %temp) {
 
     my ($pmids) = join"<br/>", keys %{ $temp{$joinkey} };
 
    $pmid{$joinkey} = $pmids;
 
  } # foreach my $joinkey (sort keys %temp)
 
} # sub populatePmid
 
 
</pre>
 
</pre>
  
<pre>
+
where "curated", "positive", "negative", and "notvalidated" are the keys (for the %donPosNegOptions hash table in the form code) that will be stored in postgres and the corresponding values (e.g. "curated and positive") are what will be displayed on the form.
sub makeNcbiLinkFromPmid {
 
  my $pmid = shift;
 
  my ($id) = $pmid =~ m/(\d+)/;
 
  my $link = 'http://www.ncbi.nlm.nih.gov/pubmed/' . $id;
 
  my $data = "<a href=\"$link\" target=\"new\">$pmid</a>"; return $data; }
 
</pre>
 
  
Note the table name ("pap_identifier"), the table specifier ("WHERE pap_identifier ~ 'pmid'"), the URL path ("http://www.ncbi.nlm.nih.gov/pubmed/"), and (because of the code 'target=\"new\"') that the link will open a new window or tab. Also note that opening another link on the original page (e.g. [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page) will open that link in that same new window/tab, clearing out what you had opened previously.
+
Note that "" and "not validated" represent no data for that paper-datatype pair, but "not validated" is present as an option to overwrite accidental validations (it is impossible to go back to a blank "" field via the form).
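As a minimal sketch of the key/label split described above, the stored key can be translated into its display label, and the two "no data" keys can be detected, roughly like this. The helpers `label_for` and `has_result` are hypothetical and are not part of the form code; only the hash contents come from populateDonPosNegOptions.

```perl
use strict;
use warnings;

# Keys are what gets stored in postgres; values are what the form displays.
my %donPosNegOptions = (
  ""             => "",
  "curated"      => "curated and positive",
  "positive"     => "validated positive",
  "negative"     => "validated negative",
  "notvalidated" => "not validated",
);

# Hypothetical helper: translate a stored key into its display label.
# Unknown or undefined keys fall back to the blank label.
sub label_for {
  my ($key) = @_;
  return "" unless defined $key && exists $donPosNegOptions{$key};
  return $donPosNegOptions{$key};
}

# Hypothetical helper: both "" and "notvalidated" mean "no data"
# for a paper-datatype pair, so they return 0.
sub has_result {
  my ($key) = @_;
  return (defined $key && $key ne "" && $key ne "notvalidated") ? 1 : 0;
}

print label_for("curated"), "\n";        # curated and positive
print has_result("notvalidated"), "\n";  # 0
```

Keeping the "no data" test in one place would make it harder for "" and "notvalidated" to be treated differently by accident.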
  
 +
<br>
 +
<br>
  
== Populating the Journal Names ==
+
== Datatypes ==
 +
 
 +
The form determines which datatypes exist via a 'populateDatatypes' subroutine in the form code. As of 12-5-2012, the form first collects all datatypes used in SVM from the 'cur_svmdata' postgres table (all of which, as of 12-5-2012, are identically named in the Author First Pass (AFP) and Curator First Pass (CFP) tables). It then supplements these with datatypes that are not in SVM but are in AFP and CFP (as of 12-5-2012, all anatomy curation related datatypes), plus one additional datatype ("geneticablation") that is not in SVM, AFP, or CFP.
  
Journal names for each paper are populated via the following code:
+
Here is the code:
  
 
<pre>
 
<pre>
sub populateJournal {
+
sub populateDatatypes {
   $result = $dbh->prepare( "SELECT * FROM pap_journal WHERE pap_journal IS NOT NULL" );
+
   $result = $dbh->prepare( "SELECT DISTINCT(cur_datatype) FROM cur_svmdata " );
 +
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 +
  while (my @row = $result->fetchrow) { $datatypesAfpCfp{$row[0]} = $row[0]; }
 +
  $datatypesAfpCfp{'chemicals'} = 'chemicals';      # added for Karen 2013 10 02
 +
  $datatypesAfpCfp{'blastomere'}    = 'cellfunc';
 +
  $datatypesAfpCfp{'exprmosaic'}    = 'siteaction';
 +
  $datatypesAfpCfp{'geneticmosaic'} = 'mosaic';
 +
  $datatypesAfpCfp{'laserablation'} = 'ablationdata';
 +
  foreach my $datatype (keys %datatypesAfpCfp) { $datatypes{$datatype}++; }
 +
  $datatypes{'geneticablation'}++;
 +
  $datatypes{'picture'}++;          # for Daniela's pictures
 +
  $result = $dbh->prepare( "SELECT DISTINCT(cur_datatype) FROM cur_strdata" ); # from string search data
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
   while (my @row = $result->fetchrow) { if ($row[0]) { $journal{$row[0]} = $row[1]; } }
+
   while (my @row = $result->fetchrow) { $datatypes{$row[0]} = $row[0]; }
} # sub populateJournal
+
} # sub populateDatatypes
 
</pre>
 
</pre>
  
  
Note the table "pap_journal".
+
As for the datatypes currently (12-5-2012) NOT in SVM but IN AFP and CFP, the datatype name is different between the Curation Status form and the AFP and CFP forms. So, the datatypes named "cellfunc", "siteaction", "mosaic", and "ablationdata" in the AFP and CFP tables are respectively named "blastomere", "exprmosaic", "geneticmosaic", "laserablation" in the Curation Status form.
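The renaming described above can be sketched as a lookup from the Curation Status form name to the AFP/CFP table names. The helper `afp_cfp_tables` is hypothetical; the four mappings and the `afp_`/`cfp_` table prefixes are taken from the form code shown on this page.

```perl
use strict;
use warnings;

# Form datatype name => AFP/CFP table suffix, copied from populateDatatypes.
my %datatypesAfpCfp = (
  blastomere    => 'cellfunc',
  exprmosaic    => 'siteaction',
  geneticmosaic => 'mosaic',
  laserablation => 'ablationdata',
);

# Hypothetical helper: build the AFP and CFP table names for a form datatype.
# Returns an empty list when the datatype has no AFP/CFP tables.
sub afp_cfp_tables {
  my ($datatype) = @_;
  my $suffix = $datatypesAfpCfp{$datatype};
  return () unless defined $suffix;
  return ("afp_$suffix", "cfp_$suffix");
}

print join(", ", afp_cfp_tables('blastomere')), "\n";   # afp_cellfunc, cfp_cellfunc
```

A datatype such as 'geneticablation', which is absent from the hash, yields an empty list, so no query against a non-existent table would be attempted.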
  
 +
The IMPORTANT thing here is: if, at some point, the datatypes are changed (added, renamed, etc.), and the code is not updated in kind, the form will likely break.  Curators should tell Juancarlos/Chris/Daniela to update the code.
  
== Loading Data into the Form ==
+
New datatypes should be accounted for in this code:<br>
 +
* - no svm, no str/afp/cfp : add to %datatypes hash like 'geneticablation'.
 +
* - no svm, no str, yes afp/cfp : add to %datatypesAfpCfp + %datatypes hashes like 'blastomere'
 +
* - yes svm, yes afp/cfp, no str : add to code to populate cur_svmdata, which will populate in the SELECT query
 +
* - yes svm, no str/afp/cfp : add to code to populate cur_svmdata, which will populate in the SELECT query, but also subsequently delete from %datatypesAfpCfp (to prevent a postgres query to a non-existing table which will crash the form)
  
On the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]], or the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]], curators have the option to specify what flagging methods (SVM, AFP, and/or CFP), curation sources (Ontology Annotator or cur_curdata [which is the data generated from this form]), and/or datatypes (e.g. geneint, rnai) they would like to view.
+
Read str results from cur_strdata into %datatypes hash.
  
There are separate hashes for storing the different types of data, all of which have a key of datatype, subkey paperID, sub-subkeys of other things depending on the hash (see individual subsections below).
+
On February 3rd, 2015, the catalyticactivity (catalyticact) datatype was added because it had been added to the SVM pipeline. It has SVM results but no afp / cfp / string / oa_data / form_data, and any positives come only from data entered through the curation status form (DR).
 +
<br>
 +
<br>
  
There is an option to select specific datatypes, in which case only the data for those datatypes is loaded. Similarly, if only some paperIDs have been selected, only those paperIDs are loaded.
+
== Creating PDF links to papers ==
  
 +
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page, each paper ID is linked to its corresponding PDF document using the code below:
  
=== Loading curatable papers ===
+
Code:
 
 
Only papers that have a 'valid' pap_status value and a 'primary' pap_primary_data value are considered curatable.  These are stored in the %curatablePapers hash. ( paperID => status )
 
  
 
<pre>
 
<pre>
sub populateCuratablePapers {
+
sub populatePdf {
  my $query = "SELECT * FROM pap_status WHERE pap_status = 'valid' AND joinkey IN (SELECT joinkey FROM pap_primary_data WHERE pap_primary_data = 'primary')";
+
   $result = $dbh->prepare( "SELECT * FROM pap_electronic_path WHERE pap_electronic_path IS NOT NULL");
   $result = $dbh->prepare( $query );
 
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
   while (my @row = $result->fetchrow) { $curatablePapers{$row[0]} = $row[1]; }
+
  my %temp;
} # sub populateCuratablePapers
+
   while (my @row = $result->fetchrow) {
</pre>
+
    my ($data, $isPdf) = &makePdfLinkFromPath($row[1]);
 +
    $temp{$row[0]}{$isPdf}{$data}++; }
 +
  foreach my $joinkey (sort keys %temp) {
 +
    my @pdfs;
 +
    foreach my $isPdf (reverse sort keys %{ $temp{$joinkey} }) {
 +
      foreach my $pdfLink (sort keys %{ $temp{$joinkey}{$isPdf} }) {
 +
        push @pdfs, $pdfLink; } }
 +
    my ($pdfs) = join"<br/>", @pdfs;
 +
    $pdf{$joinkey} = $pdfs;
 +
  } # foreach my $joinkey (sort keys %temp)
 +
} # sub populatePdf
 +
 
 +
sub makePdfLinkFromPath {
 +
  my ($path) = shift;
 +
  my ($pdf) = $path =~ m/\/([^\/]*)$/;
 +
  my $isPdf = 0; if ($pdf =~ m/\.pdf$/) { $isPdf++; }           # kimberly wants .pdf files on top, so need to flag to sort
 +
  my $link = 'http://tazendra.caltech.edu/~acedb/daniel/' . $pdf;
 +
  my $data = "<a href=\"$link\" target=\"new\">$pdf</a>"; return ($data, $isPdf); }
 +
</pre>
  
  
=== Loading afp_ data ===
+
Note the table name ("pap_electronic_path"), the URL path ("http://tazendra.caltech.edu/~acedb/daniel/"), and (because of the code 'target=\"new\"') that the link will open a new window or tab. Also note that opening another link on the original page (e.g. [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page) will open that link in that same new window/tab, clearing out what you had opened previously.
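The ordering of the links (files ending in .pdf first, then everything else, each group alphabetical) can be illustrated with a small standalone sketch. The file paths are hypothetical; real values come from pap_electronic_path.

```perl
use strict;
use warnings;

# Hypothetical paths for one paper.
my @paths = (
  '/some/dir/00012345_supp.doc',
  '/some/dir/00012345_main.pdf',
  '/some/dir/00012345_figs.pdf',
);

my %byFlag;
foreach my $path (@paths) {
  my ($file) = $path =~ m/\/([^\/]*)$/;        # keep everything after the last slash
  next unless defined $file;
  my $isPdf = ($file =~ m/\.pdf$/) ? 1 : 0;    # flag .pdf files so they can be sorted first
  $byFlag{$isPdf}{$file}++;
}

my @ordered;
foreach my $isPdf (reverse sort keys %byFlag) {   # '1' sorts after '0', so reverse puts PDFs first
  push @ordered, sort keys %{ $byFlag{$isPdf} };
}
print join("\n", @ordered), "\n";
```

With these inputs the two .pdf files are listed before the .doc file.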
 
 
Populate %afpEmailed, %afpData, %afpFlagged, %afpPos, %afpNeg.
 
  
For each of the chosen datatypes, if it is allowed in %datatypesAfpCfp, query the corresponding afp_ postgres table, and if the paper is curatable, store the value in the %afpData hash (datatype, paper ID => AFP result).
 
  
Query afp_email and if it's a curatable paper store in %afpEmailed hash ( paperID => 1 ) for afp emailed statistics.
+
== Creating hyperlinks to PubMed paper pages ==
  
Query afp_lasttouched to see if a paper has been flagged for afp.  Skip if it's not a curatable paper.  For all %chosenDatatypes store in %afpFlagged ( datatype, paperID => 1 )
+
In the [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page each PubMed ID is linked to its corresponding PubMed webpage using the code below:
  
For each of the %afpFlagged datatypes that have been chosen (%chosenDatatypes), if there is an %afpData value, store in %afpPos hash ( positive flag for afp ), otherwise store in %afpNeg hash (negative flag for afp )  ( datatype, paperID => 1 )
+
Code:
  
 
<pre>
 
<pre>
sub populateAfpData {
+
sub populatePmid {
  foreach my $datatype (sort keys %chosenDatatypes) {
+
   $result = $dbh->prepare( "SELECT * FROM pap_identifier WHERE pap_identifier ~ 'pmid'" );
    next unless $datatypesAfpCfp{$datatype};
 
    my $pgtable_datatype = $datatypesAfpCfp{$datatype};
 
    $result = $dbh->prepare( "SELECT * FROM afp_$pgtable_datatype" );
 
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
    while (my @row = $result->fetchrow) {
 
      next unless ($curatablePapers{$row[0]});
 
      $afpData{$datatype}{$row[0]} = $row[1]; }
 
  } # foreach my $datatype (sort keys %chosenDatatypes)
 
 
 
   $result = $dbh->prepare( "SELECT * FROM afp_email" );
 
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
   while (my @row = $result->fetchrow) {
+
  my %temp;
    next unless ($curatablePapers{$row[0]});
+
   while (my @row = $result->fetchrow) { if ($row[0]) {
     $afpEmailed{$row[0]}++; }
+
     my ($data) = &makeNcbiLinkFromPmid($row[1]);
  $result = $dbh->prepare( "SELECT * FROM afp_lasttouched" );
+
     $temp{$row[0]}{$data}++; } }
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
+
   foreach my $joinkey (sort keys %temp) {
  while (my @row = $result->fetchrow) {
+
     my ($pmids) = join"<br/>", keys %{ $temp{$joinkey} };
    next unless ($curatablePapers{$row[0]});
+
    $pmid{$joinkey} = $pmids;
     foreach my $datatype (sort keys %chosenDatatypes) {
+
  } # foreach my $joinkey (sort keys %temp)
      $afpFlagged{$datatype}{$row[0]}++; } }
+
} # sub populatePmid
   foreach my $datatype (sort keys %chosenDatatypes) {
 
     foreach my $joinkey (sort keys %{ $afpFlagged{$datatype} }) {
 
      if ($afpData{$datatype}{$joinkey}) { $afpPos{$datatype}{$joinkey}++; }
 
        else { $afpNeg{$datatype}{$joinkey}++; } } }
 
} # sub populateAfpData
 
 
</pre>
 
</pre>
  
 +
<pre>
 +
sub makeNcbiLinkFromPmid {
 +
  my $pmid = shift;
 +
  my ($id) = $pmid =~ m/(\d+)/;
 +
  my $link = 'http://www.ncbi.nlm.nih.gov/pubmed/' . $id;
 +
  my $data = "<a href=\"$link\" target=\"new\">$pmid</a>"; return $data; }
 +
</pre>
  
=== Loading cfp_ data ===
+
Note the table name ("pap_identifier"), the table specifier ("WHERE pap_identifier ~ 'pmid'"), the URL path ("http://www.ncbi.nlm.nih.gov/pubmed/"), and (because of the code 'target=\"new\"') that the link will open a new window or tab. Also note that opening another link on the original page (e.g. [[New_2012_Curation_Status#Detailed_Results_of_Papers_Page|Detailed Results of Papers]] page) will open that link in that same new window/tab, clearing out what you had opened previously.
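As a hedged illustration of the link construction, the sketch below is a hypothetical variant of makeNcbiLinkFromPmid (here called `pmid_link`) that additionally returns nothing when the identifier contains no digits. The URL prefix and the target attribute are the ones used in the form code above; the sample identifiers are made up.

```perl
use strict;
use warnings;

# Hypothetical variant of makeNcbiLinkFromPmid.
sub pmid_link {
  my ($pmid) = @_;
  my ($id) = $pmid =~ m/(\d+)/;          # first run of digits in the identifier
  return unless defined $id;             # no digits, so no link can be built
  my $link = 'http://www.ncbi.nlm.nih.gov/pubmed/' . $id;
  return "<a href=\"$link\" target=\"new\">$pmid</a>";
}

print pmid_link('pmid12345678'), "\n";
```

The visible link text is the full identifier, while only the digits are appended to the URL.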
  
Populate %cfpData, %cfpFlagged, %cfpPos, %cfpNeg.
 
  
For each of the chosen datatypes, if it is allowed in %datatypesAfpCfp, query the corresponding cfp_ postgres table, and if the paper is curatable, store the value in the %cfpData hash (datatype, paper ID => CFP result).
+
== Populating the Journal Names ==
  
Query cfp_curator to see if a paper has been flagged for cfp.  Skip if it's not a curatable paper.  For all %chosenDatatypes store in %cfpFlagged ( datatype, paperID => 1 )
+
Journal names for each paper are populated via the following code:
  
For each of the %cfpFlagged datatypes that have been chosen (%chosenDatatypes), if there is a %cfpData value, store in %cfpPos hash ( positive flag for cfp ), otherwise store in %cfpNeg hash ( negative flag for cfp ) ( datatype, paperID => 1 )
+
<pre>
 +
sub populateJournal {
 +
  $result = $dbh->prepare( "SELECT * FROM pap_journal WHERE pap_journal IS NOT NULL" );
 +
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 +
  while (my @row = $result->fetchrow) { if ($row[0]) { $journal{$row[0]} = $row[1]; } }
 +
} # sub populateJournal
 +
</pre>
  
<pre>
 
sub populateCfpData {
 
  foreach my $datatype (sort keys %chosenDatatypes) {
 
    next unless $datatypesAfpCfp{$datatype};
 
    my $pgtable_datatype = $datatypesAfpCfp{$datatype};
 
    $result = $dbh->prepare( "SELECT * FROM cfp_$pgtable_datatype" );
 
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
    while (my @row = $result->fetchrow) {
 
      next unless ($curatablePapers{$row[0]});
 
      $cfpData{$datatype}{$row[0]} = $row[1]; }
 
  } # foreach my $datatype (sort keys %chosenDatatypes)
 
  
  $result = $dbh->prepare( "SELECT * FROM cfp_curator" );
+
Note the table "pap_journal".
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
  while (my @row = $result->fetchrow) {
 
    next unless ($curatablePapers{$row[0]});
 
    foreach my $datatype (sort keys %chosenDatatypes) {
 
      $cfpFlagged{$datatype}{$row[0]}++; } }
 
  foreach my $datatype (sort keys %chosenDatatypes) {
 
    foreach my $joinkey (sort keys %{ $cfpFlagged{$datatype} }) {
 
      if ($cfpData{$datatype}{$joinkey}) { $cfpPos{$datatype}{$joinkey}++; }
 
        else { $cfpNeg{$datatype}{$joinkey}++; } } }
 
} # sub populateCfpData
 
</pre>
 
  
  
=== Loading svm data ===
+
== Loading Data into the Form ==
  
Populate %svmData hash.
+
On the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], the [[New_2012_Curation_Status#Specific_Paper_Page|Specific Paper Page]], or the [[New_2012_Curation_Status#Prepopulated_Specific_Papers_Page|Prepopulated Specific Papers Page]], curators have the option to specify what flagging methods (SVM, STR, AFP, and/or CFP), curation sources (Ontology Annotator or cur_curdata [which is the data generated from this form]), and/or datatypes (e.g. geneint, rnai) they would like to view.
  
For each of the chosen datatypes, query the cur_svmdata table where cur_datatype is that datatype, ordered by cur_date so that the latest value for a given paper-datatype pair is the one kept. The paper ID is the first column and the SVM result is the fourth column. Papers that are not in %curatablePapers are skipped. Results are stored in %svmData ( datatype, paper => svm_result ). cur_svmdata can hold multiple results for a given paper-datatype pair; only the most recent result (by the directory name/date on Yuling's machine) is considered.
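The "latest value wins" behaviour relies on rows arriving in ascending date order, so that each later row overwrites the earlier hash entry. A minimal sketch with made-up rows follows. The row layout beyond "paper ID first, result fourth", the date format, and the result strings are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical rows: [ paperID, datatype, date, svm_result ].
my @rows = (
  [ '00012345', 'rnai', '20121105', 'high' ],
  [ '00012345', 'rnai', '20120301', 'low'  ],
  [ '00067890', 'rnai', '20120718', 'NEG'  ],
);
my %curatablePapers = ( '00012345' => 'valid' );   # 00067890 is not curatable

my %svmData;
foreach my $row (sort { $a->[2] cmp $b->[2] } @rows) {    # stands in for ORDER BY cur_date
  my ($joinkey, $datatype, $svmdata) = @{$row}[0, 1, 3];
  next unless $curatablePapers{$joinkey};                 # skip non-curatable papers
  $svmData{$datatype}{$joinkey} = $svmdata;               # later dates overwrite earlier ones
}
print $svmData{'rnai'}{'00012345'}, "\n";   # high
```

No explicit date comparison is needed inside the loop; the ordering of the query does the work.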
+
There are separate hashes for storing the different types of data, all of which have a key of datatype, subkey paperID, sub-subkeys of other things depending on the hash (see individual subsections below).
  
<pre>
+
There is an option to select specific datatypes, in which case only the data for those datatypes is loaded. Similarly, if only some paperIDs have been selected, only those paperIDs are loaded.
sub populateSvmData {
 
#    $result = $dbh->prepare( "SELECT * FROM cur_svmdata ORDER BY cur_datatype, cur_date" );  # always doing for all datatypes vs looping for chosen takes 4.66vs 2.74 secs
 
  foreach my $datatype (sort keys %chosenDatatypes) {
 
    $result = $dbh->prepare( "SELECT * FROM cur_svmdata WHERE cur_datatype = '$datatype' ORDER BY cur_date" );
 
      # table stores multiple dates for same paper-datatype in case we want to see multiple results later.  if it didn't and we didn't order it would take 2.05 vs 2.74 secs, so not worth changing the way we're storing data
 
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
    while (my @row = $result->fetchrow) {
 
      my $joinkey = $row[0]; my $svmdata = $row[3];
 
      next unless ($curatablePapers{$row[0]});
 
      $svmData{$datatype}{$joinkey} = $svmdata; } }
 
} # sub populateSvmData
 
</pre>
 
  
  
=== Loading OA data ===
+
=== Loading curatable papers ===
  
Populate %objsCurated and %oaData hashes.
+
Only papers that have a 'valid' pap_status value and a 'primary' pap_primary_data value are considered curatable. These are stored in the %curatablePapers hash. ( paperID => status )
 
 
Each datatype is stored in different tables and has to be queried separately.  The queries are mostly the same.
 
  
 
<pre>
 
<pre>
  if ($chosenDatatypes{'newmutant'}) {
+
sub populateCuratablePapers {
    $result = $dbh->prepare( "SELECT * FROM app_variation" );
+
  my $query = "SELECT * FROM pap_status WHERE pap_status = 'valid' AND joinkey IN (SELECT joinkey FROM pap_primary_data WHERE pap_primary_data = 'primary')";
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
+
  $result = $dbh->prepare( $query );
    while (my @row = $result->fetchrow) { $objsCurated{'newmutant'}{$row[1]}++; }
+
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
    $result = $dbh->prepare( "SELECT * FROM app_paper" );
+
  while (my @row = $result->fetchrow) { $curatablePapers{$row[0]} = $row[1]; }
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
+
} # sub populateCuratablePapers
    while (my @row = $result->fetchrow) {
 
      my (@papers) = $row[1] =~ m/WBPaper(\d+)/g;
 
      foreach my $paper (@papers) {
 
        $oaData{'newmutant'}{$paper} = 'curated'; } } }
 
 
</pre>
 
</pre>
and similarly for other datatypes.
 
  
The example above is for the datatype 'newmutant'. If that datatype is in %chosenDatatypes, query app_variation and store the results in %objsCurated ( datatype, object => 1 ), then query app_paper, match for WBPaper IDs, and associate them in %oaData ( datatype, paperID => 'curated' ).
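A single OA paper field may hold more than one paper ID, which is why the match uses the /g modifier in list context. A small sketch, assuming a made-up field value (the exact storage format of app_paper is not shown on this page):

```perl
use strict;
use warnings;

# Hypothetical contents of one app_paper field holding two paper IDs.
my $field = '"WBPaper00012345","WBPaper00067890"';

my %oaData;
my (@papers) = $field =~ m/WBPaper(\d+)/g;     # /g in list context returns every captured ID
foreach my $paper (@papers) {
  $oaData{'newmutant'}{$paper} = 'curated';
}
print join(", ", sort keys %{ $oaData{'newmutant'} }), "\n";   # 00012345, 00067890
```

Only the numeric part of each ID is kept as the hash key.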
 
  
For other datatypes :
+
=== Loading afp_ data ===
* overexpr : objects from app_transgene ; %oaData from app_paper WHERE joinkey IN (SELECT joinkey FROM app_transgene WHERE app_transgene IS NOT NULL AND app_transgene != ''), meaning papers where the postgresID has a corresponding transgene that exists in app_transgene.
+
 
* antibody : objects from abp_name ; %oaData from abp_paper
+
Populate %afpEmailed, %afpData, %afpFlagged, %afpPos, %afpNeg.
* otherexpr : objects from exp_name ; %oaData from exp_paper
+
 
* genereg : objects from grg_name; %oaData from grg_paper
+
For each of the chosen datatypes, if it is allowed in %datatypesAfpCfp, query the corresponding afp_ postgres table, and if the paper is curatable, store the value in the %afpData hash (datatype, paper ID => AFP result).
* geneint : objects from int_name; %oaData from int_paper
 
* rnai : objects from rna_name; %oaData from rna_paper
 
* blastomere : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Blastomere_isolation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Blastomere_isolation')
 
* exprmosaic : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Expression_mosaic') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Expression_mosaic')
 
* geneticablation : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_ablation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_ablation')
 
* geneticmosaic : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_mosaic') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_mosaic')
 
* laserablation : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Laser_ablation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Laser_ablation')
 
* chemicals :
 
** do 5 postgres queries to find unique curated objects into %objsCurated{chemicals}{<chemical>}
 
<pre>
 
- SELECT * FROM mop_name WHERE joinkey IN (SELECT joinkey FROM mop_paper WHERE mop_paper IS NOT NULL AND mop_paper != '')
 
- SELECT * FROM grg_moleculeregulator match for WBMol:\d+
 
- SELECT * FROM app_molecule match for WBMol:\d+
 
- SELECT * FROM pro_molecule match for WBMol:\d+
 
- SELECT * FROM rna_molecule match for WBMol:\d+
 
</pre>
 
** do 7 postgres queries to find unique curated papers and match for WBPaper\d+ into $oaData{'chemicals'}{<paper>} = 'curated' :
 
<pre>
 
- SELECT * FROM mop_paper
 
- SELECT * FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_molecule WHERE app_molecule IS NOT NULL AND app_molecule != '')
 
- SELECT * FROM grg_paper WHERE joinkey IN (SELECT joinkey FROM grg_moleculeregulator WHERE grg_moleculeregulator IS NOT NULL AND grg_moleculeregulator != '')
 
- SELECT * FROM pro_paper WHERE joinkey IN (SELECT joinkey FROM pro_molecule WHERE pro_molecule IS NOT NULL AND pro_molecule != '')
 
- SELECT * FROM rna_paper WHERE joinkey IN (SELECT joinkey FROM rna_molecule WHERE rna_molecule IS NOT NULL AND rna_molecule != '')
 
- SELECT * FROM int_paper WHERE joinkey IN (SELECT joinkey FROM int_otheronetype WHERE int_otheronetype = 'Chemical')
 
- SELECT * FROM int_paper WHERE joinkey IN (SELECT joinkey FROM int_othertwotype WHERE int_othertwotype = 'Chemical')
 
</pre>
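The paper-matching step for chemicals can be sketched in isolation. This is a minimal sketch, not the form's actual code: the helper name mark_curated_papers and the sample field values are hypothetical, and storing only the numeric part of each WBPaper ID is an assumption taken from the 'newmutant' example on this page.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): scan the values returned
# by the paper queries and mark every WBPaper ID found as 'curated'.
sub mark_curated_papers {
    my ($oaData_ref, $datatype, @values) = @_;
    foreach my $value (@values) {
        next unless defined $value;
        # One field may list several papers, so match globally.
        my @papers = $value =~ m/WBPaper(\d+)/g;
        $oaData_ref->{$datatype}{$_} = 'curated' for @papers;
    }
    return $oaData_ref;
}

# Made-up field values standing in for rows from mop_paper, app_paper, etc.
my %oaData;
mark_curated_papers(\%oaData, 'chemicals',
    '"WBPaper00001234","WBPaper00005678"', 'WBPaper00001234', '', undef);
print join(',', sort keys %{ $oaData{chemicals} }), "\n";
```

Because the results go into a hash, a paper returned by more than one of the seven queries is counted only once.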
 
  
=== Loading cur_curdata ===
+
Query afp_email and, if the paper is curatable, store it in the %afpEmailed hash ( paperID => 1 ) for AFP emailed statistics.
  
cur_curdata: this captures all data entered through this form, meaning paper ID, datatype, curator ID, validation status (e.g. "curated and positive"), pre-canned comment, and/or free text comment (and timestamp). Note: this table only stores data (and associated paper-datatype pairs) that has been manually entered through this form.
+
Query afp_lasttouched to see if a paper has been flagged for afp. Skip if it's not a curatable paper. For all %chosenDatatypes store in %afpFlagged ( datatype, paperID => 1 )
  
Code:
+
For each of the %afpFlagged datatypes that have been chosen (%chosenDatatypes), if there is an %afpData value, store in %afpPos hash ( positive flag for afp ), otherwise store in %afpNeg hash (negative flag for afp )  ( datatype, paperID => 1 )
  
 
<pre>
 
<pre>
sub populateCurCurData {
+
sub populateAfpData {
   $result = $dbh->prepare( "SELECT * FROM cur_curdata ORDER BY cur_timestamp" );       # in case multiple values get in for a paper-datatype (shouldn't happen), keep the latest
+
   foreach my $datatype (sort keys %chosenDatatypes) {
 +
    next unless $datatypesAfpCfp{$datatype};
 +
    my $pgtable_datatype = $datatypesAfpCfp{$datatype};
 +
    $result = $dbh->prepare( "SELECT * FROM afp_$pgtable_datatype" );
 +
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 +
    while (my @row = $result->fetchrow) {
 +
      next unless ($curatablePapers{$row[0]});
 +
      $afpData{$datatype}{$row[0]} = $row[1]; }
 +
  } # foreach my $datatype (sort keys %chosenDatatypes)
 +
 
 +
  $result = $dbh->prepare( "SELECT * FROM afp_email" );
 +
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 +
  while (my @row = $result->fetchrow) {
 +
    next unless ($curatablePapers{$row[0]});
 +
    $afpEmailed{$row[0]}++; }
 +
  $result = $dbh->prepare( "SELECT * FROM afp_lasttouched" );
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
   while (my @row = $result->fetchrow) {
 
   while (my @row = $result->fetchrow) {
     next unless ($chosenPapers{$row[0]} || $chosenPapers{all});
+
     next unless ($curatablePapers{$row[0]});
     next unless ($chosenDatatypes{$row[1]});
+
     foreach my $datatype (sort keys %chosenDatatypes) {
    $curData{$row[1]}{$row[0]}{curator}   = $row[2];
+
      $afpFlagged{$datatype}{$row[0]}++; } }
     $curData{$row[1]}{$row[0]}{donposneg}  = $row[3];
+
  foreach my $datatype (sort keys %chosenDatatypes) {
    $curData{$row[1]}{$row[0]}{selcomment} = $row[4];
+
     foreach my $joinkey (sort keys %{ $afpFlagged{$datatype} }) {
    $curData{$row[1]}{$row[0]}{txtcomment} = $row[5];
+
      if ($afpData{$datatype}{$joinkey}) { $afpPos{$datatype}{$joinkey}++; }
    $curData{$row[1]}{$row[0]}{timestamp} = $row[6]; }
+
        else { $afpNeg{$datatype}{$joinkey}++; } } }
} # sub populateCurCurData
+
} # sub populateAfpData
 
</pre>
 
</pre>
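The split of AFP-flagged papers into positive and negative sets can be sketched without a database. This is a minimal sketch under assumptions: the helper name split_flagged and the sample data are hypothetical, and the hashes are passed by reference instead of being globals as in the form's code.

```perl
use strict;
use warnings;

# Hypothetical helper: a flagged paper is positive when it has a true data
# value for the datatype, and negative otherwise.
sub split_flagged {
    my ($flagged_ref, $data_ref) = @_;
    my (%pos, %neg);
    foreach my $datatype (sort keys %$flagged_ref) {
        foreach my $joinkey (sort keys %{ $flagged_ref->{$datatype} }) {
            if ($data_ref->{$datatype}{$joinkey}) { $pos{$datatype}{$joinkey}++; }
            else                                  { $neg{$datatype}{$joinkey}++; }
        }
    }
    return (\%pos, \%neg);
}

# Made-up example: two flagged papers, only one with an author-supplied value.
my %afpFlagged = ( rnai => { '00000001' => 1, '00000002' => 1 } );
my %afpData    = ( rnai => { '00000001' => 'RNAi data in figure 2' } );
my ($afpPos, $afpNeg) = split_flagged(\%afpFlagged, \%afpData);
print "positive: ", join(',', sort keys %{ $afpPos->{rnai} }), "\n";
print "negative: ", join(',', sort keys %{ $afpNeg->{rnai} }), "\n";
```

As in the original test, the check is for a true value, so an empty string counts as a negative flag.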
  
  
When populating curator data from curation status, read the cur_curdata postgres table, skip datatypes that were not chosen, skip papers that were not chosen.
+
=== Loading cfp_ data ===
Store the data in the %curData hash: the key is the datatype, the subkey is the paper ID, and the value keys are curator, donposneg (curator result of curated, validatedPos, validatedNeg, notValidated), select comment, text comment, and timestamp.
 
  
cur_curdata can only have one result for a specific paper-datatype pair; if a new result is entered, it will overwrite the previous result.
+
Populate %cfpData, %cfpFlagged, %cfpPos, %cfpNeg.
  
 +
For each of the chosen datatypes, if it is allowed in %datatypesAfpCfp, query the corresponding cfp_ postgres table, and if it's a curatable paper store the value in the %cfpData hash (datatype, paper ID => CFP result).
  
==== Loading cur_curdata --- Code changes for June 12, 2013 ====
+
Query cfp_curator to see if a paper has been flagged for cfp.  Skip if it's not a curatable paper.  For all %chosenDatatypes store in %cfpFlagged ( datatype, paperID => 1 )
  
The code above was changed to accommodate the change in how we handle "not validated" flags. The code has one extra line:
+
For each of the %cfpFlagged datatypes that have been chosen (%chosenDatatypes), if there is a %cfpData value, store in %cfpPos hash ( positive flag for cfp ), otherwise store in %cfpNeg hash ( negative flag for cfp )  ( datatype, paperID => 1 )
  
 
<pre>
 
<pre>
     next if ( ($row[3] eq 'notvalidated') || ($row[3] eq '') );                                         # skip entries marked as notvalidated
+
sub populateCfpData {
</pre>
+
  foreach my $datatype (sort keys %chosenDatatypes) {
 +
     next unless $datatypesAfpCfp{$datatype};
 +
    my $pgtable_datatype = $datatypesAfpCfp{$datatype};
 +
    $result = $dbh->prepare( "SELECT * FROM cfp_$pgtable_datatype" );
 +
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 +
    while (my @row = $result->fetchrow) {
 +
      next unless ($curatablePapers{$row[0]});
 +
      $cfpData{$datatype}{$row[0]} = $row[1]; }
 +
  } # foreach my $datatype (sort keys %chosenDatatypes)
  
The new code, in total, is:
+
   $result = $dbh->prepare( "SELECT * FROM cfp_curator" );
 
 
<pre>
 
sub populateCurCurData {
 
   $result = $dbh->prepare( "SELECT * FROM cur_curdata ORDER BY cur_timestamp" );       # in case multiple values get in for a paper-datatype (shouldn't happen), keep the latest
 
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
 
   while (my @row = $result->fetchrow) {
 
   while (my @row = $result->fetchrow) {
     next unless ($chosenPapers{$row[0]} || $chosenPapers{all});
+
     next unless ($curatablePapers{$row[0]});
     next unless ($chosenDatatypes{$row[1]});
+
     foreach my $datatype (sort keys %chosenDatatypes) {
    next if ( ($row[3] eq 'notvalidated') || ($row[3] eq '') );                                        # skip entries marked as notvalidated
+
      $cfpFlagged{$datatype}{$row[0]}++; } }
    $curData{$row[1]}{$row[0]}{curator}   = $row[2];
+
  foreach my $datatype (sort keys %chosenDatatypes) {
     $curData{$row[1]}{$row[0]}{donposneg}  = $row[3];
+
     foreach my $joinkey (sort keys %{ $cfpFlagged{$datatype} }) {
    $curData{$row[1]}{$row[0]}{selcomment} = $row[4];
+
      if ($cfpData{$datatype}{$joinkey}) { $cfpPos{$datatype}{$joinkey}++; }
    $curData{$row[1]}{$row[0]}{txtcomment} = $row[5];
+
        else { $cfpNeg{$datatype}{$joinkey}++; } } }
    $curData{$row[1]}{$row[0]}{timestamp} = $row[6]; }
+
} # sub populateCfpData
} # sub populateCurCurData
 
 
</pre>
 
</pre>
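The cur_curdata loading rules (rows read in timestamp order so the latest wins, unchosen datatypes skipped, 'notvalidated' or empty results skipped) can be sketched on in-memory rows. This is a minimal sketch under assumptions: the helper name load_cur_rows, the curator names, and the sample rows are hypothetical; the sort in Perl stands in for the SQL ORDER BY cur_timestamp and assumes timestamps that sort correctly as strings; the paper filter (%chosenPapers) is left out for brevity.

```perl
use strict;
use warnings;

# Each row: [ paper, datatype, curator, donposneg, selcomment, txtcomment, timestamp ]
sub load_cur_rows {
    my ($rows_ref, $chosenDatatypes_ref) = @_;
    my %curData;
    foreach my $row (sort { $a->[6] cmp $b->[6] } @$rows_ref) {
        my ($paper, $datatype, $curator, $donposneg, $sel, $txt, $ts) = @$row;
        next unless $chosenDatatypes_ref->{$datatype};
        # Entries marked as notvalidated (or left empty) are skipped.
        next if !defined $donposneg || $donposneg eq '' || $donposneg eq 'notvalidated';
        # A later row for the same paper-datatype pair replaces the earlier one.
        $curData{$datatype}{$paper} = {
            curator    => $curator,
            donposneg  => $donposneg,
            selcomment => $sel,
            txtcomment => $txt,
            timestamp  => $ts,
        };
    }
    return \%curData;
}

my $curData = load_cur_rows([
    [ '00000001', 'rnai', 'curatorB', 'curated',      '', '', '2013-06-01 10:00:00' ],
    [ '00000001', 'rnai', 'curatorA', 'positive',     '', '', '2013-01-01 10:00:00' ],
    [ '00000002', 'rnai', 'curatorA', 'notvalidated', '', '', '2013-02-01 10:00:00' ],
], { rnai => 1 });
print "00000001 rnai: $curData->{rnai}{'00000001'}{donposneg}\n";
```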
  
=== Processing curated data ===
+
=== Loading str data ===
 +
Populate %strData, %strFlagged, %strPos, %strNeg.
  
The following subroutine processes cur_curdata and %oaData into %valCur, %valPos, and %valNeg, which correspond to a datatype-paper pair being validated+curated, validated+positive, and validated+negative, respectively, and into %conflict, which holds the paper-datatype pairs that have multiple values.
+
For each of the chosen datatypes, query the cur_strdata table where cur_datatype is that datatype. The string result is the 4th column, and the paper ID is the first column. Skip papers that are not in %curatablePapers. Store in %strData ( datatype, paper => str_result ). cur_strdata only has a single result for a given paper-datatype pair, replaced as a whole each time the cronjob runs.
  
If a paper has been curated for a datatype, the paper enters into the %valCur '''AND''' the %valPos hashes; if it has been validated positive but NOT curated it goes into %valPos ONLY; and if it has been validated negative it will go into %valNeg.
+
All curatable papers are considered flagged for STR. It's positive if it's in cur_strdata, and negative if it's not.
  
 
<pre>
 
<pre>
sub populateCuratedPapers {
+
   $result = $dbh->prepare( "SELECT * FROM cur_strdata" );           # only single value ever stored per paper
   my ($showTimes, $start, $end, $diff) = (0, '', '', '');
+
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
   if ($showTimes) { $start = time; }
+
   while (my @row = $result->fetchrow) {
   &populateCurCurData();
+
    my $joinkey = $row[0]; my $datatype = $row[1]; my $strdata = $row[3];
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers  populateCurCurData $diff<br>"; }
+
    next unless ($curatablePapers{$row[0]});
  &populateOa();                                                # $oaData{datatype}{joinkey} = 'curated';
+
     $strData{$datatype}{$joinkey} = $strdata; }
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers  populateOa $diff<br>"; }
+
   foreach my $datatype (sort keys %strData) {
  my %allCuratorValues;                # $allCuratorValues{joinkey}{datatype}{value} = 1+
+
     foreach my $joinkey (sort keys %curatablePapers) {
  foreach my $datatype (sort keys %oaData) {
+
       $strFlagged{$datatype}{$joinkey}++;
     foreach my $joinkey (sort keys %{ $oaData{$datatype} }) {
+
      if ($strData{$datatype}{$joinkey}) { $strPos{$datatype}{$joinkey}++; }
      $allCuratorValues{$joinkey}{$datatype}{curated}++; } }            # validated positive and curated
+
         else { $strNeg{$datatype}{$joinkey}++; } } }
   foreach my $datatype (sort keys %curData) {
+
 
     foreach my $joinkey (sort keys %{ $curData{$datatype} }) {
 
       $allCuratorValues{$joinkey}{$datatype}{ $curData{$datatype}{$joinkey}{donposneg} }++; } }
 
  foreach my $joinkey (sort keys %allCuratorValues) {
 
    next unless ($curatablePapers{$joinkey});                              # skips non-primary papers
 
    foreach my $datatype (sort keys %{ $allCuratorValues{$joinkey} }) {
 
      my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };
 
      if (scalar @values > 1) { $conflict{$datatype}{$joinkey}++; }
 
         else {
 
          my $value = shift @values;
 
          $validated{$datatype}{$joinkey} = $value;
 
          if ($value eq 'curated') {      $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }
 
            elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }
 
            elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; }
 
  } } }
 
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers  categorizing hash $diff<br>"; }
 
} # sub populateCuratedPapers
 
 
</pre>
 
</pre>
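The categorization of a single paper-datatype pair can be sketched as a standalone function. This is a minimal sketch under assumptions: the function name categorize_values and its return structure are hypothetical, and it follows the rule as amended on June 12, 2013, where 'curated' together with 'positive' is not treated as a conflict.

```perl
use strict;
use warnings;

# Takes the distinct curator/OA values recorded for one paper-datatype pair
# and reports which hashes (%valCur, %valPos, %valNeg, %conflict) the pair
# would enter.
sub categorize_values {
    my %seen   = map { $_ => 1 } @_;
    my %result = ( valCur => 0, valPos => 0, valNeg => 0, conflict => 0 );
    my $count  = scalar keys %seen;
    if ($count == 1) {
        if    ($seen{curated})  { $result{valPos} = 1; $result{valCur} = 1; }
        elsif ($seen{positive}) { $result{valPos} = 1; }
        elsif ($seen{negative}) { $result{valNeg} = 1; }
    }
    elsif ($count == 2 && $seen{curated} && $seen{positive}) {
        # positive + curated is consistent, not a conflict
        $result{valPos} = 1; $result{valCur} = 1;
    }
    elsif ($count >= 2) { $result{conflict} = 1; }
    return \%result;
}

my $r = categorize_values('curated', 'negative');
print "conflict\n" if $r->{conflict};
```

A curated pair enters both %valCur and %valPos, which is why downstream checks on %valPos also cover curated papers.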
  
 +
=== Loading svm data ===
  
==== Processing curated data --- Code changes for June 12, 2013 ====
+
Populate %svmData hash.
  
The code in the section above was changed to accommodate a change in the way we recognize and handle paper conflicts. The following code:
+
For each of the chosen datatypes, query the cur_svmdata table where cur_datatype is that datatype, and sort by cur_date so that we always have the latest value for a given paper-datatype pair. The svm result is the 4th column, and the paper ID is the first column. Skip papers that are not in %curatablePapers. Store in %svmData ( datatype, paper => svm_result ). cur_svmdata could have multiple results for a given paper-datatype pair; we'll consider only the most recent result (by the directory name/date on Yuling's machine).
  
 
<pre>
 
<pre>
      my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };
+
sub populateSvmData {
       if (scalar @values > 1) { $conflict{$datatype}{$joinkey}++; }
+
#    $result = $dbh->prepare( "SELECT * FROM cur_svmdata ORDER BY cur_datatype, cur_date" );  # always doing for all datatypes vs looping for chosen takes 4.66vs 2.74 secs
        else {
+
  foreach my $datatype (sort keys %chosenDatatypes) {
          my $value = shift @values;
+
    $result = $dbh->prepare( "SELECT * FROM cur_svmdata WHERE cur_datatype = '$datatype' ORDER BY cur_date" );
          $validated{$datatype}{$joinkey} = $value;
+
       # table stores multiple dates for same paper-datatype in case we want to see multiple results later.  if it didn't and we didn't order it would take 2.05 vs 2.74 secs, so not worth changing the way we're storing data
          if ($value eq 'curated') {      $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }
+
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
            elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }
+
    while (my @row = $result->fetchrow) {
            elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; }
+
       my $joinkey = $row[0]; my $svmdata = $row[3];
 +
      next unless ($curatablePapers{$row[0]});
 +
      $svmData{$datatype}{$joinkey} = $svmdata; } }
 +
} # sub populateSvmData
 
</pre>
 
</pre>
  
was changed to:
+
 
 +
=== Loading OA data ===
 +
 
 +
Populate %objsCurated and %oaData hashes.
 +
 
 +
Each datatype is stored in different tables and has to be queried separately.  The queries are mostly the same.
  
 
<pre>
 
<pre>
      my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };
+
  if ($chosenDatatypes{'newmutant'}) {
      if (scalar @values < 2) {                 # only one value, categorize it
+
    $result = $dbh->prepare( "SELECT * FROM app_variation" );
          my $value = shift @values;
+
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
          $validated{$datatype}{$joinkey} = $value;
+
    while (my @row = $result->fetchrow) { $objsCurated{'newmutant'}{$row[1]}++; }
          if ($value eq 'curated') {       $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }
+
    $result = $dbh->prepare( "SELECT * FROM app_paper" );
            elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }
+
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
            elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; } }
+
    while (my @row = $result->fetchrow) {
        elsif (scalar @values == 2) {          # only two values, either ok or conflict
+
      my (@papers) = $row[1] =~ m/WBPaper(\d+)/g;
            if ( ($allCuratorValues{$joinkey}{$datatype}{'curated'}) && ($allCuratorValues{$joinkey}{$datatype}{'positive'}) ) {       # positive + curated not a conflict, for Chris 2013 06 12
+
      foreach my $paper (@papers) {
                $valPos{$datatype}{$joinkey} = 'positive'; $valCur{$datatype}{$joinkey} = 'curated'; }
+
        $oaData{'newmutant'}{$paper} = 'curated'; } } }
              else { $conflict{$datatype}{$joinkey}++; } }
 
        else { $conflict{$datatype}{$joinkey}++; }
 
 
 
 
</pre>
 
</pre>
 +
and similarly for other datatypes.
  
=== Curation Statistics Calculations ===
+
The example above is for the datatype 'newmutant'.  If that datatype is in %chosenDatatypes, query app_variation and store in %objsCurated ( datatype, object => 1 ), then query app_paper, matching for WBPaper IDs and associating them to %oaData ( datatype, paperID => 'curated' ).
  
Each value in the Curation Statistics table is calculated from which papers (or, more specifically, paper IDs) populate each of a number of tables. The following hash tables capture validation status:
+
For other datatypes :
 
+
* overexpr : objects from app_transgene ; %oaData from app_paper WHERE joinkey IN (SELECT joinkey FROM app_transgene WHERE app_transgene IS NOT NULL AND app_transgene != ''), meaning papers where the postgresID has a corresponding transgene that exists in app_transgene.
<pre>
+
* antibody : objects from abp_name ; %oaData from abp_paper
%valCur - All papers that have been curated for a given datatype
+
* seqfeature : objects from sqf_name ; %oaData from sqf_paper
 
+
* humandisease : objects from dis_wbgene ; %oaData from dis_paperdisrel and dis_paperexpmod
%valPos - All papers that have been validated positive for a given datatype, whether or not they have been curated
+
* otherexpr : objects from exp_name ; %oaData from exp_paper
 
+
* genereg : objects from grg_name; %oaData from grg_paper
%valNeg - All papers that have been validated negative for a given datatype
+
* geneint : objects from int_name; %oaData from int_paper
 +
* rnai : objects from rna_name; %oaData from rna_paper
 +
* blastomere : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Blastomere_isolation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Blastomere_isolation')
 +
* exprmosaic : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Expression_mosaic') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Expression_mosaic')
 +
* geneticablation : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_ablation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_ablation')
 +
* geneticmosaic : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_mosaic') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Genetic_mosaic')
 +
* laserablation : objects from wbb_wbbtf WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Laser_ablation') ; %oaData from wbb_reference WHERE joinkey IN (SELECT joinkey FROM wbb_assay WHERE wbb_assay = 'Laser_ablation')
 +
* chemicals :
 +
** do 5 postgres queries to find unique curated objects into %objsCurated{chemicals}{<chemical>}
 +
<pre>
 +
- SELECT * FROM mop_name WHERE joinkey IN (SELECT joinkey FROM mop_paper WHERE mop_paper IS NOT NULL AND mop_paper != '')
 +
- SELECT * FROM grg_moleculeregulator match for WBMol:\d+
 +
- SELECT * FROM app_molecule match for WBMol:\d+
 +
- SELECT * FROM pro_molecule match for WBMol:\d+
 +
- SELECT * FROM rna_molecule match for WBMol:\d+
 
</pre>
 
</pre>
 
+
** do 7 postgres queries to find unique curated papers and match for WBPaper\d+ into $oaData{'chemicals'}{<paper>} = 'curated' :
 
 
When determining, for a particular flagging method, the validation and curation statistics with respect to flagging status, these tables are compared to the table for flagging results to generate the numbers for the Curation Statistics table. So, for AFP Positives for example, the following logic is performed to determine the indicated values (list of papers), per datatype:
 
 
 
 
<pre>
 
<pre>
AFP positive (%afpPos)
+
- SELECT * FROM mop_paper
AFP positive validated (%afpPosVal)                            : %afpPos AND (%valNeg OR %valPos)
+
- SELECT * FROM app_paper WHERE joinkey IN (SELECT joinkey FROM app_molecule WHERE app_molecule IS NOT NULL AND app_molecule != '')
AFP positive validated false positive (%afpPosFP)               : %afpPos AND %valNeg
+
- SELECT * FROM grg_paper WHERE joinkey IN (SELECT joinkey FROM grg_moleculeregulator WHERE grg_moleculeregulator IS NOT NULL AND grg_moleculeregulator != '')
AFP positive validated true positive (%afpPosTP)                : %afpPos AND %valPos
+
- SELECT * FROM pro_paper WHERE joinkey IN (SELECT joinkey FROM pro_molecule WHERE pro_molecule IS NOT NULL AND pro_molecule != '')
AFP positive validated true positive curated (%afpPosTpCur)     : %afpPos AND %valPos AND %valCur    <Note: the %valPos is redundant>
+
- SELECT * FROM rna_paper WHERE joinkey IN (SELECT joinkey FROM rna_molecule WHERE rna_molecule IS NOT NULL AND rna_molecule != '')
AFP positive validated true positive not curated (%afpPosTpNC)  : %afpPos AND (%valPos NOT %valCur)
+
- SELECT * FROM int_paper WHERE joinkey IN (SELECT joinkey FROM int_moleculeone WHERE int_moleculeone IS NOT NULL) OR joinkey IN (SELECT joinkey FROM int_moleculetwo WHERE int_moleculetwo IS NOT NULL) OR joinkey IN (SELECT joinkey FROM int_moleculenondir WHERE int_moleculenondir IS NOT NULL)
AFP positive not validated (%afpPosNV)                         : %afpPos NOT (%valNeg OR %valPos)
 
AFP positive not curated (%afpPosNC)                            : (%afpPos AND (%valPos NOT %valCur)) OR (%afpPos NOT (%valNeg OR %valPos))
 
 
</pre>
 
</pre>
  
 +
=== Loading cur_curdata ===
  
which are determined by the following section of code:
+
cur_curdata: this captures all data entered through this form, meaning paper ID, datatype, curator ID, validation status (e.g. "curated and positive"), pre-canned comment, and/or free text comment (and timestamp). Note: this table only stores data (and associated paper-datatype pairs) that has been manually entered through this form.
 +
 
 +
Code:
  
 
<pre>
 
<pre>
sub getCurationStatisticsAfpPos {
+
sub populateCurCurData {
   my ($datatypesToShow_ref) = @_;
+
   $result = $dbh->prepare( "SELECT * FROM cur_curdata ORDER BY cur_timestamp" );       # in case multiple values get in for a paper-datatype (shouldn't happen), keep the latest
  my @datatypesToShow = @$datatypesToShow_ref;
+
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
  my %afpPosNV; my %afpPosVal; my %afpPosFP; my %afpPosTP; my %afpPosTpCur; my %afpPosTpNC; my %afpPosNC;
+
  while (my @row = $result->fetchrow) {
        # positive and : not validated, validated, false positive, true positive, TP curated, TP not curated, not curated minus validated negative OR not validated + TP not curated
+
    next unless ($chosenPapers{$row[0]} || $chosenPapers{all});
   foreach my $datatype (@datatypesToShow) {
+
     next unless ($chosenDatatypes{$row[1]});
    foreach my $joinkey (sort keys %{ $afpPos{$datatype} }) {
+
    $curData{$row[1]}{$row[0]}{curator}   = $row[2];
      if ($valPos{$datatype}{$joinkey}) {      $afpPosTP{$datatype}{$joinkey}++;    $afpPosVal{$datatype}{$joinkey}++;
+
    $curData{$row[1]}{$row[0]}{donposneg} = $row[3];
          if ($valCur{$datatype}{$joinkey}) {  $afpPosTpCur{$datatype}{$joinkey}++; }
+
     $curData{$row[1]}{$row[0]}{selcomment} = $row[4];
            else                            {  $afpPosTpNC{$datatype}{$joinkey}++;  $afpPosNC{$datatype}{$joinkey}++; } }
+
     $curData{$row[1]}{$row[0]}{txtcomment} = $row[5];
        elsif ($valNeg{$datatype}{$joinkey}) { $afpPosFP{$datatype}{$joinkey}++;     $afpPosVal{$datatype}{$joinkey}++; }
+
     $curData{$row[1]}{$row[0]}{timestamp} = $row[6]; }
        else {                                $afpPosNV{$datatype}{$joinkey}++;     $afpPosNC{$datatype}{$joinkey}++; } } }
+
} # sub populateCurCurData
  tie %{ $curStats{'afp'}{'pos'} }, "Tie::IxHash";
+
</pre>
  foreach my $datatype (@datatypesToShow) {
 
    my $countAfpFlagged  = scalar keys %{ $afpFlagged{$datatype} };
 
     my $countAfpPos = scalar keys %{ $afpPos{$datatype} };
 
    my $ratio = 0;
 
    if ($countAfpFlagged > 0) { $ratio = $countAfpPos / $countAfpFlagged * 100; $ratio = FormatSigFigs($ratio, 2); }
 
     foreach my $joinkey (keys %{ $afpPos{$datatype} }) {     $curStats{'afp'}{'pos'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{$datatype}{'countPap'}                      = scalar keys %{ $afpPos{$datatype} };
 
    $curStats{'afp'}{'pos'}{$datatype}{'ratio'}                        = $ratio;
 
  
    my $countAfpPosVal  = scalar keys %{ $afpPosVal{$datatype} };
 
    $ratio = 0;
 
    if ($countAfpPos > 0) { $ratio = $countAfpPosVal / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpPosVal{$datatype} }) {  $curStats{'afp'}{'pos'}{'val'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{'val'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{'val'}{$datatype}{'countPap'}              = scalar keys %{ $afpPosVal{$datatype} };
 
    $curStats{'afp'}{'pos'}{'val'}{$datatype}{'ratio'}                  = $ratio;
 
  
    my $countAfpPosTP  = scalar keys %{ $afpPosTP{$datatype} };
+
When populating curator data from curation status, read the cur_curdata postgres table, skip datatypes that were not chosen, skip papers that were not chosen.
    $ratio = 0;
+
Store the data in the %curData hash: the key is the datatype, the subkey is the paper ID, and the value keys are curator, donposneg (curator result of curated, validatedPos, validatedNeg, notValidated), select comment, text comment, and timestamp.
    if ($countAfpPosVal > 0) { $ratio = $countAfpPosTP / $countAfpPosVal * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpPosTP{$datatype} }) {    $curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{'val'}{'tp'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{'countPap'}        = scalar keys %{ $afpPosTP{$datatype} };
 
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{'ratio'}            = $ratio;
 
  
    my $countAfpPosTpCur  = scalar keys %{ $afpPosTpCur{$datatype} };
+
cur_curdata can only have one result for a specific paper-datatype pair; if a new result is entered, it will overwrite the previous result.
    $ratio = 0;
 
    if ($countAfpPosTP > 0) { $ratio = $countAfpPosTpCur / $countAfpPosTP * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpPosTpCur{$datatype} }) { $curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{'countPap'}  = scalar keys %{ $afpPosTpCur{$datatype} };
 
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{'ratio'}    = $ratio;
 
  
    my $countAfpPosTpNC  = scalar keys %{ $afpPosTpNC{$datatype} };
 
    $ratio = 0;
 
    if ($countAfpPosTP > 0) { $ratio = $countAfpPosTpNC / $countAfpPosTP * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpPosTpNC{$datatype} }) {  $curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpPosTpNC{$datatype} };
 
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{'ratio'}    = $ratio;
 
  
    my $countAfpPosFP  = scalar keys %{ $afpPosFP{$datatype} };
+
==== Loading cur_curdata --- Code changes for June 12, 2013 ====
    $ratio = 0;
 
    if ($countAfpPosVal > 0) { $ratio = $countAfpPosFP / $countAfpPosVal * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpPosFP{$datatype} }) {    $curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{'val'}{'fp'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{'countPap'}        = scalar keys %{ $afpPosFP{$datatype} };
 
    $curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{'ratio'}            = $ratio;
 
  
    my $countAfpPosNV  = scalar keys %{ $afpPosNV{$datatype} };
+
The code above was changed to accommodate the change in how we handle "not validated" flags. The code has one extra line:
    $ratio = 0;
 
    if ($countAfpPos > 0) { $ratio = $countAfpPosNV / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpPosNV{$datatype} }) {    $curStats{'afp'}{'pos'}{'nval'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{'nval'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{'nval'}{$datatype}{'countPap'}              = scalar keys %{ $afpPosNV{$datatype} };
 
    $curStats{'afp'}{'pos'}{'nval'}{$datatype}{'ratio'}                = $ratio;
 
  
    my $countAfpPosNC  = scalar keys %{ $afpPosNC{$datatype} };
+
<pre>
    $ratio = 0;
+
     next if ( ($row[3] eq 'notvalidated') || ($row[3] eq '') );                                         # skip entries marked as notvalidated
     if ($countAfpPos > 0) { $ratio = $countAfpPosNC / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpPosNC{$datatype} }) {    $curStats{'afp'}{'pos'}{'ncur'}{$datatype}{papers}{$joinkey}++;
 
                                                              $curStats{'any'}{'pos'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'pos'}{'ncur'}{$datatype}{'countPap'}              = scalar keys %{ $afpPosNC{$datatype} };
 
    $curStats{'afp'}{'pos'}{'ncur'}{$datatype}{'ratio'}                = $ratio;
 
  } # foreach my $datatype (@datatypesToShow)
 
} # sub getCurationStatisticsAfpPos
 
 
</pre>
 
</pre>
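The partition of AFP positives into the validation and curation categories, together with the guarded percentage calculation, can be sketched for a single datatype. This is a minimal sketch under assumptions: the function names partition_positives and percent, the short category keys, and the sample paper IDs are hypothetical, and the sketch leaves the percentage unformatted instead of calling FormatSigFigs.

```perl
use strict;
use warnings;

# Sorts each flagged-positive paper into the categories used by the
# Curation Statistics table (for one datatype).
sub partition_positives {
    my ($pos_ref, $valPos_ref, $valNeg_ref, $valCur_ref) = @_;
    my %out = map { $_ => {} } qw(val tp tpCur tpNC fp nv nc);
    foreach my $joinkey (keys %$pos_ref) {
        if ($valPos_ref->{$joinkey}) {               # true positive
            $out{tp}{$joinkey}++; $out{val}{$joinkey}++;
            if ($valCur_ref->{$joinkey}) { $out{tpCur}{$joinkey}++; }
            else { $out{tpNC}{$joinkey}++; $out{nc}{$joinkey}++; }
        }
        elsif ($valNeg_ref->{$joinkey}) {            # false positive
            $out{fp}{$joinkey}++; $out{val}{$joinkey}++;
        }
        else {                                       # not validated
            $out{nv}{$joinkey}++; $out{nc}{$joinkey}++;
        }
    }
    return \%out;
}

# Percentage with the same zero-denominator guard as the original code.
sub percent {
    my ($num, $den) = @_;
    return $den > 0 ? $num / $den * 100 : 0;
}

my %afpPos = map { $_ => 1 } qw(p1 p2 p3 p4 p5);
my $stats  = partition_positives(\%afpPos, { p1 => 1, p2 => 1 }, { p3 => 1 }, { p1 => 1 });
printf "validated: %d of %d\n", scalar keys %{ $stats->{val} }, scalar keys %afpPos;
```

Note that "not curated" is the union of true positives not yet curated and papers not yet validated, so validated false positives are excluded from it.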
  
 
+
The new code, in total, is:
For AFP Negatives, the following logic is performed to determine the indicated values (list of papers), per datatype:
 
  
 
<pre>
 
<pre>
AFP negative (%afpNeg)
+
sub populateCurCurData {
AFP negative validated (%afpNegVal)                             : %afpNeg AND (%valNeg OR %valPos)
+
  $result = $dbh->prepare( "SELECT * FROM cur_curdata ORDER BY cur_timestamp" );        # in case multiple values get in for a paper-datatype (shouldn't happen), keep the latest
AFP negative validated true negative (%afpNegTN)               : %afpNeg AND %valNeg
+
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
AFP negative validated false negative (%afpNegFN)              : %afpNeg AND %valPos
+
  while (my @row = $result->fetchrow) {
AFP negative validated false negative curated (%afpNegFnCur)   : %afpNeg AND %valPos AND %valCur    <Note: the %valPos is redundant>
+
    next unless ($chosenPapers{$row[0]} || $chosenPapers{all});
AFP negative validated false negative not curated (%afpNegFnNC) : %afpNeg AND %valPos NOT %valCur
+
    next unless ($chosenDatatypes{$row[1]});
AFP negative not validated (%afpNegNV)                          : %afpNeg NOT (%valNeg OR %valPos)
+
    next if ( ($row[3] eq 'notvalidated') || ($row[3] eq '') );                                        # skip entries marked as notvalidated
AFP negative not curated (%afpNegNC)                            : (%afpNeg AND (%valPos NOT %valCur)) OR (%afpNeg NOT (%valPos OR %valNeg))
+
    $curData{$row[1]}{$row[0]}{curator}    = $row[2];
 +
    $curData{$row[1]}{$row[0]}{donposneg}  = $row[3];
 +
    $curData{$row[1]}{$row[0]}{selcomment} = $row[4];
 +
    $curData{$row[1]}{$row[0]}{txtcomment} = $row[5];
 +
    $curData{$row[1]}{$row[0]}{timestamp}  = $row[6]; }
 +
} # sub populateCurCurData
 
</pre>
 
</pre>
  
which are determined by the following section of code:
+
==== Loading cur_curdata --- Code Changes for May 12th, 2017 ====
 +
*Updates were made to the cur_curdata table to accommodate entry of parasite paper data
 +
*A new column in cur_curdata was added that records for which group a data type was curated
 +
**The column is cur_site and the values are: caltech or parasite
 +
**A paper can be flagged for the same data type for each site; if flagged by more than one site, the form will create separate entries in a row in the table
 +
 
 +
=== Processing curated data ===
 +
 
 +
The following subroutine processes cur_curdata and %oaData into %valCur, %valPos, and %valNeg, which correspond to a datatype-paper pair being validated+curated, validated+positive, and validated+negative, respectively, and into %conflict, which holds the paper-datatype pairs that have multiple values.
 +
 
 +
If a paper has been curated for a datatype, the paper enters into the %valCur '''AND''' the %valPos hashes; if it has been validated positive but NOT curated it goes into %valPos ONLY; and if it has been validated negative it will go into %valNeg.
  
 
<pre>
 
<pre>
sub getCurationStatisticsAfpNeg {
+
sub populateCuratedPapers {
   my ($datatypesToShow_ref) = @_;
+
   my ($showTimes, $start, $end, $diff) = (0, '', '', '');
   my @datatypesToShow = @$datatypesToShow_ref;
+
  if ($showTimes) { $start = time; }
   my %afpNegNV; my %afpNegVal; my %afpNegTN; my %afpNegFN; my %afpNegFnCur; my %afpNegFnNC; my %afpNegNC;
+
  &populateCurCurData();
        # negative and : not validated, validated, true negative, false negative, FN curated, FN not curated, not curated minus validated negative OR not validated + FN not curated
+
   if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers  populateCurCurData $diff<br>"; }
   foreach my $datatype (@datatypesToShow) {
+
   &populateOa();                                               # $oaData{datatype}{joinkey} = 'positive';
     foreach my $joinkey (sort keys %{ $afpNeg{$datatype} }) {
+
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers  populateOa $diff<br>"; }
       if ($valNeg{$datatype}{$joinkey}) {      $afpNegTN{$datatype}{$joinkey}++;   $afpNegVal{$datatype}{$joinkey}++; }
+
  my %allCuratorValues;                 # $allCuratorValues{datatype}{joinkey} = 0 | 1+
        elsif ($valPos{$datatype}{$joinkey}) { $afpNegFN{$datatype}{$joinkey}++;    $afpNegVal{$datatype}{$joinkey}++;
+
   foreach my $datatype (sort keys %oaData) {
          if ($valCur{$datatype}{$joinkey}) {  $afpNegFnCur{$datatype}{$joinkey}++; }
+
     foreach my $joinkey (sort keys %{ $oaData{$datatype} }) {
            else                            {  $afpNegFnNC{$datatype}{$joinkey}++; $afpNegNC{$datatype}{$joinkey}++; } }
+
       $allCuratorValues{$joinkey}{$datatype}{curated}++; } }            # validated positive and curated
        else {                                 $afpNegNV{$datatype}{$joinkey}++;   $afpNegNC{$datatype}{$joinkey}++; } } }
+
  foreach my $datatype (sort keys %curData) {
  tie %{ $curStats{'afp'}{'neg'} }, "Tie::IxHash";
+
    foreach my $joinkey (sort keys %{ $curData{$datatype} }) {
  foreach my $datatype (@datatypesToShow) {
+
      $allCuratorValues{$joinkey}{$datatype}{ $curData{$datatype}{$joinkey}{donposneg} }++; } }
    my $countAfpFlagged  = scalar keys %{ $afpFlagged{$datatype} };
+
  foreach my $joinkey (sort keys %allCuratorValues) {
    my $countAfpNeg = scalar keys %{ $afpNeg{$datatype} };
+
    next unless ($curatablePapers{$joinkey});                             # skips non-primary papers
    my $ratio = 0;
+
    foreach my $datatype (sort keys %{ $allCuratorValues{$joinkey} }) {
    if ($countAfpFlagged > 0) { $ratio = $countAfpNeg / $countAfpFlagged * 100; $ratio = FormatSigFigs($ratio, 2); }    # ($ratio) = &roundToPlaces($ratio, 2); # $ratio = sprintf "%.2f", $ratio;
+
      my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };
    foreach my $joinkey (keys %{ $afpNeg{$datatype} }) { $curStats{'afp'}{'neg'}{$datatype}{papers}{$joinkey}++; }
+
      if (scalar @values > 1) { $conflict{$datatype}{$joinkey}++; }
    $curStats{'afp'}{'neg'}{$datatype}{'countPap'} = scalar keys %{ $afpNeg{$datatype} };
+
        else {
    $curStats{'afp'}{'neg'}{$datatype}{'ratio'}    = $ratio;
+
          my $value = shift @values;
 +
          $validated{$datatype}{$joinkey} = $value;
 +
          if ($value eq 'curated') {       $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }
 +
            elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }
 +
            elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; }
 +
  } } }
 +
  if ($showTimes) { $end = time; $diff = $end - $start; $start = time; print "IN populateCuratedPapers  categorizing hash $diff<br>"; }
 +
} # sub populateCuratedPapers
 +
</pre>
  
    my $countAfpNegVal  = scalar keys %{ $afpNegVal{$datatype} };
 
    $ratio = 0;
 
    if ($countAfpNeg > 0) { $ratio = $countAfpNegVal / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpNegVal{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'neg'}{'val'}{$datatype}{'countPap'} = scalar keys %{ $afpNegVal{$datatype} };
 
    $curStats{'afp'}{'neg'}{'val'}{$datatype}{'ratio'}    = $ratio;
 
  
    my $countAfpNegTN  = scalar keys %{ $afpNegTN{$datatype} };
+
==== Processing curated data --- Code changes for June 12, 2013 ====
    $ratio = 0;
 
    if ($countAfpNegVal > 0) { $ratio = $countAfpNegTN / $countAfpNegVal * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpNegTN{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{'countPap'} = scalar keys %{ $afpNegTN{$datatype} };
 
    $curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{'ratio'}    = $ratio;
 
  
    my $countAfpNegFN  = scalar keys %{ $afpNegFN{$datatype} };
+
The code in the section above was updated to reflect a change in how paper conflicts are recognized and handled. The following code:
    $ratio = 0;
 
    if ($countAfpNegVal > 0) { $ratio = $countAfpNegFN / $countAfpNegVal * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpNegFN{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFN{$datatype} };
 
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{'ratio'}    = $ratio;
 
  
    my $countAfpNegFnCur  = scalar keys %{ $afpNegFnCur{$datatype} };
+
<pre>
    $ratio = 0;
+
      my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };
    if ($countAfpNegFN > 0) { $ratio = $countAfpNegFnCur / $countAfpNegFN * 100; $ratio = FormatSigFigs($ratio, 2); }
+
      if (scalar @values > 1) { $conflict{$datatype}{$joinkey}++; }
    foreach my $joinkey (keys %{ $afpNegFnCur{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{papers}{$joinkey}++; }
+
        else {
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFnCur{$datatype} };
+
          my $value = shift @values;
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{'ratio'}   = $ratio;
+
          $validated{$datatype}{$joinkey} = $value;
 +
          if ($value eq 'curated') {       $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }
 +
            elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }
 +
            elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; }
 +
</pre>
  
    my $countAfpNegFnNC  = scalar keys %{ $afpNegFnNC{$datatype} };
+
was changed to:
    $ratio = 0;
 
    if ($countAfpNegFN > 0) { $ratio = $countAfpNegFnNC / $countAfpNegFN * 100; $ratio = FormatSigFigs($ratio, 2); }
 
    foreach my $joinkey (keys %{ $afpNegFnNC{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
 
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFnNC{$datatype} };
 
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{'ratio'}    = $ratio;
 
  
    my $countAfpNegNV  = scalar keys %{ $afpNegNV{$datatype} };
+
<pre>
    $ratio = 0;
+
      my @values = keys %{ $allCuratorValues{$joinkey}{$datatype} };
    if ($countAfpNeg > 0) { $ratio = $countAfpNegNV / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }
+
      if (scalar @values < 2) {                 # only one value, categorize it
    foreach my $joinkey (keys %{ $afpNegNV{$datatype} }) { $curStats{'afp'}{'neg'}{'nval'}{$datatype}{papers}{$joinkey}++; }
+
          my $value = shift @values;
    $curStats{'afp'}{'neg'}{'nval'}{$datatype}{'countPap'} = scalar keys %{ $afpNegNV{$datatype} };
+
          $validated{$datatype}{$joinkey} = $value;
    $curStats{'afp'}{'neg'}{'nval'}{$datatype}{'ratio'}   = $ratio;
+
          if ($value eq 'curated') {       $valPos{$datatype}{$joinkey} = $value; $valCur{$datatype}{$joinkey} = $value; }
 
+
            elsif ($value eq 'positive') { $valPos{$datatype}{$joinkey} = $value; }
    my $countAfpNegNC  = scalar keys %{ $afpNegNC{$datatype} };
+
            elsif ($value eq 'negative') { $valNeg{$datatype}{$joinkey} = $value; } }
    $ratio = 0;
+
        elsif (scalar @values == 2) {          # only two values, either ok or conflict
    if ($countAfpNeg > 0) { $ratio = $countAfpNegNC / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }
+
            if ( ($allCuratorValues{$joinkey}{$datatype}{'curated'}) && ($allCuratorValues{$joinkey}{$datatype}{'positive'}) ) {       # positive + curated not a conflict, for Chris 2013 06 12
    foreach my $joinkey (keys %{ $afpNegNC{$datatype} }) { $curStats{'afp'}{'neg'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
+
                $valPos{$datatype}{$joinkey} = 'positive'; $valCur{$datatype}{$joinkey} = 'curated'; }
    $curStats{'afp'}{'neg'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegNC{$datatype} };
+
              else { $conflict{$datatype}{$joinkey}++; } }
    $curStats{'afp'}{'neg'}{'ncur'}{$datatype}{'ratio'}   = $ratio;
+
        else { $conflict{$datatype}{$joinkey}++; }
  } # foreach my $datatype (@datatypesToShow)
 
} # sub getCurationStatisticsAfpNeg
 
  
 
</pre>
 
</pre>
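
The revised conflict handling of June 12, 2013 can be summarized in a short Python sketch. The function name and the idea of returning the names of the hashes a pair would enter are illustrative assumptions, not part of the form's Perl code.

```python
def resolve_values(values):
    """Return the names of the hashes a paper-datatype pair would enter."""
    values = set(values)
    if len(values) < 2:                      # only one value: categorize it
        hashes = {"validated"}
        if "curated" in values:
            hashes |= {"valPos", "valCur"}
        elif "positive" in values:
            hashes.add("valPos")
        elif "negative" in values:
            hashes.add("valNeg")
        return hashes
    if values == {"curated", "positive"}:    # positive + curated is not a conflict
        return {"valPos", "valCur"}
    return {"conflict"}                      # any other combination is a conflict


print(resolve_values(["positive", "curated"]))
print(resolve_values(["positive", "negative"]))
```

The only behavioral difference from the earlier rule is the middle case: a pair that is both "positive" and "curated" is no longer reported as a conflict.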
  
 +
=== Curation Statistics Calculations ===
  
'''"Any" and "Intersection" rows of the Curation Statistics table'''
+
Each value in the Curation Statistics table is calculated from the papers (more specifically, the paper IDs) that populate each of a number of hash tables. The following hash tables capture validation status:
  
To determine the "Any" and "Intersection" results, all flagging methods currently visible in the Curation Statistics table are considered. So, for the main Curation Statistics table (with no options selected), all flagging methods (SVM, AFP, and CFP as of 12-10-2012) are considered. The calculations in this case would be:
+
<pre>
 +
%valCur - All papers that have been curated for a given datatype
  
<pre>
+
%valPos - All papers that have been validated positive for a given datatype (including those already curated)
Any flagged                                                    : %svmData    OR %afpFlagged  OR %cfpFlagged
 
Any positive                                                    : %svmPos      OR %afpPos      OR %cfpPos
 
Any positive validated                                          : %svmPosVal  OR %afpPosVal  OR %cfpPosVal
 
Any positive validated false positive                          : %svmPosFP    OR %afpPosFP    OR %cfpPosFP
 
Any positive validated true positive                            : %svmPosTP    OR %afpPosTP    OR %cfpPosTP
 
Any positive validated true positive curated                    : %svmPosTpCur OR %afpPosTpCur OR %cfpPosTpCur
 
Any positive validated true positive not curated               : %svmPosTpNC  OR %afpPosTpNC  OR %cfpPosTpNC
 
Any positive not validated                                      : %svmPosNV    OR %afpPosNV    OR %cfpPosNV
 
Any positive not curated                                        : %svmPosNC    OR %afpPosNC    OR %cfpPosNC
 
  
Intersection flagged                                            : %svmData    AND %afpFlagged  AND %cfpFlagged
+
%valNeg - All papers that have been validated negative for a given datatype
Intersection positive                                          : %svmPos      AND %afpPos      AND %cfpPos
 
Intersection positive validated                                : %svmPosVal  AND %afpPosVal  AND %cfpPosVal
 
Intersection positive validated false positive                  : %svmPosFP    AND %afpPosFP    AND %cfpPosFP
 
Intersection positive validated true positive                  : %svmPosTP    AND %afpPosTP    AND %cfpPosTP
 
Intersection positive validated true positive curated          : %svmPosTpCur AND %afpPosTpCur AND %cfpPosTpCur
 
Intersection positive validated true positive not curated      : %svmPosTpNC  AND %afpPosTpNC  AND %cfpPosTpNC
 
Intersection positive not validated                            : %svmPosNV    AND %afpPosNV    AND %cfpPosNV
 
Intersection positive not curated                              : %svmPosNC    AND %afpPosNC    AND %cfpPosNC
 
 
</pre>
 
</pre>
  
  
Note that if a curator enters the Curation Statistics table after deselecting any of the flagging methods in the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], the "Any" and "Intersection" sections of the table will reflect only the flagging methods chosen by the curator. Thus, if a curator chooses to view only one flagging method, the "Any", "Intersection", and "Flagged Positive" sections of the table will show identical results.
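
As a rough illustration of the "Any" (OR) and "Intersection" (AND) rows, the following Python sketch treats each flagging method's result for one row and one datatype as a set of paper IDs. The function name and input shape are assumptions for illustration only.

```python
def any_and_intersection(per_method):
    """per_method: {flagging method: set of paper IDs} for one row and datatype."""
    sets = list(per_method.values())
    if not sets:
        return set(), set()
    any_flagged = set().union(*sets)          # OR across the selected methods
    intersection = set.intersection(*sets)    # AND across the selected methods
    return any_flagged, intersection


# Three methods selected (paper IDs here are made up)
print(any_and_intersection({"svm": {1, 2, 3}, "afp": {2, 3, 4}, "cfp": {3, 5}}))
# Only one method selected: both results equal that method's own set
print(any_and_intersection({"afp": {2, 3, 4}}))
```

With a single selected method the union and the intersection are the same set, which is why the corresponding table sections show identical results in that case.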
+
To determine the validation and curation statistics for a particular flagging method, these hash tables are compared with that method's flagging-result table to generate the numbers for the Curation Statistics table. For AFP positives, for example, the following logic determines the indicated values (lists of papers), per datatype:
 
 
 
 
The following are the correspondences between rows in the Curation Statistics table and the hash tables in the form's code:
 
 
 
'''General paper stats'''
 
  
 
<pre>
 
<pre>
%curatablePapers                curatable papers
+
AFP positive (%afpPos)
%objsCurated                    objects curated
+
AFP positive validated (%afpPosVal)                            : %afpPos AND (%valNeg OR %valPos)
%objsCurated/%valCur            objects curated per paper
+
AFP positive validated false positive (%afpPosFP)              : %afpPos AND %valNeg
%valCur                        Papers curated
+
AFP positive validated true positive (%afpPosTP)                : %afpPos AND %valPos
%validated                      Papers validated
+
AFP positive validated true positive curated (%afpPosTpCur)    : %afpPos AND %valPos AND %valCur   <Note: the %valPos is redundant>
%valPos                             Papers validated positive
+
AFP positive validated true positive not curated (%afpPosTpNC)  : %afpPos AND (%valPos NOT %valCur)
%valCur                                 Papers validated positive curated
+
AFP positive not validated (%afpPosNV)                          : %afpPos NOT (%valNeg OR %valPos)
%valPos NOT %valCur                     Papers validated positive not curated
+
AFP positive not curated (%afpPosNC)                            : (%afpPos AND (%valPos NOT %valCur)) OR (%afpPos NOT (%valNeg OR %valPos))
%valNeg                             Papers validated negative
 
%conflict                          Papers validated conflict
 
 
</pre>
 
</pre>
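
The AFP positive expressions listed above map directly onto set operations. The Python sketch below is illustrative only: the function and key names are assumptions, each argument is the set of paper IDs for one datatype, and it assumes that no paper is in both the validated-positive and validated-negative sets.

```python
def afp_positive_rows(afp_pos, val_pos, val_neg, val_cur):
    """Derive the AFP positive rows for one datatype from sets of paper IDs."""
    validated = afp_pos & (val_neg | val_pos)
    false_pos = afp_pos & val_neg
    true_pos = afp_pos & val_pos
    tp_curated = true_pos & val_cur
    tp_not_curated = true_pos - val_cur
    not_validated = afp_pos - (val_neg | val_pos)
    not_curated = tp_not_curated | not_validated
    return {
        "afpPosVal": validated,
        "afpPosFP": false_pos,
        "afpPosTP": true_pos,
        "afpPosTpCur": tp_curated,
        "afpPosTpNC": tp_not_curated,
        "afpPosNV": not_validated,
        "afpPosNC": not_curated,
    }


rows = afp_positive_rows(
    afp_pos={"a", "b", "c", "d"},
    val_pos={"a", "b"},      # "a" is also curated
    val_neg={"c"},
    val_cur={"a"},
)
print(rows["afpPosNC"])      # papers still needing curation or validation
```

Under that disjointness assumption the validated and not-validated sets partition the AFP positives, and the true positives split into curated and not curated.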
  
  
'''Support Vector Machine paper stats'''
+
These values are determined by the following section of code:
  
 
<pre>
 
<pre>
%noSvm                  SVM no svm processed
+
sub getCurationStatisticsAfpPos {
%svmData                SVM has svm
+
  my ($datatypesToShow_ref) = @_;
%svmPos                    SVM positive any
+
  my @datatypesToShow = @$datatypesToShow_ref;
%svmPosVal                      SVM positive any validated
+
  my %afpPosNV; my %afpPosVal; my %afpPosFP; my %afpPosTP; my %afpPosTpCur; my %afpPosTpNC; my %afpPosNC;
%svmPosFP                          SVM positive any validated false positive
+
        # positive and : not validated, validated, false positive, true positive, TP curated, TP not curated, not curated minus validated negative OR not validated + TP not curated
%svmPosTP                          SVM positive any validated true positive
+
  foreach my $datatype (@datatypesToShow) {
%svmPosTpCur                            SVM positive any validated true positive curated
+
    foreach my $joinkey (sort keys %{ $afpPos{$datatype} }) {
%svmPosTpNC                            SVM positive any validated true positive not curated
+
      if ($valPos{$datatype}{$joinkey}) {      $afpPosTP{$datatype}{$joinkey}++;    $afpPosVal{$datatype}{$joinkey}++;
%svmPosNV                          SVM positive any not validated
+
          if ($valCur{$datatype}{$joinkey}) {  $afpPosTpCur{$datatype}{$joinkey}++; }
%svmPosNC                          SVM positive any not curated
+
            else                           {  $afpPosTpNC{$datatype}{$joinkey}++;  $afpPosNC{$datatype}{$joinkey}++; } }
%svmHig                     SVM positive high
+
        elsif ($valNeg{$datatype}{$joinkey}) { $afpPosFP{$datatype}{$joinkey}++;    $afpPosVal{$datatype}{$joinkey}++; }
%svmHigVal                      SVM positive high validated
+
        else {                                $afpPosNV{$datatype}{$joinkey}++;    $afpPosNC{$datatype}{$joinkey}++; } } }
%svmHigFP                          SVM positive high validated false positive
+
  tie %{ $curStats{'afp'}{'pos'} }, "Tie::IxHash";
%svmHigTP                          SVM positive high validated true positive
+
  foreach my $datatype (@datatypesToShow) {
%svmHigTpCur                           SVM positive high validated true positive curated
+
    my $countAfpFlagged  = scalar keys %{ $afpFlagged{$datatype} };
%svmHigTpNC                            SVM positive high validated true positive not curated
+
    my $countAfpPos = scalar keys %{ $afpPos{$datatype} };
%svmHigNV                          SVM positive high not validated
+
    my $ratio = 0;
%svmHigNC                          SVM positive high not curated
+
    if ($countAfpFlagged > 0) { $ratio = $countAfpPos / $countAfpFlagged * 100; $ratio = FormatSigFigs($ratio, 2); }
%svmMed                     SVM positive medium
+
    foreach my $joinkey (keys %{ $afpPos{$datatype} }) {      $curStats{'afp'}{'pos'}{$datatype}{papers}{$joinkey}++;
%svmMedVal                      SVM positive medium validated
+
                                                              $curStats{'any'}{'pos'}{$datatype}{papers}{$joinkey}++; }
%svmMedFP                          SVM positive medium validated false positive
+
    $curStats{'afp'}{'pos'}{$datatype}{'countPap'}                      = scalar keys %{ $afpPos{$datatype} };
%svmMedTP                          SVM positive medium validated true positive
+
    $curStats{'afp'}{'pos'}{$datatype}{'ratio'}                        = $ratio;
%svmMedTpCur                            SVM positive medium validated true positive curated
+
 
%svmMedTpNC                            SVM positive medium validated true positive not curated
+
    my $countAfpPosVal  = scalar keys %{ $afpPosVal{$datatype} };
%svmMedNV                          SVM positive medium not validated
+
    $ratio = 0;
%svmMedNC                          SVM positive medium not curated
+
    if ($countAfpPos > 0) { $ratio = $countAfpPosVal / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }
%svmLow                     SVM positive low
+
    foreach my $joinkey (keys %{ $afpPosVal{$datatype} }) {  $curStats{'afp'}{'pos'}{'val'}{$datatype}{papers}{$joinkey}++;
%svmLowVal                      SVM positive low validated
+
                                                              $curStats{'any'}{'pos'}{'val'}{$datatype}{papers}{$joinkey}++; }
%svmLowFP                          SVM positive low validated false positive
+
    $curStats{'afp'}{'pos'}{'val'}{$datatype}{'countPap'}              = scalar keys %{ $afpPosVal{$datatype} };
%svmLowTP                          SVM positive low validated true positive
+
    $curStats{'afp'}{'pos'}{'val'}{$datatype}{'ratio'}                  = $ratio;
%svmLowTpCur                            SVM positive low validated true positive curated
+
 
%svmLowTpNC                            SVM positive low validated true positive not curated
+
    my $countAfpPosTP  = scalar keys %{ $afpPosTP{$datatype} };
%svmLowNV                          SVM positive low not validated
+
    $ratio = 0;
%svmLowNC                          SVM positive low not curated
+
    if ($countAfpPosVal > 0) { $ratio = $countAfpPosTP / $countAfpPosVal * 100; $ratio = FormatSigFigs($ratio, 2); }
%svmNeg                     SVM negative
+
    foreach my $joinkey (keys %{ $afpPosTP{$datatype} }) {    $curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{papers}{$joinkey}++;
%svmNegVal                      SVM negative validated
+
                                                              $curStats{'any'}{'pos'}{'val'}{'tp'}{$datatype}{papers}{$joinkey}++; }
%svmNegTN                          SVM negative validated true negative
+
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{'countPap'}        = scalar keys %{ $afpPosTP{$datatype} };
%svmNegFN                          SVM negative validated false negative
+
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{$datatype}{'ratio'}            = $ratio;
%svmNegFnCur                            SVM negative validated false negative curated
 
%svmNegFnNC                            SVM negative validated false negative not curated
 
%svmNegNV                          SVM negative not validated
 
%svmNegNC                          SVM negative not curated
 
</pre>
 
  
 +
    my $countAfpPosTpCur  = scalar keys %{ $afpPosTpCur{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpPosTP > 0) { $ratio = $countAfpPosTpCur / $countAfpPosTP * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpPosTpCur{$datatype} }) { $curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{papers}{$joinkey}++;
 +
                                                              $curStats{'any'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{'countPap'}  = scalar keys %{ $afpPosTpCur{$datatype} };
 +
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'cur'}{$datatype}{'ratio'}    = $ratio;
  
'''Author First Pass paper stats'''
+
    my $countAfpPosTpNC  = scalar keys %{ $afpPosTpNC{$datatype} };
 
+
    $ratio = 0;
<pre>
+
    if ($countAfpPosTP > 0) { $ratio = $countAfpPosTpNC / $countAfpPosTP * 100; $ratio = FormatSigFigs($ratio, 2); }
%afpEmailed                        AFP emailed
+
    foreach my $joinkey (keys %{ $afpPosTpNC{$datatype} }) {  $curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{papers}{$joinkey}++;
%afpFlagged                        AFP flagged
+
                                                              $curStats{'any'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
%afpPos                            AFP positive
+
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpPosTpNC{$datatype} };
%afpPosVal                            AFP positive validated
+
    $curStats{'afp'}{'pos'}{'val'}{'tp'}{'ncur'}{$datatype}{'ratio'}    = $ratio;
%afpPosFP                                  AFP positive validated false positive
 
%afpPosTP                                  AFP positive validated true positive
 
%afpPosTpCur                                  AFP positive validated true positive curated
 
%afpPosTpNC                                   AFP positive validated true positive not curated
 
%afpPosNV                              AFP positive not validated
 
%afpPosNC                              AFP positive not curated
 
%afpNeg                            AFP negative
 
%afpNegVal                            AFP negative validated
 
%afpNegTN                                  AFP negative validated true negative
 
%afpNegFN                                  AFP negative validated false negative
 
%afpNegFnCur                                  AFP negative validated false negative curated
 
%afpNegFnNC                                    AFP negative validated false negative not curated
 
%afpNegNV                              AFP negative not validated
 
%afpNegNC                              AFP negative not curated
 
</pre>
 
  
 +
    my $countAfpPosFP  = scalar keys %{ $afpPosFP{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpPosVal > 0) { $ratio = $countAfpPosFP / $countAfpPosVal * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpPosFP{$datatype} }) {    $curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{papers}{$joinkey}++;
 +
                                                              $curStats{'any'}{'pos'}{'val'}{'fp'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{'countPap'}        = scalar keys %{ $afpPosFP{$datatype} };
 +
    $curStats{'afp'}{'pos'}{'val'}{'fp'}{$datatype}{'ratio'}            = $ratio;
  
'''Curator First Pass paper stats'''
+
    my $countAfpPosNV  = scalar keys %{ $afpPosNV{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpPos > 0) { $ratio = $countAfpPosNV / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpPosNV{$datatype} }) {    $curStats{'afp'}{'pos'}{'nval'}{$datatype}{papers}{$joinkey}++;
 +
                                                              $curStats{'any'}{'pos'}{'nval'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'pos'}{'nval'}{$datatype}{'countPap'}              = scalar keys %{ $afpPosNV{$datatype} };
 +
    $curStats{'afp'}{'pos'}{'nval'}{$datatype}{'ratio'}                = $ratio;
 +
 
 +
    my $countAfpPosNC  = scalar keys %{ $afpPosNC{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpPos > 0) { $ratio = $countAfpPosNC / $countAfpPos * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpPosNC{$datatype} }) {    $curStats{'afp'}{'pos'}{'ncur'}{$datatype}{papers}{$joinkey}++;
 +
                                                              $curStats{'any'}{'pos'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'pos'}{'ncur'}{$datatype}{'countPap'}              = scalar keys %{ $afpPosNC{$datatype} };
 +
    $curStats{'afp'}{'pos'}{'ncur'}{$datatype}{'ratio'}                = $ratio;
 +
  } # foreach my $datatype (@datatypesToShow)
 +
} # sub getCurationStatisticsAfpPos
 +
</pre>
 +
 
 +
 
 +
For AFP Negatives, the following logic is performed to determine the indicated values (list of papers), per datatype:
  
 
<pre>
 
<pre>
%cfpFlagged                        CFP flagged
+
AFP negative (%afpNeg)
%cfpPos                            CFP positive
+
AFP negative validated (%afpNegVal)                             : %afpNeg AND (%valNeg OR %valPos)
%cfpPosVal                             CFP positive validated
+
AFP negative validated true negative (%afpNegTN)                : %afpNeg AND %valNeg
%cfpPosFP                                  CFP positive validated false positive
+
AFP negative validated false negative (%afpNegFN)              : %afpNeg AND %valPos
%cfpPosTP                                  CFP positive validated true positive
+
AFP negative validated false negative curated (%afpNegFnCur)    : %afpNeg AND %valPos AND %valCur    <Note: the %valPos is redundant>
%cfpPosTpCur                                  CFP positive validated true positive curated
+
AFP negative validated false negative not curated (%afpNegFnNC) : %afpNeg AND %valPos NOT %valCur
%cfpPosTpNC                                    CFP positive validated true positive not curated
+
AFP negative not validated (%afpNegNV)                          : %afpNeg NOT (%valNeg OR %valPos)
%cfpPosNV                              CFP positive not validated
+
AFP negative not curated (%afpNegNC)                            : (%afpNeg AND (%valPos NOT %valCur)) OR (%afpNeg NOT (%valPos OR %valNeg))
%cfpPosNC                              CFP positive not curated
 
%cfpNeg                            CFP negative
 
%cfpNegVal                            CFP negative validated
 
%cfpNegTN                                  CFP negative validated true negative
 
%cfpNegFN                                  CFP negative validated false negative
 
%cfpNegFnCur                                  CFP negative validated false negative curated
 
%cfpNegFnNC                                    CFP negative validated false negative not curated
 
%cfpNegNV                              CFP negative not validated
 
%cfpNegNC                              CFP negative not curated
 
 
</pre>
 
</pre>
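
Each AFP negative row also carries a percentage whose denominator is its parent row, as in the getCurationStatisticsAfpNeg listing. The Python sketch below shows those denominators; the key names and the example counts are made up, and round_sig is only a stand-in for the Perl FormatSigFigs call, so its output formatting may differ from what the form displays.

```python
from math import floor, log10


def round_sig(x, figures=2):
    """Round x to the given number of significant figures (stand-in helper)."""
    if x == 0:
        return 0.0
    return round(x, figures - 1 - floor(log10(abs(x))))


def percent(part, whole):
    """Percentage of part in whole; 0 when the parent row is empty."""
    return round_sig(part / whole * 100) if whole > 0 else 0.0


def afp_negative_ratios(c):
    """c: paper counts per row for one datatype (key names are assumptions)."""
    return {
        "neg": percent(c["neg"], c["flagged"]),      # negative / flagged
        "val": percent(c["val"], c["neg"]),          # validated / negative
        "tn": percent(c["tn"], c["val"]),            # true negative / validated
        "fn": percent(c["fn"], c["val"]),            # false negative / validated
        "fn_cur": percent(c["fn_cur"], c["fn"]),     # FN curated / FN
        "fn_ncur": percent(c["fn_ncur"], c["fn"]),   # FN not curated / FN
        "nval": percent(c["nv"], c["neg"]),          # not validated / negative
        "ncur": percent(c["nc"], c["neg"]),          # not curated / negative
    }


counts = {"flagged": 200, "neg": 120, "val": 30, "tn": 20, "fn": 10,
          "fn_cur": 4, "fn_ncur": 6, "nv": 90, "nc": 96}
print(afp_negative_ratios(counts))
```

The guard on an empty parent row mirrors the `if ($count... > 0)` checks in the listing, which leave the ratio at 0 rather than dividing by zero.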
  
 
+
These values are determined by the following section of code:
 
 
== Postgres Table Structures ==
 
 
 
 
 
=== cur_curdata table ===
 
 
 
 
 
The cur_curdata table in Postgres has the following structure:
 
 
 
  
 
<pre>
 
<pre>
cur_paper      | text                    |
+
sub getCurationStatisticsAfpNeg {
cur_datatype   | text                    |
+
  my ($datatypesToShow_ref) = @_;
cur_curator   | text                    |
+
  my @datatypesToShow = @$datatypesToShow_ref;
cur_curdata   | text                    |
+
  my %afpNegNV; my %afpNegVal; my %afpNegTN; my %afpNegFN; my %afpNegFnCur; my %afpNegFnNC; my %afpNegNC;
  cur_selcomment | text                    |
+
        # negative and : not validated, validated, true negative, false negative, FN curated, FN not curated, not curated minus validated negative OR not validated + FN not curated
  cur_txtcomment | text                    |
+
   foreach my $datatype (@datatypesToShow) {
cur_timestamp  | timestamp with time zone |
+
    foreach my $joinkey (sort keys %{ $afpNeg{$datatype} }) {
</pre>
+
      if ($valNeg{$datatype}{$joinkey}) {      $afpNegTN{$datatype}{$joinkey}++;   $afpNegVal{$datatype}{$joinkey}++; }
 +
        elsif ($valPos{$datatype}{$joinkey}) { $afpNegFN{$datatype}{$joinkey}++;   $afpNegVal{$datatype}{$joinkey}++;
 +
          if ($valCur{$datatype}{$joinkey}) { $afpNegFnCur{$datatype}{$joinkey}++; }
 +
            else                            {  $afpNegFnNC{$datatype}{$joinkey}++; $afpNegNC{$datatype}{$joinkey}++; } }
 +
        else {                                $afpNegNV{$datatype}{$joinkey}++;    $afpNegNC{$datatype}{$joinkey}++; } } }
 +
  tie %{ $curStats{'afp'}{'neg'} }, "Tie::IxHash";
 +
  foreach my $datatype (@datatypesToShow) {
 +
    my $countAfpFlagged  = scalar keys %{ $afpFlagged{$datatype} };
 +
    my $countAfpNeg = scalar keys %{ $afpNeg{$datatype} };
 +
    my $ratio = 0;
 +
    if ($countAfpFlagged > 0) { $ratio = $countAfpNeg / $countAfpFlagged * 100; $ratio = FormatSigFigs($ratio, 2); }    # ($ratio) = &roundToPlaces($ratio, 2); # $ratio = sprintf "%.2f", $ratio;
 +
    foreach my $joinkey (keys %{ $afpNeg{$datatype} }) { $curStats{'afp'}{'neg'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'neg'}{$datatype}{'countPap'} = scalar keys %{ $afpNeg{$datatype} };
 +
    $curStats{'afp'}{'neg'}{$datatype}{'ratio'}    = $ratio;
  
 +
    my $countAfpNegVal  = scalar keys %{ $afpNegVal{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpNeg > 0) { $ratio = $countAfpNegVal / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpNegVal{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'neg'}{'val'}{$datatype}{'countPap'} = scalar keys %{ $afpNegVal{$datatype} };
 +
    $curStats{'afp'}{'neg'}{'val'}{$datatype}{'ratio'}    = $ratio;
  
The following is an example of a Postgres query to return cur_curdata for the paper WBPaper00031688:
+
    my $countAfpNegTN  = scalar keys %{ $afpNegTN{$datatype} };
 
+
    $ratio = 0;
<pre>
+
    if ($countAfpNegVal > 0) { $ratio = $countAfpNegTN / $countAfpNegVal * 100; $ratio = FormatSigFigs($ratio, 2); }
testdb=> SELECT * FROM cur_curdata WHERE cur_paper = '00031688';
+
    foreach my $joinkey (keys %{ $afpNegTN{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{papers}{$joinkey}++; }
cur_paper | cur_datatype | cur_curator | cur_curdata  | cur_selcomment |      cur_txtcomment      |        cur_timestamp       
+
     $curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{'countPap'} = scalar keys %{ $afpNegTN{$datatype} };
-----------+--------------+-------------+--------------+----------------+---------------------------+-------------------------------
+
     $curStats{'afp'}{'neg'}{'val'}{'tn'}{$datatype}{'ratio'}    = $ratio;
00031688  | otherexpr    | two1823    | positive    |                | qwer                      | 2012-11-20 15:48:18.113978-08
 
00031688  | geneint      | two2987     | notvalidated | 2              | test test test            | 2012-11-26 17:06:30.550952-08
 
00031688  | rnai        | two2987     | notvalidated |                | testing for documentation | 2012-12-13 14:25:13.360082-08
 
(3 rows)
 
</pre>
 
  
 +
    my $countAfpNegFN  = scalar keys %{ $afpNegFN{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpNegVal > 0) { $ratio = $countAfpNegFN / $countAfpNegVal * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpNegFN{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFN{$datatype} };
 +
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{$datatype}{'ratio'}    = $ratio;
  
For a given paper-datatype pair:
+
    my $countAfpNegFnCur  = scalar keys %{ $afpNegFnCur{$datatype} };
 
+
    $ratio = 0;
'''cur_paper''': Stores the ID # for the WBPaper ID
+
    if ($countAfpNegFN > 0) { $ratio = $countAfpNegFnCur / $countAfpNegFN * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpNegFnCur{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFnCur{$datatype} };
 +
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'cur'}{$datatype}{'ratio'}    = $ratio;
  
'''cur_datatype''': Stores the datatype
+
    my $countAfpNegFnNC  = scalar keys %{ $afpNegFnNC{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpNegFN > 0) { $ratio = $countAfpNegFnNC / $countAfpNegFN * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpNegFnNC{$datatype} }) { $curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegFnNC{$datatype} };
 +
    $curStats{'afp'}{'neg'}{'val'}{'fn'}{'ncur'}{$datatype}{'ratio'}    = $ratio;
  
'''cur_curator''': Stores the curator ID in the format: "two" + "ID # from <WBPersonID>"
+
    my $countAfpNegNV  = scalar keys %{ $afpNegNV{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpNeg > 0) { $ratio = $countAfpNegNV / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpNegNV{$datatype} }) { $curStats{'afp'}{'neg'}{'nval'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'neg'}{'nval'}{$datatype}{'countPap'} = scalar keys %{ $afpNegNV{$datatype} };
 +
    $curStats{'afp'}{'neg'}{'nval'}{$datatype}{'ratio'}    = $ratio;
  
'''cur_curdata''': Stores validation status: "positive", "negative", "curated", "notvalidated"
+
    my $countAfpNegNC  = scalar keys %{ $afpNegNC{$datatype} };
 +
    $ratio = 0;
 +
    if ($countAfpNeg > 0) { $ratio = $countAfpNegNC / $countAfpNeg * 100; $ratio = FormatSigFigs($ratio, 2); }
 +
    foreach my $joinkey (keys %{ $afpNegNC{$datatype} }) { $curStats{'afp'}{'neg'}{'ncur'}{$datatype}{papers}{$joinkey}++; }
 +
    $curStats{'afp'}{'neg'}{'ncur'}{$datatype}{'countPap'} = scalar keys %{ $afpNegNC{$datatype} };
 +
    $curStats{'afp'}{'neg'}{'ncur'}{$datatype}{'ratio'}    = $ratio;
 +
  } # foreach my $datatype (@datatypesToShow)
 +
} # sub getCurationStatisticsAfpNeg
  
'''cur_selcomment''': Stores the premade comment key (only one value possible by the form; Postgres is OK with more values)
+
</pre>
  
'''cur_txtcomment''': Stores the free-text comment
 
  
'''cur_timestamp''': Stores the timestamp for the data submission through the Curation Statistics Form
+
'''"Any" and "Intersection" rows of the Curation Statistics table'''
 +
 
 +
To determine the "Any" and "Intersection" results, all flagging methods currently visible in the Curation Statistics table are considered. So, for the main Curation Statistics table (with no options selected), all flagging methods (SVM, AFP, and CFP as of 12-10-2012) are considered. The calculations in this case would be:
  
<br>
+
<pre>
<br>
+
Any flagged                                                    : %svmData      OR %strFlagged  OR %afpFlagged  OR %cfpFlagged
 +
Any positive                                                    : %svmPos      OR %strPos      OR %afpPos      OR %cfpPos
 +
Any positive validated                                          : %svmPosVal    OR %strPosVal    OR %afpPosVal    OR %cfpPosVal
 +
Any positive validated false positive                          : %svmPosFP    OR %strPosFP    OR %afpPosFP    OR %cfpPosFP
 +
Any positive validated true positive                            : %svmPosTP    OR %strPosTP    OR %afpPosTP    OR %cfpPosTP
 +
Any positive validated true positive curated                    : %svmPosTpCur  OR %strPosTpCur  OR %afpPosTpCur  OR %cfpPosTpCur
 +
Any positive validated true positive not curated                : %svmPosTpNC  OR %strPosTpNC  OR %afpPosTpNC  OR %cfpPosTpNC
 +
Any positive not validated                                      : %svmPosNV    OR %strPosNV    OR %afpPosNV    OR %cfpPosNV
 +
Any positive not curated                                        : %svmPosNC    OR %strPosNC    OR %afpPosNC    OR %cfpPosNC
 +
                                                                                   
 +
Intersection flagged                                            : %svmData    AND %strFlagged  AND %afpFlagged  AND %cfpFlagged
 +
Intersection positive                                          : %svmPos      AND %strPos      AND %afpPos      AND %cfpPos
 +
Intersection positive validated                                : %svmPosVal  AND %strPosVal  AND %afpPosVal  AND %cfpPosVal
 +
Intersection positive validated false positive                  : %svmPosFP    AND %strPosFP    AND %afpPosFP    AND %cfpPosFP
 +
Intersection positive validated true positive                  : %svmPosTP    AND %strPosTP    AND %afpPosTP    AND %cfpPosTP
 +
Intersection positive validated true positive curated          : %svmPosTpCur AND %strPosTpCur AND %afpPosTpCur AND %cfpPosTpCur
 +
Intersection positive validated true positive not curated      : %svmPosTpNC  AND %strPosTpNC  AND %afpPosTpNC  AND %cfpPosTpNC
 +
Intersection positive not validated                            : %svmPosNV    AND %strPosNV    AND %afpPosNV    AND %cfpPosNV
 +
Intersection positive not curated                              : %svmPosNC    AND %strPosNC    AND %afpPosNC    AND %cfpPosNC
 +
</pre>
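
As a rough illustration of the OR/AND rows above, the following sketch computes the "Any" (union) and "Intersection" results from per-method paper sets. The sample joinkeys and the helper names ''any_papers'' and ''intersection_papers'' are assumptions made for this example; they are not taken from the form's code.

<pre>
use strict;
use warnings;

# Hypothetical per-method paper sets for one datatype (joinkey => 1).
my %svmPos = map { $_ => 1 } qw(00000001 00000002 00000003);
my %afpPos = map { $_ => 1 } qw(00000002 00000003 00000004);
my %cfpPos = map { $_ => 1 } qw(00000003 00000005);

# "Any": a paper counts if at least one of the given methods has it (OR).
sub any_papers {
  my (@sets) = @_;
  my %any;
  foreach my $set (@sets) { foreach my $joinkey (keys %$set) { $any{$joinkey}++; } }
  return sort keys %any;
}

# "Intersection": a paper counts only if every given method has it (AND).
sub intersection_papers {
  my (@sets) = @_;
  return () unless @sets;
  my %seen;
  foreach my $set (@sets) { foreach my $joinkey (keys %$set) { $seen{$joinkey}++; } }
  return sort grep { $seen{$_} == scalar(@sets) } keys %seen;
}

my @any          = any_papers(\%svmPos, \%afpPos, \%cfpPos);
my @intersection = intersection_papers(\%svmPos, \%afpPos, \%cfpPos);
print "Any positive: @any\n";
print "Intersection positive: @intersection\n";
</pre>

Passing a single set returns the same list from both helpers, which is why the "Any" and "Intersection" rows agree when only one flagging method is visible.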
  
  
=== cur_svmdata table ===
+
Note that if a curator enters the Curation Statistics table after deselecting any of the flagging methods in the [[New_2012_Curation_Status#Curation_Statistics_Options_Page|Curation Statistics Options Page]], the "Any" and "Intersection" sections of the table will only reflect the flagging methods chosen by the curator. Thus, if a curator chooses to view only one flagging method, the "Any", "Intersection", and "Flagged Positive" sections of the table will show identical results.
  
  
The cur_svmdata table in Postgres has the following structure:
+
The following are the correspondences between rows in the Curation Statistics table and the hash tables in the form's code:
  
 +
'''General paper stats'''
  
 
<pre>
 
<pre>
cur_paper    | text                    |
+
%curatablePapers                curatable papers
cur_datatype  | text                    |
+
%objsCurated                    objects curated
cur_date      | text                    |
+
%objsCurated/%valCur            objects curated per paper
cur_svmdata  | text                     |
+
%valCur                        Papers curated
cur_version  | text                    |
+
%validated                      Papers validated
cur_timestamp | timestamp with time zone
+
%valPos                            Papers validated positive
 +
%valCur                                Papers validated positive curated
 +
%valPos NOT %valCur                     Papers validated positive not curated
 +
%valNeg                            Papers validated negative
 +
%conflict                          Papers validated conflict
 
</pre>
 
</pre>
  
  
The following is an example of a Postgres query to return cur_svmdata for the paper WBPaper00031688:
+
'''Support Vector Machine paper stats'''
 
 
  
 
<pre>
 
<pre>
testdb=> SELECT * FROM cur_svmdata WHERE cur_paper = '00031688';
+
%noSvm                  SVM no svm processed
cur_paper | cur_datatype | cur_date | cur_svmdata | cur_version |        cur_timestamp       
+
%svmData                SVM has svm
-----------+--------------+----------+-------------+-------------+-------------------------------
+
%svmPos                    SVM positive any
00031688  | structcorr  | 20090101 | high       | 0          | 2012-12-02 23:22:07.607586-08
+
%svmPosVal                      SVM positive any validated
00031688  | seqchange    | 20090101 | high       | 0          | 2012-12-02 23:22:07.599254-08
+
%svmPosFP                          SVM positive any validated false positive
00031688  | rnai        | 20090101 | high       | 0          | 2012-12-02 23:22:07.590835-08
+
%svmPosTP                          SVM positive any validated true positive
00031688  | overexpr    | 20090101 | high       | 0          | 2012-12-02 23:22:07.582514-08
+
%svmPosTpCur                            SVM positive any validated true positive curated
00031688  | otherexpr    | 20090101 | high       | 0          | 2012-12-02 23:22:07.574163-08
+
%svmPosTpNC                            SVM positive any validated true positive not curated
00031688  | newmutant    | 20090101 | high       | 0          | 2012-12-02 23:22:07.565834-08
+
%svmPosNV                          SVM positive any not validated
00031688  | genereg      | 20090101 | high       | 0          | 2012-12-02 23:22:07.557501-08
+
%svmPosNC                          SVM positive any not curated
00031688  | geneprod    | 20090101 | NEG        | 0          | 2012-12-02 23:22:07.549171-08
+
%svmHig                     SVM positive high
00031688  | geneint      | 20090101 | high        | 0          | 2012-12-02 23:22:07.540835-08
+
%svmHigVal                      SVM positive high validated
00031688  | antibody    | 20090101 | NEG        | 0          | 2012-12-02 23:22:07.532509-08
+
%svmHigFP                          SVM positive high validated false positive
(10 rows)
+
%svmHigTP                          SVM positive high validated true positive
 +
%svmHigTpCur                            SVM positive high validated true positive curated
 +
%svmHigTpNC                            SVM positive high validated true positive not curated
 +
%svmHigNV                          SVM positive high not validated
 +
%svmHigNC                          SVM positive high not curated
 +
%svmMed                     SVM positive medium
 +
%svmMedVal                      SVM positive medium validated
 +
%svmMedFP                          SVM positive medium validated false positive
 +
%svmMedTP                          SVM positive medium validated true positive
 +
%svmMedTpCur                            SVM positive medium validated true positive curated
 +
%svmMedTpNC                            SVM positive medium validated true positive not curated
 +
%svmMedNV                          SVM positive medium not validated
 +
%svmMedNC                          SVM positive medium not curated
 +
%svmLow                     SVM positive low
 +
%svmLowVal                      SVM positive low validated
 +
%svmLowFP                          SVM positive low validated false positive
 +
%svmLowTP                          SVM positive low validated true positive
 +
%svmLowTpCur                            SVM positive low validated true positive curated
 +
%svmLowTpNC                            SVM positive low validated true positive not curated
 +
%svmLowNV                          SVM positive low not validated
 +
%svmLowNC                          SVM positive low not curated
 +
%svmNeg                     SVM negative
 +
%svmNegVal                      SVM negative validated
 +
%svmNegTN                          SVM negative validated true negative
 +
%svmNegFN                          SVM negative validated false negative
 +
%svmNegFnCur                            SVM negative validated false negative curated
 +
%svmNegFnNC                            SVM negative validated false negative not curated
 +
%svmNegNV                          SVM negative not validated
 +
%svmNegNC                          SVM negative not curated
 
</pre>
 
</pre>
  
 +
'''String Matches paper stats'''
  
For a given paper-datatype pair:
+
<pre>
 +
%strFlagged                        STR flagged
 +
%strPos                            STR positive
 +
%strPosVal                            STR positive validated
 +
%strPosFP                                  STR positive validated false positive
 +
%strPosTP                                  STR positive validated true positive
 +
%strPosTpCur                                  STR positive validated true positive curated
 +
%strPosTpNC                                    STR positive validated true positive not curated
 +
%strPosNV                              STR positive not validated
 +
%strPosNC                              STR positive not curated
 +
%strNeg                            STR negative
 +
%strNegVal                            STR negative validated
 +
%strNegTN                                  STR negative validated true negative
 +
%strNegFN                                  STR negative validated false negative
 +
%strNegFnCur                                  STR negative validated false negative curated
 +
%strNegFnNC                                    STR negative validated false negative not curated
 +
%strNegNV                              STR negative not validated
 +
%strNegNC                              STR negative not curated
 +
</pre>
  
'''cur_paper''': Stores the ID # for the WBPaper ID
+
'''Author First Pass paper stats'''
  
'''cur_datatype''': Stores the datatype
+
<pre>
 +
%afpEmailed                        AFP emailed
 +
%afpFlagged                        AFP flagged
 +
%afpPos                            AFP positive
 +
%afpPosVal                            AFP positive validated
 +
%afpPosFP                                  AFP positive validated false positive
 +
%afpPosTP                                  AFP positive validated true positive
 +
%afpPosTpCur                                  AFP positive validated true positive curated
 +
%afpPosTpNC                                    AFP positive validated true positive not curated
 +
%afpPosNV                              AFP positive not validated
 +
%afpPosNC                              AFP positive not curated
 +
%afpNeg                            AFP negative
 +
%afpNegVal                            AFP negative validated
 +
%afpNegTN                                  AFP negative validated true negative
 +
%afpNegFN                                  AFP negative validated false negative
 +
%afpNegFnCur                                  AFP negative validated false negative curated
 +
%afpNegFnNC                                    AFP negative validated false negative not curated
 +
%afpNegNV                              AFP negative not validated
 +
%afpNegNC                              AFP negative not curated
 +
</pre>
  
'''cur_date''': Stores the date of the SVM directory on Yuling's computer
 
  
'''cur_svmdata''': Stores the SVM result: "high", "medium", "low", "NEG"
+
'''Curator First Pass paper stats'''
  
'''cur_version''': Stores the SVM version ('''Notify Juancarlos when a new SVM version is used''')
+
<pre>
 +
%cfpFlagged                        CFP flagged
 +
%cfpPos                            CFP positive
 +
%cfpPosVal                            CFP positive validated
 +
%cfpPosFP                                  CFP positive validated false positive
 +
%cfpPosTP                                  CFP positive validated true positive
 +
%cfpPosTpCur                                  CFP positive validated true positive curated
 +
%cfpPosTpNC                                    CFP positive validated true positive not curated
 +
%cfpPosNV                              CFP positive not validated
 +
%cfpPosNC                              CFP positive not curated
 +
%cfpNeg                            CFP negative
 +
%cfpNegVal                            CFP negative validated
 +
%cfpNegTN                                  CFP negative validated true negative
 +
%cfpNegFN                                  CFP negative validated false negative
 +
%cfpNegFnCur                                  CFP negative validated false negative curated
 +
%cfpNegFnNC                                    CFP negative validated false negative not curated
 +
%cfpNegNV                              CFP negative not validated
 +
%cfpNegNC                              CFP negative not curated
 +
</pre>
  
'''cur_timestamp''': Stores the timestamp for when the data was read (by a cronjob) into this cur_svmdata table
+
== Postgres Table Structures ==
  
  
<br>
+
=== cur_curdata table ===
<br>
 
  
  
 +
The cur_curdata table in Postgres has the following structure:
  
== Script for Populating SVM Data into Postgres (cur_svmdata) ==
 
  
The script is located on Tazendra here:
+
<pre>
 
+
cur_paper      | text                    |
/home/postgres/work/pgpopulation/cur_curation/cur_svmdata/populate_svm_result.pl
+
cur_datatype  | text                    |
 +
cur_curator    | text                    |
 +
cur_curdata    | text                    |
 +
cur_selcomment | text                    |
 +
cur_txtcomment | text                    |
 +
cur_timestamp  | timestamp with time zone |
 +
</pre>
  
  
=== Allowable Datatypes ===
+
The following is an example of a Postgres query to return cur_curdata for the paper WBPaper00031688:
  
Currently (12-13-2012), the only allowable datatypes for SVM data are:
+
<pre>
 +
testdb=> SELECT * FROM cur_curdata WHERE cur_paper = '00031688';
 +
cur_paper | cur_datatype | cur_curator | cur_curdata  | cur_selcomment |      cur_txtcomment      |        cur_timestamp       
 +
-----------+--------------+-------------+--------------+----------------+---------------------------+-------------------------------
 +
00031688  | otherexpr    | two1823    | positive    |                | qwer                      | 2012-11-20 15:48:18.113978-08
 +
00031688  | geneint      | two2987    | notvalidated | 2              | test test test            | 2012-11-26 17:06:30.550952-08
 +
00031688  | rnai        | two2987    | notvalidated |                | testing for documentation | 2012-12-13 14:25:13.360082-08
 +
(3 rows)
 +
</pre>
  
*antibody
 
*geneint
 
*geneprod_GO
 
*genereg
 
*newmutant
 
*otherexpr
 
*overexpr
 
*rnai
 
*seqchange
 
*structcorr
 
  
=== Populating Papers for which there is only a main paper (no supplements) ===
+
For a given paper-datatype pair:
 +
 
 +
'''cur_paper''': Stores the ID # for the WBPaper ID
 +
 
 +
'''cur_datatype''': Stores the datatype
 +
 
 +
'''cur_curator''': Stores the curator ID in the format: "two" + "ID # from <WBPersonID>"
  
 +
'''cur_curdata''': Stores validation status: "positive", "negative", "curated", "notvalidated"
  
Yuling gave us a file of main-only paper IDs and this is stored in the ''%mainOnly'' hash.
+
'''cur_selcomment''': Stores the premade comment key (only one value possible by the form; Postgres is OK with more values)
  
 +
'''cur_txtcomment''': Stores the free-text comment
  
<pre>
+
'''cur_timestamp''': Stores the timestamp for the data submission through the Curation Statistics Form
my %mainOnly;
 
my $mainOnly_file = '/home/postgres/work/pgpopulation/cur_curation/cur_svmdata/main_only';
 
open (IN, "<$mainOnly_file") or die "Cannot open $mainOnly_file : $!";
 
while (my $paper = <IN>) { chomp $paper; $paper =~ s/^WBPaper//; $mainOnly{$paper}++; }
 
close (IN) or die "Cannot close $mainOnly_file : $!";
 
</pre>
 
  
 +
<br>
 +
<br>
  
  
=== Populating SVM dates already in Postgres to avoid re-processing/duplication ===
+
=== cur_svmdata table ===
  
  
The following code checks Postgres for SVM dates to make sure that duplicate SVM data is not loaded into the cur_svmdata table. The ''%datesDone'' hash has the directory name/date of when a batch of SVM was processed. The ''%pg'' hash holds the paper ID, datatype, and date mapping to the SVM result.
+
The cur_svmdata table in Postgres has the following structure:
  
  
 
<pre>
 
<pre>
sub populateFromPg {
+
cur_paper    | text                    |
  $result = $dbh->prepare( "SELECT * FROM cur_svmdata" );
+
cur_datatype  | text                    |
   $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
+
cur_date      | text                    |
   while (my @row = $result->fetchrow) {
+
cur_svmdata  | text                    |
    my ( $joinkey, $type, $date, $flag, $version, $timestamp ) = @row;
+
cur_version   | text                    |
    $datesDone{$date}++;
+
cur_timestamp | timestamp with time zone
    $pg{"$joinkey\t$type\t$date"} = $flag;
 
  }
 
} # sub populateFromPg
 
 
</pre>
 
</pre>
  
  
 +
The following is an example of a Postgres query to return cur_svmdata for the paper WBPaper00031688:
  
=== Reading from Yuling's computer ===
 
  
 +
<pre>
 +
testdb=> SELECT * FROM cur_svmdata WHERE cur_paper = '00031688';
 +
cur_paper | cur_datatype | cur_date | cur_svmdata | cur_version |        cur_timestamp       
 +
-----------+--------------+----------+-------------+-------------+-------------------------------
 +
00031688  | structcorr  | 20090101 | high        | 0          | 2012-12-02 23:22:07.607586-08
 +
00031688  | seqchange    | 20090101 | high        | 0          | 2012-12-02 23:22:07.599254-08
 +
00031688  | rnai        | 20090101 | high        | 0          | 2012-12-02 23:22:07.590835-08
 +
00031688  | overexpr    | 20090101 | high        | 0          | 2012-12-02 23:22:07.582514-08
 +
00031688  | otherexpr    | 20090101 | high        | 0          | 2012-12-02 23:22:07.574163-08
 +
00031688  | newmutant    | 20090101 | high        | 0          | 2012-12-02 23:22:07.565834-08
 +
00031688  | genereg      | 20090101 | high        | 0          | 2012-12-02 23:22:07.557501-08
 +
00031688  | geneprod    | 20090101 | NEG        | 0          | 2012-12-02 23:22:07.549171-08
 +
00031688  | geneint      | 20090101 | high        | 0          | 2012-12-02 23:22:07.540835-08
 +
00031688  | antibody    | 20090101 | NEG        | 0          | 2012-12-02 23:22:07.532509-08
 +
(10 rows)
 +
</pre>
  
The SVM results are read from the following URL:
 
  
http://131.215.52.209/celegans/svm_results/
+
For a given paper-datatype pair:
 +
 
 +
'''cur_paper''': Stores the ID # for the WBPaper ID
 +
 
 +
'''cur_datatype''': Stores the datatype
 +
 
 +
'''cur_date''': Stores the date of the SVM directory on Yuling's computer
 +
 
 +
'''cur_svmdata''': Stores the SVM result: "high", "medium", "low", "NEG"
 +
 
 +
'''cur_version''': Stores the SVM version ('''Notify Juancarlos when a new SVM version is used''')
  
 +
'''cur_timestamp''': Stores the timestamp for when the data was read (by a cronjob) into this cur_svmdata table
  
The code (below) looks for 'href' links of only digits (numbers) followed by a slash ('/'), ignoring any links that contain non-digits.
 
  
 +
=== cur_strdata table ===
  
<pre>
 
  my (@dates) = $root_page =~ m/<a href=\"(\d+)\/\">/g;
 
</pre>
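
To make the matching rule concrete, the sketch below applies the same regular expression to a made-up directory listing and then drops dates already recorded in ''%datesDone''. The HTML snippet and the dates are invented for illustration; the real listing may look different.

<pre>
use strict;
use warnings;

# Invented sample of a directory index page.
my $root_page = join "\n",
  '<a href="20120101/">20120101/</a>',
  '<a href="20120628/">20120628/</a>',
  '<a href="old_results/">old_results/</a>',
  '<a href="2012a/">2012a/</a>';

# Dates assumed to be in Postgres already.
my %datesDone = ( '20120101' => 1 );

# Same pattern as above: digits only, followed by a slash.
my (@dates) = $root_page =~ m/<a href=\"(\d+)\/\">/g;

# Keep only the directories that have not been processed yet.
my @new_dates = grep { !$datesDone{$_} } @dates;

print "matched: @dates\n";
print "to process: @new_dates\n";
</pre>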
 
  
For each of those links, the code will skip any that are already present in the ''%datesDone'' hash, as they are already in Postgres. Hence, any new results stored in older directories will be missed.
+
The cur_strdata table in Postgres has the following structure:
  
The code then looks for 'href' links to files that have SVM results for a given datatype. It matches links made up solely of 'word' characters (digits, letters, and underscores).
 
  
 
<pre>
 
<pre>
     my (@date_types) = $date_page =~ m/<a href=\"(\w+)\"/g;
+
cur_paper     | text                    |
 +
cur_datatype  | text                    |
 +
cur_date      | text                    |
 +
cur_strdata  | text                    |
 +
cur_version  | text                    |
 +
cur_timestamp | timestamp with time zone
 
</pre>
 
</pre>
  
For each of those files, the code derives the datatype from the file name. The name must begin with digits and underscores, followed by an underscore, followed by an optional 'and_missedPaper_', followed by the datatype. If this structure is not followed, the code will not find the file. Files with no datatype are skipped.
+
 
 +
The following is an example of a Postgres query to return cur_strdata for the paper WBPaper00024206:
 +
 
  
 
<pre>
 
<pre>
      my ($type) = $date_type =~ m/^[\d_]+_(?:and_missedPaper_)?(\w+)$/;
+
testdb=> SELECT * FROM cur_strdata WHERE cur_paper = '00024206';
 +
cur_paper | cur_datatype | cur_date |    cur_strdata      | cur_version |        cur_timestamp       
 +
-----------+--------------+----------+----------------------+-------------+-------------------------------
 +
00024206  | humandisease |          | string              |            | 2014-11-06 08:47:56.430251-08
 +
00024206  | antibody    |          | preparation Antibody |            | 2014-11-06 08:47:56.451557-08
 +
(2 rows)
 +
 
 
</pre>
 
</pre>
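
The following sketch shows how the file name pattern above behaves on a few invented file names. The helper name ''datatype_from_filename'' is hypothetical, and the sketch assumes that the allowable-datatype check happens after 'geneprod_GO' has been mapped to 'geneprod'; the real script may order these steps differently.

<pre>
use strict;
use warnings;

# Allowable SVM datatypes as listed on this page, with 'geneprod_GO' stored as 'geneprod'.
my %allowed = map { $_ => 1 } qw(antibody geneint geneprod genereg newmutant otherexpr overexpr rnai seqchange structcorr);

# Hypothetical helper: return the datatype for an SVM result file name, or nothing if it should be skipped.
sub datatype_from_filename {
  my ($date_type) = @_;
  my ($type) = $date_type =~ m/^[\d_]+_(?:and_missedPaper_)?(\w+)$/;
  return unless defined $type;                        # name does not follow the expected structure
  if ($type eq 'geneprod_GO') { $type = 'geneprod'; }
  return unless $allowed{$type};                      # not an allowable datatype
  return $type;
}

# Invented file names for illustration.
foreach my $name (qw(20120628_rnai 20120628_and_missedPaper_geneprod_GO 20120628_unknowntype results.txt)) {
  my $type = datatype_from_filename($name);
  printf "%-40s %s\n", $name, defined $type ? $type : '(skipped)';
}
</pre>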
  
In the SVM files, 'geneprod' results are named 'geneprod_GO'; the code maps them to 'geneprod'.
 
If a datatype is not in the allowable datatypes list, skip it and add to error output.
 
  
The code then fetches the file of results, splits it into lines, and for each line:
+
For a given paper-datatype pair:
* get rid of doublequotes
+
 
* tab-separate paper from result
+
'''cur_paper''': Stores the ID # for the WBPaper ID
* skip if paper does not begin with "WBPaper<some non-space>"
+
 
* separate the paper into the number corresponding to the paper ID and the modifier (e.g. 'concat' or 'sup.1'), if there is no modifier, the modifier is 'main'
+
'''cur_datatype''': Stores the datatype
* if the paper ID was in the ''%mainOnly'' hash, call the modifier 'mainonly'.
+
 
* skip unless the modifier is either 'concat' or 'mainonly'.
+
'''cur_date''': placeholder in case we have dates later
* skip if the paper-datatype-date is already in postgres (redundant with skipping by dates above)
+
 
* put result into the ''%hash'' hash.
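
The paper and modifier steps in the list above can be sketched as follows. The helper ''get_paper_and_modifier'' is a hypothetical stand-in for the script's ''getPaperAndModifier'', whose implementation is not shown on this page. The 'WBPaper<digits>.<modifier>' input format, the sample paper IDs, and the contents of ''%modifierWholePaper'' are assumptions drawn from the description above.

<pre>
use strict;
use warnings;

my %mainOnly           = ( '00000002' => 1 );                 # invented main-only paper
my %modifierWholePaper = ( 'concat' => 1, 'mainonly' => 1 );  # assumed contents

# Hypothetical helper: split 'WBPaper00000001.sup.1' into ('00000001', 'sup.1').
# A paper with no modifier gets the modifier 'main'.
sub get_paper_and_modifier {
  my ($paper) = @_;
  my ($joinkey, $modifier) = $paper =~ m/^WBPaper(\d+)(?:\.(\S+))?$/;
  return unless defined $joinkey;
  $modifier = 'main' unless defined $modifier;
  return ($joinkey, $modifier);
}

# Return "joinkey modifier" for results that should be kept, or nothing for skipped ones.
sub keep_result {
  my ($paper) = @_;
  my ($joinkey, $modifier) = get_paper_and_modifier($paper);
  return unless defined $joinkey;
  if ($mainOnly{$joinkey}) { $modifier = 'mainonly'; }
  return unless $modifierWholePaper{$modifier};
  return "$joinkey $modifier";
}

foreach my $paper (qw(WBPaper00000001.concat WBPaper00000001.sup.1 WBPaper00000002 WBPaper00000003 notapaper)) {
  my $kept = keep_result($paper);
  print "$paper\t", (defined $kept ? "kept as $kept" : 'skipped'), "\n";
}
</pre>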
+
'''cur_strdata''': Stores the STR result: score for sqf, "string" for humandisease, actual matches for antibody.
 +
 
 +
'''cur_version''': placeholder in case we have versions later
 +
 
 +
'''cur_timestamp''': Stores the timestamp for when the data was read (by a cronjob) into this cur_strdata table
 +
 
 +
<br>
 +
<br>
 +
 
 +
== Script for Populating STR Data into Postgres (cur_strdata) ==
 +
 
 +
The script is located on Tazendra here:
 +
 
 +
/home/postgres/work/pgpopulation/cur_curation/cur_strdata/populate_str_result.pl
 +
 
 +
 
 +
The script runs every day at 4am, processes three different datatypes, and always replaces all values in cur_strdata.
  
 +
Antibody
 +
The data is read from this URL:
 +
http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen
 +
Antibody splits all values on comma-space and aggregates among the main paper and supplements.
 
<pre>
 
<pre>
      my $date_type_url = $date_url . $date_type;
+
sub processAntibody {
      my $date_type_results_page = get $date_type_url;
+
  my %papers;
      my (@results) = split/\n/, $date_type_results_page;
+
  my $url = 'http://textpresso-dev.caltech.edu/azurebrd/wen/anti_protein_wen';
      foreach my $result (@results) {
+
  my $page = get $url;
        if ($result =~ m/\"/) { $result =~ s/\"//g; }
+
  my (@lines) = split/\n/, $page;
        my ($paper, $flag) = split/\t/, $result;
+
  foreach my $line (@lines) {
        next unless ($paper =~ m/^WBPaper[\S]+/);
+
    my ($where, $values) = split/\t/, $line;
        my ($joinkey, $modifier) = &getPaperAndModifier($paper);
+
    my ($paper) = $where =~ m/WBPaper(\d+)/;
        if ($mainOnly{$joinkey}) { $modifier = 'mainonly'; }
+
    my (@values) = split/, /, $values;                 # aggregate comma-separated values
        next unless ($modifierWholePaper{$modifier});
+
    foreach my $value (@values) { $papers{$paper}{$value}++; }
        my ($tabKey) = &makeTabKey( $joinkey, $modifier, $type, $date );
+
  } # foreach my $line (@lines)
        next if ($pg{$tabKey});                 # skip if already in postgres
+
  my @groupvalues;
        $hash{$tabKey} = $flag;
+
  foreach my $paper (sort keys %papers) {
      }
+
    my $values = join", ", sort keys %{ $papers{$paper} };
 +
    push @groupvalues, qq(('$paper', 'antibody', NULL, '$values', NULL));
 +
  } # foreach my $paper (sort keys %papers)
 +
  my $groupvalues = join", ", @groupvalues;
 +
  push @pgcommands, qq(INSERT INTO cur_strdata VALUES $groupvalues;);
 +
} # sub processAntibody
 
</pre>
 
</pre>
  
For specific date directories, the code looks in the corresponding <dates>/checkFalseNegatives/ directory for files with SVM results that are negative (NEG).
+
Sequence feature
 
+
The data is read from this URL:
The code is the same, but the file name pattern used to match datatypes differs: the name begins with digits and underscores, followed by an underscore, followed by an optional 'and_missedPaper_', followed by 'checkFN_', followed by the datatype. If this structure is not followed, the code will not find the file.
+
http://textpresso-dev.caltech.edu/sequence_feature/fullcorpus_result
 +
Sequence Feature stores the score.
 +
<pre>
 +
sub processSeqfeature {
 +
  my %papers;
 +
  my $url = 'http://textpresso-dev.caltech.edu/sequence_feature/fullcorpus_result';
 +
  my $page = get $url;
 +
  my (@lines) = split/\n/, $page;
 +
  foreach my $line (@lines) {
 +
    my ($where, $score) = split/ /, $line;
 +
    next unless ($score >= 25);                    # only keep if score >= 25
 +
    my ($paper) = $where =~ m/WBPaper(\d+)/;
 +
    unless ($papers{$paper}) { $papers{$paper} = $score; }
 +
  } # foreach my $line (@lines)
 +
  my @groupvalues;
 +
  foreach my $paper (sort keys %papers) {
 +
    push @groupvalues, qq(('$paper', 'seqfeature', NULL, '$papers{$paper}', NULL));
 +
  } # foreach my $paper (sort keys %papers)
 +
  my $groupvalues = join", ", @groupvalues;
 +
  push @pgcommands, qq(INSERT INTO cur_strdata VALUES $groupvalues;);
 +
} # sub processSeqfeature
 +
</pre>
  
 +
Human disease
 +
The data is read from this URL:
 +
http://textpresso-dev.caltech.edu/disease/all_papers_nonReview
 +
Human Disease aggregates all main papers and supplements and stores the word 'string'
 
<pre>
 
<pre>
      my ($type) = $fn_date_type =~ m/^[\d_]+_(?:and_missedPaper_)?checkFN_(\w+)$/;
+
sub processHumandisease {
 +
  my %papers;
 +
  my $url = 'http://textpresso-dev.caltech.edu/disease/all_papers_nonReview';
 +
  my $page = get $url;
 +
  my (@lines) = split/\n/, $page;
 +
  foreach my $line (@lines) { my ($paper) = $line =~ m/WBPaper(\d+)/; $papers{$paper} = 'string'; }
 +
  my @groupvalues;
 +
  foreach my $paper (sort keys %papers) {
 +
    push @groupvalues, qq(('$paper', 'humandisease', NULL, '$papers{$paper}', NULL));
 +
  }
 +
  my $groupvalues = join", ", @groupvalues;
 +
  push @pgcommands, qq(INSERT INTO cur_strdata VALUES $groupvalues;);
 +
} # sub processHumandisease
 
</pre>
 
</pre>
  
The SVM result for all of these is set to NEG (negative).
 
  
For each of the values that were entered into ''%hash'', the version is '0' for dates before 2012-06-28, and the version is '1' otherwise.
 
  
<pre>
+
String search results contain only positive papers. For this form's purposes, all curatable papers are considered flagged, and any of those that is not a positive string search match is considered a negative string search match.
  my $version = '0';
 
  if ($date < 20120628) { $version = '0'; }
 
    else { $version = '1'; }
 
</pre>
 
  
Enter into postgres cur_svmdata tables the paperID, datatype, svmdate, svm_result, version.
+
== Script for Populating SVM Data into Postgres (cur_svmdata) ==
  
<pre>
+
The script is located on Tazendra here:
  push @pgcommands, qq(INSERT INTO cur_svmdata VALUES('$joinkey', '$type', '$date', '$flag', '$version'));
+
 
</pre>
+
/home/postgres/work/pgpopulation/cur_curation/cur_svmdata/populate_svm_result.pl
 +
 
 +
=== Allowable Datatypes ===

Currently (12-13-2012), the only allowable datatypes for SVM data are:

*antibody
*geneint
*geneprod_GO
*genereg
*newmutant
*otherexpr
*overexpr
*rnai
*seqchange
*structcorr

=== Populating Papers for which there is only a main paper (no supplements) ===

Yuling gave us a file of main-only paper IDs, which is stored in the ''%mainOnly'' hash.

<pre>
my %mainOnly;
my $mainOnly_file = '/home/postgres/work/pgpopulation/cur_curation/cur_svmdata/main_only';
open (IN, "<$mainOnly_file") or die "Cannot open $mainOnly_file : $!";
while (my $paper = <IN>) { chomp $paper; $paper =~ s/^WBPaper//; $mainOnly{$paper}++; }
close (IN) or die "Cannot close $mainOnly_file : $!";
</pre>

=== Populating SVM dates already in Postgres to avoid re-processing/duplication ===

The following code checks Postgres for SVM dates to make sure that duplicate SVM data is not loaded into the cur_svmdata table. The ''%datesDone'' hash holds the directory name/date of each batch of SVM results that has already been processed. The ''%pg'' hash maps the paper ID, datatype, and date to the SVM result.

<pre>
sub populateFromPg {
  $result = $dbh->prepare( "SELECT * FROM cur_svmdata" );
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
  while (my @row = $result->fetchrow) {
    my ( $joinkey, $type, $date, $flag, $version, $timestamp ) = @row;
    $datesDone{$date}++;
    $pg{"$joinkey\t$type\t$date"} = $flag;
  }
} # sub populateFromPg
</pre>

=== Reading from Yuling's computer ===

The SVM results are read from the following URL:

http://131.215.52.209/celegans/svm_results/

The code (below) looks for 'href' links consisting only of digits followed by a slash ('/'), ignoring any links that contain non-digits.

<pre>
  my (@dates) = $root_page =~ m/<a href=\"(\d+)\/\">/g;
</pre>

For each of those links, the code skips any that are already present in the ''%datesDone'' hash, as they are already in Postgres. Hence, any new results stored in older directories will be missed.

The code then looks for 'href' links to files that have SVM results for a given datatype.  It matches only links made up solely of 'word' characters (digits, letters, and underscores).

<pre>
    my (@date_types) = $date_page =~ m/<a href=\"(\w+)\"/g;
</pre>

For each of those files, the code maps the file name to the datatype.  The SVM file name begins with digits and underscores, followed by an underscore, followed by an optional 'and_missedPaper_', followed by the datatype.  If this structure is not followed, the code will not find the file.  Files with no datatype are skipped.

<pre>
      my ($type) = $date_type =~ m/^[\d_]+_(?:and_missedPaper_)?(\w+)$/;
</pre>
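
The file-name pattern described above can be tried out on its own. The following is a minimal sketch, not part of populate_svm_result.pl: the subroutine name <code>datatype_from_filename</code> and the example file names are hypothetical, and only the regular expression is taken from the script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the datatype encoded in an SVM result file
# name, or undef when the name does not follow the documented structure.
sub datatype_from_filename {
  my ($file) = @_;
  my ($type) = $file =~ m/^[\d_]+_(?:and_missedPaper_)?(\w+)$/;   # regex from the script
  return $type;
}

# Hypothetical file names, used only to exercise the pattern.
my @files = ('20121213_antibody', '2012_12_13_and_missedPaper_rnai', '20121213_geneprod_GO', 'README');
foreach my $file (@files) {
  my $type = datatype_from_filename($file);
  print "$file => ", (defined $type ? $type : '(skipped, no datatype)'), "\n";
}
```

A name that does not start with digits or underscores (such as 'README' here) yields no datatype and would be skipped.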

'geneprod' results are called 'geneprod_GO' in SVM files; the code maps them to 'geneprod'.
If a datatype is not in the allowable datatypes list, it is skipped and added to the error output.

The code gets the file of results, separates it into lines, and for each line:
* gets rid of doublequotes
* splits the paper from the result at the tab
* skips the line if the paper does not begin with "WBPaper<some non-space>"
* separates the paper into the number corresponding to the paper ID and the modifier (e.g. 'concat' or 'sup.1'); if there is no modifier, the modifier is 'main'
* if the paper ID is in the ''%mainOnly'' hash, sets the modifier to 'mainonly'
* skips the line unless the modifier is either 'concat' or 'mainonly'
* skips the line if the paper-datatype-date is already in Postgres (redundant with skipping by dates above)
* puts the result into the ''%hash'' hash.

<pre>
      my $date_type_url = $date_url . $date_type;
      my $date_type_results_page = get $date_type_url;
      my (@results) = split/\n/, $date_type_results_page;
      foreach my $result (@results) {
        if ($result =~ m/\"/) { $result =~ s/\"//g; }
        my ($paper, $flag) = split/\t/, $result;
        next unless ($paper =~ m/^WBPaper[\S]+/);
        my ($joinkey, $modifier) = &getPaperAndModifier($paper);
        if ($mainOnly{$joinkey}) { $modifier = 'mainonly'; }
        next unless ($modifierWholePaper{$modifier});
        my ($tabKey) = &makeTabKey( $joinkey, $modifier, $type, $date );
        next if ($pg{$tabKey});                # skip if already in postgres
        $hash{$tabKey} = $flag;
      }
</pre>
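
The getPaperAndModifier subroutine is not shown on this page. As a rough illustration of the paper/modifier step, here is a sketch under stated assumptions: <code>split_paper_and_modifier</code> and <code>is_whole_paper_result</code> are hypothetical names, and the sketch assumes the modifier follows the paper ID after a dot (as in 'WBPaper00001234.sup.1'). The real subroutine may differ.

```perl
use strict;
use warnings;

# Modifiers that represent a whole paper, per the steps listed above.
my %modifierWholePaper = (concat => 1, mainonly => 1);

# Hypothetical stand-in for getPaperAndModifier: split "WBPaper<digits>[.modifier]"
# into the numeric paper ID and the modifier ('main' when there is none).
sub split_paper_and_modifier {
  my ($paper) = @_;
  if ($paper =~ m/^WBPaper(\d+)(?:\.(\S+))?$/) {
    my $modifier = defined $2 ? $2 : 'main';
    return ($1, $modifier);
  }
  return;                                   # not a recognizable paper
}

# Decide whether a result line would be kept: only 'concat' results, or
# results for papers listed in the main-only hash, count as whole-paper results.
sub is_whole_paper_result {
  my ($paper, $main_only) = @_;             # $main_only is a hash ref of paper IDs
  my ($joinkey, $modifier) = split_paper_and_modifier($paper);
  return 0 unless defined $joinkey;
  $modifier = 'mainonly' if $main_only->{$joinkey};
  return $modifierWholePaper{$modifier} ? 1 : 0;
}

my %main_only_demo = ('00000005' => 1);     # hypothetical main-only paper
foreach my $paper ('WBPaper00001234.concat', 'WBPaper00001234.sup.1', 'WBPaper00000005') {
  print "$paper => ", (is_whole_paper_result($paper, \%main_only_demo) ? 'kept' : 'skipped'), "\n";
}
```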

For specific dates/directories, the code looks at the corresponding <dates>/checkFalseNegatives/ directory for files with SVM results that are negative (NEG).

The code is the same, except for the regular expression that matches datatypes: the SVM file name begins with digits and underscores, followed by an underscore, followed by an optional 'and_missedPaper_', followed by 'checkFN_', followed by the datatype.  If this structure is not followed, the code will not find the file.

<pre>
      my ($type) = $fn_date_type =~ m/^[\d_]+_(?:and_missedPaper_)?checkFN_(\w+)$/;
</pre>

The SVM result for all of these is set to NEG (negative).

For each of the values that were entered into ''%hash'', the version is '0' for dates before 2012-06-28, and '1' otherwise.

<pre>
  my $version = '0';
  if ($date < 20120628) { $version = '0'; }
    else { $version = '1'; }
</pre>

The paperID, datatype, svmdate, svm_result, and version are then entered into the Postgres cur_svmdata table.

<pre>
  push @pgcommands, qq(INSERT INTO cur_svmdata VALUES('$joinkey', '$type', '$date', '$flag', '$version'));
</pre>

Any errors due to invalid datatypes, or SVM results that match neither a WBPaper nor a paper.something, get printed at the end of the output file (change this to email Daniela + Chris).

(TODO set cronjob to run every day at 4am ?)

<br>
<br>

= Abbreviations =

'''AFP''' - Author First Pass (flagging method)
<br>
<br>
'''CFP''' - Curator First Pass (flagging method)
<br>
<br>
'''OA''' - Ontology Annotator (curation tool)
<br>
<br>
'''STR''' - Textpresso String Matches (flagging method)
<br>
<br>
'''SVM''' - Support Vector Machine (flagging method)
<br>
<br>

= Definitions =

'''curatable''' - A curatable paper is a paper for which: pap_status = 'valid' AND pap_primary_data = 'primary' AND NOT pap_curation_flags = 'non_nematode'
<br>
<br>
'''curated''' - Any paper that has been curated, as determined by its presence in the OA or in cur_curdata. Note that if a paper is considered 'curated' it is automatically considered 'validated positive'
<br>
<br>
'''cur_curdata''' - The Postgres table that captures, for a given paper-datatype pair, the information captured by this form, including validation status, premade comments, and free-text comments
<br>
<br>
'''datatype''' - A class of data that WormBase curates
<br>
<br>
'''flagged''' - Processed by a flagging method (flagged positive OR negative)
<br>
<br>
'''flagging method''' - Manual or automated method for identifying research articles that contain a particular datatype
<br>
<br>
'''validated''' - Definitively confirmed by a curator to have (or not have) the relevant datatype
<br>
<br>

=Datatypes with no feedback into curation status form for curated papers=
Variation  (seqchange) -> source flags through SVM and first_pass forms http://textpresso-dev.caltech.edu/transgene/transgenes_in_regular_papers.out

=Datatypes not in curation status form=
Transgene: this has no SVM, but is on first pass forms, and papers are flagged based on string search. Source of output -> http://textpresso-dev.caltech.edu/transgene/transgenes_in_regular_papers.out


Latest revision as of 23:32, 12 May 2017

Curation Status & Statistics Form (2012)

The live form (on Tazendra) can be found here

The sandbox/testing form can be found here


The CGI code is located on Tazendra/Mangolassi here:

/home/postgres/public_html/cgi-bin/curation_status.cgi




User Guide

Main Page

Curation Status Form Main Page 11-8-2013.png


Above is a screenshot of the main page of the Curation Status Form. The user/curator is requested to identify who they wish to login as, and to select one of four options to continue:

1) Specific Paper Page - This is where the curator can specify one or more specific papers they wish to view curation status results for (see below). This page includes a Topic paper filter to search for papers related to a WormBase Biological Topic.

2) Add Results Page - This is where the curator can add curation status results for one or more specific papers (see below).

3) Curation Statistics Page - This is where the curator can view all curation statistics for ALL datatypes and ALL flagging methods (see below).

4) Curation Statistics Options Page - As an alternative to viewing the curation statistics for ALL datatypes and ALL flagging methods (as with option #3 above), this is where the curator can specify for which datatypes and flagging methods they would like to see curation statistics (see below).




Specific Paper Page

Curation Status Form Specific Paper Page 11-8-2013.png


Above is a screenshot of the Specific Paper Page where a curator can specify which paper(s) they would like to view curation status results for. After typing/pasting in one or more WBPaper IDs in the paper entry field, the curator can specify which datatypes and flagging methods they would like to see results for. Note that selecting "all datatypes" will override any single datatype selections below. A curator can select what curation data sources they would like to see results for (i.e. Ontology Annotator and/or cur_curdata), flagging methods (SVM, AFP, CFP, STR[textpresso string matches]), the number of papers they would like to load at one time (default of 10), and whether they would like to see info (and links) for the PubMed ID (PMID), the PDF, and the paper's journal.

Once a curator clicks on "Get Results", they will be directed to the Detailed Results of Papers Page, where they can view the results of their query.

Papers can be listed as WBPaper### or simply as numbers, separated by spaces, commas, pipes, new lines, or any other non-digit characters. Any number that is entered will be considered a valid paper ID.
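
The parsing rule for the paper entry field can be sketched in Perl as follows. This is a minimal illustration rather than the form's actual code: the subroutine name parse_paper_ids is hypothetical, and the sketch simply treats every run of digits as one paper ID, without any padding or validation.

```perl
use strict;
use warnings;

# Hypothetical helper (not from curation_status.cgi): pull every run of
# digits out of free text, so that "WBPaper00001234", a bare "1234", and any
# mix of separators (spaces, commas, pipes, new lines) are all accepted.
sub parse_paper_ids {
  my ($text) = @_;
  my @ids = $text =~ m/(\d+)/g;     # each run of digits is one paper ID
  return @ids;
}

my @ids = parse_paper_ids("WBPaper00001234, 5678|00004321\n42");
print join(" ", @ids), "\n";
```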

Recent addition as of November 2013 (CG 11-6-2013): A Topic dropdown menu has been added to the "Specific Paper Page" to allow curators to view all papers related to a particular curation topic with respect to their data type. Note that selecting a Topic will look for overlap with any WBPaper IDs entered into the main paper entry field, populating the form only with papers that are associated with the Topic AND listed in the paper entry field. Topic papers are pulled from the Topic Curation OA. If no papers are entered in the paper entry field AND no topic is selected, the form will return ALL papers (that have undergone at least one flagging pipeline).




Add Results Page

Curation Status Form Add Results Page2.png

Above is a screenshot of the Add Results Page of the form, where a curator can add new curation status results for one or more papers that they specify. A curator must specify what datatype they wish to submit paper results for and must specify what the status is for the paper(s): curated and (hence) positive, validated positive (but not yet curated), validated negative, or (if they need to revert back to not validated, or blank, status) not validated. The curator must then also specify at least one paper for which to apply this curation status in the paper entry field. Multiple papers must be entered in WBPaper### format, each on a separate line.

Optionally, a curator can select a pre-made comment from a drop down menu and/or enter a free-text comment. Once the curator clicks "Add Results", they will be directed to a New Results Summary Page:


Curation Status Form Submission Summary2.png


If the results are overwriting existing results, they will be directed to an Overwrite Confirmation Page:


Curation Status Form Confirm Overwrite.png


at which point the curator can confirm the overwrite of the previous results for the indicated paper and datatype, or simply go to the main page (or go back a page to make corrections/edits). Note that the fields for which data has changed are highlighted in yellow for easy viewing. If the curator confirms the overwrite by checking the confirmation check box and clicking on "Overwrite Selected Results", they will be directed to the Overwrite Confirmation Summary Page:

Curation Status Form Overwrite Conf Summary.png

A link is provided to go back to the main page of the form.




Main Curation Statistics Page

Curation Status Form Curation Statistics Page.png


Above is a screenshot of a portion of the entire Curation Statistics table that a curator would be directed to from the main page of the form if they had clicked on the Curation Statistics Page button. Displayed at the top of the table are general paper statistics for a given datatype (datatypes indicated at the top of each column). Below that are statistics for papers that have been flagged (positive or negative) for the indicated datatype by ANY (at least one) flagging method. Below the "Any" statistics are the "Intersection" statistics, indicating papers flagged by ALL flagging methods for the indicated datatype. It should be emphasized here that "flagged" means processed by the flagging method, not necessarily flagged positive. Although not visible in the above screenshot, statistics for SVM results, AFP results, CFP results, and STR results are also included in this table.

The "Any", "Intersection", and individual flagging method sections of the table each follow a general template:

Flagged
   Flagged Positive
      Flagged Positive and Validated
         Flagged Positive, Validated False Positive
         Flagged Positive, Validated True Positive
            Flagged Positive, Validated True Positive, Curated
            Flagged Positive, Validated True Positive, Not Curated
      Flagged Positive, Not Validated
      Flagged Positive, Not Curated

and the individual flagging method sections additionally have a section for flagged negatives:

   Flagged Negative
      Flagged Negative and Validated
         Flagged Negative, Validated True Negative
         Flagged Negative, Validated False Negative
            Flagged Negative, Validated False Negative, Curated
            Flagged Negative, Validated False Negative, Not Curated
      Flagged Negative, Not Validated
      Flagged Negative, Not Curated

Each row title/header can be clicked on to bring up a small pop-up window with a brief description of what the title means. Each cell of the table shows the number of papers that fit the criteria for that datatype and flag status, and the percentage (to two significant digits) that this number represents of some larger set. Each percentage is calculated, generally, as follows:

Flagged (% of curatable papers)
   Flagged Positive (% flagged)
      Flagged Positive and Validated (% flagged positive)
         Flagged Positive, Validated False Positive (% flagged positive and validated)
         Flagged Positive, Validated True Positive (% flagged positive and validated)
            Flagged Positive, Validated True Positive, Curated (% flagged positive and validated true positive)
            Flagged Positive, Validated True Positive, Not Curated (% flagged positive and validated true positive)
      Flagged Positive, Not Validated (% flagged positive)
      Flagged Positive, Not Curated (% flagged positive)
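
The denominator scheme listed above can be sketched in Perl. This is only an illustration under stated assumptions: the subroutine percent_of, the row keys, and all of the counts are hypothetical (they are not real WormBase numbers), and rounding to two significant digits is left out.

```perl
use strict;
use warnings;

# Percentage of $count within $denominator; returns 0 for an empty set
# so that a datatype with no papers in the parent row does not divide by zero.
sub percent_of {
  my ($count, $denominator) = @_;
  return 0 unless $denominator;
  return 100 * $count / $denominator;
}

# Hypothetical counts for one datatype.
my %count = (
  curatable               => 1000,
  flagged                 => 800,
  flagged_positive        => 200,
  positive_validated      => 50,
  validated_true_positive => 40,
);

# Each row's percentage uses the row named here as its denominator.
my %denominator_row = (
  flagged                 => 'curatable',
  flagged_positive        => 'flagged',
  positive_validated      => 'flagged_positive',
  validated_true_positive => 'positive_validated',
);

foreach my $row (qw(flagged flagged_positive positive_validated validated_true_positive)) {
  my $parent = $denominator_row{$row};
  print "$row: $count{$row} (", percent_of($count{$row}, $count{$parent}), "% of $parent)\n";
}
```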

Each cell number (aside from the top three rows) is also a hyperlink to the Prepopulated Specific Papers Page, listing the paper IDs for each paper in the list, as well as providing options for the view of each of those papers in the Detailed Results of Papers Page.

NB: for STR data, all curatable papers are considered flagged, so the 'any flagged' number can be misleading. Keep this caveat in mind when pulling out statistics.
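
To make the STR caveat concrete: string searches only produce a list of positive papers, and every other curatable paper is treated as a negative. The sketch below illustrates that rule; the subroutine name str_flags and the paper IDs are hypothetical, and the sketch assumes that a positive match for a non-curatable paper is simply ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: label every curatable paper as an STR positive or
# negative, given the list of papers with a positive string search match.
sub str_flags {
  my ($curatable, $positive) = @_;          # array refs of paper IDs
  my %is_positive = map { $_ => 1 } @$positive;
  my %flag;
  foreach my $paper (@$curatable) {
    $flag{$paper} = $is_positive{$paper} ? 'positive' : 'negative';
  }
  return \%flag;                            # every curatable paper is flagged
}

my $flags = str_flags([ '00000001', '00000002', '00000003' ], [ '00000002' ]);
foreach my $paper (sort keys %$flags) { print "$paper => $flags->{$paper}\n"; }
```

Because every curatable paper ends up with an STR flag, the number of papers flagged by ANY method equals the number of curatable papers whenever STR is included.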

Curation Statistics Page Display Info

The title/headers for each row are displayed at the left AND right sides of the table, to enable easier viewing when there are several datatypes being viewed at once. If the number of datatypes is restricted via the Curation Statistics Options Page, the titles/headers for each row will only display on both left and right sides of the table if more than six datatypes are selected for viewing (this was done to avoid overcrowding of the page when six or fewer datatypes/columns were visible).


The row-title columns (the leftmost column and, when more than six datatypes are visible, the rightmost column) are set to display at a fixed width of 600 pixels to allow all titles to fit on a single line. All other columns (datatype columns) are set to display at a fixed width of 120 pixels.




Curation Statistics Options Page

Curation Status Form Curation Statistics Options Page.png


Above is a screenshot of the Curation Statistics Options Page, where a curator can specify what flagging methods and datatypes they would like to see curation statistics for. (Note 1) Loading a table with fewer datatypes or flagging methods is often much faster than loading the entire table. Whereas loading the entire table (with ALL datatypes and ALL flagging methods) takes roughly 23 seconds to load, loading a table with ALL flagging methods but ONE datatype will usually only take 1-3 seconds to load. (Note 2) If 6 or fewer datatypes are requested, the row titles/headers will only appear on the left side of the table; if more than 6 datatypes are viewed at once, the row titles/headers will appear on both the left AND the right sides of the table. (Note 3) The "Any" and "Intersection" rows of the resulting table will only show results for the flagging methods that you have selected to view. If only one flagging method is selected to view, the "Any" and "Intersection" rows will be identical to each other and to the "flagged positive" rows for the single flagging method.




Prepopulated Specific Papers Page

Curation Status Form Prepopulated Specific Papers Page 11-8-2013.png


Above is a screenshot of the Prepopulated Specific Papers Page. Curators are directed here from any Curation Statistics table via the hyperlinked numbers in the statistics table. The entire list of paper IDs (that fit the criteria indicated in the table row/column of the statistics table) is listed as hyperlinks to the individual paper results (on the Detailed Results of Papers Page), as well as in the search box. Note that this page is identical to the Specific Paper Page, except that paper IDs are already pre-populated into the form from the statistics table.

An addition as of November 2013 is the Topic paper filtering drop down menu. The drop down menu provides a list of WormBase Biological Topics as read from the Topic Curation OA. If a topic is selected (e.g. 'Aging' in the example screenshot), the form will look for any papers that exist in BOTH the Topic paper list AND the list of papers entered into the paper entry field (in this case prepopulated from some prior filtering step). If there are no overlapping papers from both lists, the form will not return any papers. If a Topic is not selected, the form will simply return the papers as listed in the paper entry field. If no topic is selected AND there are no papers in the paper entry field, the form will return ALL papers (having undergone at least one flagging pipeline).
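
The filtering rules described above can be summarized in a short Perl sketch. It is not the form's code: the subroutine name filter_by_topic and the paper IDs are hypothetical, and the sketch assumes a literal intersection, so a selected Topic with an empty paper entry field returns nothing (the actual form may behave differently in that case).

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the Topic filter rules.
#   $entered      array ref of paper IDs from the paper entry field
#   $topic_papers array ref of papers for the selected Topic, or undef if none selected
#   $all_flagged  array ref of all papers that went through at least one flagging pipeline
sub filter_by_topic {
  my ($entered, $topic_papers, $all_flagged) = @_;
  if (defined $topic_papers) {              # Topic selected: keep only the overlap
    my %in_topic = map { $_ => 1 } @$topic_papers;
    return grep { $in_topic{$_} } @$entered;
  }
  return @$entered if @$entered;            # no Topic: use the entered papers as they are
  return @$all_flagged;                     # no Topic and no papers: return everything
}

my @shown = filter_by_topic([ '00000001', '00000002' ], [ '00000002', '00000003' ], [ '00000001', '00000002', '00000003' ]);
print join(" ", @shown), "\n";
```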





Detailed Results of Papers Page

Curation Status Form Detailed Results of Papers Page.png


Above is a screenshot of the Detailed Results of Papers Page, where curators can view and edit the curation status of individual papers as well as view the flagging results of each flagging method for individual papers.

When ALL columns are visible:

the first column displays the WormBase paper ID (WBPaperID#);

the second column displays the name of the journal for the publication;

the third column lists the PubMed ID (PMID) and links out to the PubMed webpage for this article;

the fourth column displays the name of the PDF file stored locally on Tazendra with a hyperlink to the PDF file of the article in our local PDF archives;

the fifth column displays the datatype for that row (note that multiple datatypes may be displayed per paper, on separate rows);

the sixth, seventh, eighth, and ninth columns display the results of the flagging methods SVM, STR, CFP, and AFP, respectively;

the tenth column indicates the status of the paper in the Ontology Annotator (OA) for that datatype: "oa_blank" if the paper does not exist in the respective OA, or "curated" if the paper does exist in the respective OA, meaning that it has been curated (or at least partially curated); NOTE: A paper in the RNAi OA with every entry flagged with the "NO DUMP" toggle will appear as "oa_blank" in the Curation Status Form; a single row for the paper without the "NO DUMP" toggle turned on is enough to give a "curated" status in the Curation Status Form;

the eleventh column provides a drop-down menu to select a curator (selecting a curator is only necessary if overwriting/changing the existing curator; the form recognizes what curator is logged in and automatically populates this field with the correct (logged in) curator if this field is blank);

the twelfth column provides a drop-down menu to select the "new result" for the paper, indicating whether it is "curated and positive", "validated positive", "validated negative", or "not validated" ("not validated" only needs to be selected when reverting back from "curated and positive", "validated positive", or "validated negative" entries that may have been entered accidentally; selecting this option will result in a blank field once the change has been submitted through the form)

the thirteenth column provides the drop-down menu of standard, premade comments

the fourteenth (and final) column is a free-text area where a curator can write in any pertinent notes about the curation status of this paper-datatype pair

If any new results for a paper are entered (in columns 11-14), the curator must click on the "Submit New Results" button at the bottom of the screen, at which point they will either be directed to the New Results Summary page or to the Overwrite Confirmation Page, as shown above in the Wiki section describing the Add Results Page. Note that in order for new paper result submissions to take effect, there must be a value in the "new results" column. Otherwise any comments (premade or free-text) will not be registered with the paper.

Conflicts

A paper that is found to have mutually exclusive flags is flagged as a "conflict". Such papers will not appear in any other lists (e.g. "curated", "validated", etc.), and the conflict needs to be resolved before they re-enter a normal list. A conflict can be triggered if:

  • A paper is found to be curated by the OA, but has been flagged, via the form, to be "validated negative". This raises a conflict which needs a curator to check if the curation is bogus or if the "validated negative" flag was a mistake. This can be resolved by changing the "validated negative" status to any other status ("not validated" (blank), "validated positive", or "curated and positive") if the paper was mistakenly flagged as "validated negative", OR by fixing the curation in the OA (delete bogus curation or fix the paper reference).

A paper will not be considered in conflict if the OA status indicates "oa_blank" and the paper is flagged as "curated and positive". This will be the case, for example, for all of the large scale papers whose annotations do not reside in the OA/Postgres.
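
The conflict rule can be expressed as a small Perl check. This is a sketch, not the form's code: the subroutine name is_conflict is hypothetical, and the status strings are the ones shown on the Detailed Results of Papers Page.

```perl
use strict;
use warnings;

# Hypothetical helper: a paper-datatype pair is in conflict only when the OA
# says it is curated while the form says it is validated negative.
sub is_conflict {
  my ($oa_status, $form_status) = @_;       # e.g. 'curated' / 'oa_blank', 'validated negative'
  return ($oa_status eq 'curated' && $form_status eq 'validated negative') ? 1 : 0;
}

foreach my $pair ([ 'curated', 'validated negative' ], [ 'oa_blank', 'curated and positive' ]) {
  my ($oa, $form) = @$pair;
  print "$oa + $form => ", (is_conflict($oa, $form) ? 'conflict' : 'no conflict'), "\n";
}
```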


Topic-Paper Filter

(In progress... CG 11-5-2013)

The Curation Status Form will now provide an option to filter a list of papers based on a WormBase Biological Topic. Official topics and affiliated papers are recognized from the "Topic" OA.




Code Documentation

Below is the documentation for the form's code, located on Tazendra (when live) or Mangolassi (sandbox):

/home/postgres/public_html/cgi-bin/curation_status.cgi


Specific Paper Page/ Prepopulated Specific Paper Page

The following code prints the "Specific Paper Page":

sub printSpecificPaperPage {
  &printFormOpen();
  &printHiddenCurator();
  &printTextareaSpecificPapers('');
  &printSelectTopics();
  &printCheckboxesDatatype('off');
  &printCheckboxesCurationSources('all');
  &printPaperOptions();
  &printSubmitGetResults();
  &printFormClose();
} # sub printSpecificPaperPage


The following code prints the "Prepopulated Specific Papers Page":

sub listCurationStatisticsPapersPage {
  &printFormOpen();
  &printHiddenCurator();
  my ($papers) = &printListCurationStatisticsPapers();
  &printTextareaSpecificPapers($papers);
  &printSelectTopics();
  &printSubmitGetResults();
  ($oop, my $listDatatype) = &getHtmlVar($query, "listDatatype");
  &printCheckboxesDatatype($listDatatype);
  &printCheckboxesCurationSources('all');
  &printPaperOptions();
  &printSubmitGetResults();
  &printFormClose();
} # sub listCurationStatisticsPapersPage


Paper-Topic Filter

On both the Specific Paper Page and the Prepopulated Specific Papers Page, we have the option to filter the papers listed in the WBPaper ID field based on their affiliation to a particular WormBase Biological Topic, as informed by the Topic Curation OA.


The following code is responsible for displaying the drop down menu of Topics:

sub printSelectTopics {
  print qq(Filter papers from list through a topic :<br/>);
  print qq(<select name="select_topic">);
  print qq(<option value="none">no topic, use all papers from textarea above</option>\n);
  my %topicIDs; my %topicIdToName;
  $result = $dbh->prepare( "SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') ORDER BY pro_process.pro_process" );
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
  while (my @row = $result->fetchrow) { $topicIDs{$row[0]}++; }
  my $topicIDs = join"','", sort keys %topicIDs;                # for all the topicIDs, get the name from the prt_processname
  $result = $dbh->prepare( "SELECT prt_processid.prt_processid, prt_processname.prt_processname FROM prt_processid, prt_processname WHERE prt_processid.joinkey = prt_processname.joinkey AND prt_processid.prt_processid IN ('$topicIDs')" );
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
  while (my @row = $result->fetchrow) { $topicIdToName{$row[0]} = $row[1]; }    # map wbprocess ids to their names for dropdown display
  foreach my $topic (sort keys %topicIdToName) { print qq(<option value="$topic $topicIdToName{$topic}">$topic $topicIdToName{$topic}</option>); }
  print qq(</select><br/>);
} # sub printSelectTopics

The above code incorporates two Postgres queries:

First, it gets the WBProcess IDs that are in the Topic Curation OA, have a paper, and have the status 'relevant':

SELECT DISTINCT(pro_process.pro_process) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') ORDER BY pro_process.pro_process

Second, the WBProcess IDs are joined into the variable $topicIDs, and for each of those WBProcess IDs we get the corresponding name from the process Term OA:

SELECT prt_processid.prt_processid, prt_processname.prt_processname FROM prt_processid, prt_processname WHERE prt_processid.joinkey = prt_processname.joinkey AND prt_processid.prt_processid IN ('$topicIDs')

The result is a dropdown of process IDs ordered by ID, each with its human-readable process name next to it.




Add Results Page: Loading Page and Processing Input

The following code is responsible for printing the Add Results Page:

sub printAddResultsPage {
  &printAddSection('', '', '', '', '', '', '');
} # sub printAddResultsPage

This code defines the printAddResultsPage subroutine which in turn calls upon the printAddSection subroutine (see below), passing it empty strings, declaring empty/initialized values for the curator ($twonumForm), datatype ($datatypeForm), validation result ($donposnegForm), paper list ($paperResultsForm), premade comment ($selcommentForm), and free-text comment ($txtcommentForm). Once data is submitted, these variables acquire values and are reported in the event of an error.


printAddSection Subroutine

The following code defines the printAddSection subroutine mentioned above, which adds the form components for entering datatype, validation status, paper IDs, premade comments, and free-text comments:

sub printAddSection {
  my ($twonumForm, $datatypeForm, $donposnegForm, $paperResultsForm, $selcommentForm, $txtcommentForm) = @_;
  my $selected = '';
  &printFormOpen();
  &printHiddenCurator();
  print qq(Select your datatype :<br/>);
  print qq(<select name="select_datatype">);
  print qq(<option value=""             ></option>\n);
  foreach my $datatype (keys %datatypes) {
    if ($datatype eq $datatypeForm) { $selected = qq(selected="selected"); } else { $selected = ''; }
    print qq(<option value="$datatype" $selected>$datatype</option>\n); }
  print qq(</select><br/>);
  print qq(Select if the data is positive or negative :<br/>);
  my $select_size = scalar keys %donPosNegOptions;
  print qq(<select name="select_donposneg" size="$select_size">);
  foreach my $donposnegValue (keys %donPosNegOptions) {
    if ($donposnegForm eq $donposnegValue) { $selected = qq(selected="selected"); } else { $selected = ''; }
    print qq(<option value="$donposnegValue" $selected>$donPosNegOptions{$donposnegValue}</option>\n); }
  print qq(</select><br/>);
  print qq(Enter paper data here in the format "WBPaper00001234" (paper as a whole) with separate papers in separate lines.<br/>);
  print qq(<textarea name="textarea_paper_results" rows="6" cols="80">$paperResultsForm</textarea><br/>\n);
  print qq(Select your comment (optional) :<br/>);
  print qq(<select name="select_comment">);
  print qq(<option value=""             ></option>\n);
  foreach my $comment (keys %premadeComments) {
    if ($comment eq $selcommentForm) { $selected = qq(selected="selected"); } else { $selected = ''; }
    print qq(<option value="$comment" $selected>$premadeComments{$comment}</option>\n); }
  print qq(</select><br/>);
  print qq(Enter a free text comment to associate with all papers above (optional) :<br/>);
  print qq(<textarea rows="4" cols="80" name="textarea_comment">$txtcommentForm</textarea><br/>);
  print qq(<input type="submit" name="action" value="Add Results"><br/>\n);
  &printFormClose();
} # sub printAddSection


addResults Subroutine

When a curator clicks on "Add Results" on the Add Results Page, the following code will process the curator's input, catching errors when they arise:

sub addResults {
  &printFormOpen();
  &printHiddenCurator();
  my $errorData = '';
  my %papersToAdd;
  my $twonum = $curator;
  ($oop, my $datatype) = &getHtmlVar($query, "select_datatype");
  unless ($datatype) { $errorData .= "Error : Need to select a datatype.<br/>\n"; }
  ($oop, my $donposneg) = &getHtmlVar($query, "select_donposneg");
  unless ($donposneg) { $errorData .= "Error : Need to select whether result is curated, validated positive, or validated negative.<br/>\n"; }
  ($oop, my $paperResults) = &getHtmlVar($query, "textarea_paper_results");
  if ($paperResults) {
      my @lines = split/\r\n/, $paperResults;
      foreach my $line (@lines) {
        if ($line =~ m/^WBPaper(\S+)$/) { $papersToAdd{$1}++; }
         else { $errorData .= qq(Error bad line : ${line}<br/>\n); }
      } } # foreach my $line (@lines)
    else { $errorData .= "Error : Need to enter at least one paper.<br/>\n"; }
  ($oop, my $selcomment) = &getHtmlVar($query, "select_comment");
  ($oop, my $txtcomment) = &getHtmlVar($query, "textarea_comment");
  if ($errorData) {                             # problem with data, do not allow creation of any data, show form again
      print "$errorData<br />\n";
      printAddSection($twonum, $datatype, $donposneg, $paperResults, $selcomment, $txtcomment); }
    else {                                      # all data is okay, enter data.
      my $joinkeys = join"','", sort keys %papersToAdd;
      my ($pgDataRef) = &getPgDataForJoinkeys($joinkeys, $datatype);
      my %pgData = %$pgDataRef;

      my @data; my @duplicateData;
      foreach my $joinkey (sort keys %papersToAdd) {
          my @line;
          push @line, $joinkey;
          push @line, $datatype;
          push @line, $twonum;
          push @line, $donposneg;
          push @line, $selcomment;
          push @line, $txtcomment;
          if ($pgData{$joinkey}{$datatype}) { push @duplicateData, \@line; }
            else { push @data, \@line; }
      } # foreach my $joinkey (sort keys %papersToAdd)
      &processResultDataDuplicateData(\@data, \@duplicateData, \%pgData);
    } # else # if ($errorData)
  &printFormClose();
} # sub addResults


This code (above) will check to ensure the following:

1) A datatype has been selected

2) A validation status has been entered

3) At least one paper ID has been submitted

4) There is only one paper ID per line

5) The paper IDs entered are in the format 'WBPaper########'

Any violation of these checks results in an error message printed to the screen, and the form is redisplayed with the submitted values in their respective fields.
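The paper-list checks listed above can be sketched as a small standalone helper. This is only an illustration, not the form's actual code: parse_paper_lines is a hypothetical name, and the eight-digit requirement is an assumption taken from the 'WBPaper########' format described above (the addResults code itself accepts any non-whitespace characters after "WBPaper" and splits only on \r\n).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the form's code): validate the contents of
# the paper textarea and return the accepted joinkeys plus any error messages.
sub parse_paper_lines {
    my ($text) = @_;
    my (%papers, @errors);
    unless (defined $text && $text =~ /\S/) {
        return ({}, ['Error : Need to enter at least one paper.']);
    }
    foreach my $line (split /\r?\n/, $text) {       # accept both \r\n and \n line endings
        next unless $line =~ /\S/;                  # skip blank lines
        if ($line =~ /^WBPaper(\d{8})$/) { $papers{$1}++; }     # assumed: exactly 8 digits
        else { push @errors, "Error bad line : $line"; }
    }
    return (\%papers, \@errors);
}

# Example: the second line holds two IDs, so it is rejected as a bad line.
my ($papers, $errors) = parse_paper_lines("WBPaper00001234\r\nWBPaper00005678 WBPaper00009999\r\n");
print join(',', sort keys %$papers), "\n";          # accepted joinkeys
print "$_\n" for @$errors;                          # rejected lines
```

Returning the errors as a list rather than printing them directly keeps the check separate from the HTML output, which may make it easier to reuse.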

If no errors are found, the script will continue by calling on the getPgDataForJoinkeys subroutine (to query Postgres for the cur_curdata associated with each paper-datatype pair; see below) and writing the new input values to the appropriate paper-datatype pairs.


sub getPgDataForJoinkeys {
  my ($joinkeys, $datatype) = @_;
  my %pgData;
  $result = $dbh->prepare( "SELECT * FROM cur_curdata WHERE cur_datatype = '$datatype' AND cur_paper IN ('$joinkeys')" );
  $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
  while (my @row = $result->fetchrow) {
    $pgData{$row[0]}{$row[1]}{curator}     = $row[2];
    $pgData{$row[0]}{$row[1]}{donposneg}   = $row[3];
    $pgData{$row[0]}{$row[1]}{selcomment}  = $row[4];
    $pgData{$row[0]}{$row[1]}{txtcomment}  = $row[5];
    $pgData{$row[0]}{$row[1]}{timestamp}   = $row[6]; }
  return \%pgData;
} # sub getPgDataForJoinkeys
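The query above interpolates form values directly into the SQL text. As an alternative sketch (not what the form does), the same lookup could be written with DBI placeholders. build_curdata_query is a hypothetical helper, and 'rnai' is only an example datatype value; the function just builds the statement and bind list so it can be run without a database.

```perl
use strict;
use warnings;

# Hypothetical helper: build the cur_curdata lookup with one "?" placeholder
# per paper, so that form values are never interpolated into the SQL text.
# Assumes at least one joinkey is passed (an empty IN () list is not valid SQL).
sub build_curdata_query {
    my ($datatype, @joinkeys) = @_;
    my $placeholders = join ', ', ('?') x @joinkeys;
    my $sql = "SELECT * FROM cur_curdata WHERE cur_datatype = ? AND cur_paper IN ($placeholders)";
    return ($sql, $datatype, @joinkeys);
}

my ($sql, @bind) = build_curdata_query('rnai', '00001234', '00005678');
print "$sql\n";

# With a connected DBI handle in $dbh, the statement would be run as:
#   my $sth = $dbh->prepare($sql);
#   $sth->execute(@bind);
```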


processResultDataDuplicateData Subroutine

The code will then run the processResultDataDuplicateData subroutine (see below) to print the results to the screen as the New Results Summary Page (for new data) and/or handle the overwrite confirmation when data needs to be overwritten, generating the Overwrite Confirmation Page (see Add Results Page section above).


sub processResultDataDuplicateData {
  my ($dataRef, $duplicateDataRef, $pgDataRef) = @_;
  my @data          = @$dataRef;
  my @duplicateData = @$duplicateDataRef;
  my %pgData        = %$pgDataRef;
  print qq(<table border="1">\n);
  print qq(<tr>${thDot}paperId</td>${thDot}datatype</td>${thDot}curator</td>${thDot}value</td>${thDot}selcomment</td>${thDot}textcomment</td></tr>\n);
  foreach my $lineRef (@data) {
    my @line = @$lineRef;
    foreach (@line) { unless ($_) { $_ = ''; } }        # initialize values if none are there
    my $pgvalues = join"','", @line;
    my @pgcommands = ();
    my $pgcommand = "INSERT INTO cur_curdata VALUES ('$pgvalues');";
    push @pgcommands, $pgcommand;
    $pgcommand = "INSERT INTO cur_curdata_hst VALUES ('$pgvalues');";
    push @pgcommands, $pgcommand;
    foreach my $pgcommand (@pgcommands) {
      print qq($pgcommand<br/>\n);
# UNCOMMENT TO POPULATE
      $dbh->do( $pgcommand );
    }
    my $trData = join"</td>$tdDot", @line;
    print qq(<tr>${tdDot}$trData</td></tr>\n);
  } # foreach my $lineRef (@data)
  print qq(</table>\n);
  if (scalar @data > 0) { print "results added<br />\n"; }


The first section of the subroutine (above) processes all data submitted through the Add Results Page that is not overwriting any existing data. The new data is submitted to Postgres and a table is printed to the screen as the New Results Summary Page.

The next sections of the subroutine handle new results that will overwrite existing Postgres values for the paper-datatype pairs. First, the code determines what data already exists in Postgres for a given paper-datatype pair and stores these in $xxxxxPg variables. Then, the code determines what values for curator, validation status, premade comment, and free-text comment were submitted through the form and stores these in $xxxxxFm variables. Next, the code generates and displays a table for each paper-datatype pair that has values being overwritten (if the corresponding $xxxxxPg and $xxxxxFm variables are not equal) with each set of data (old and new) highlighted in yellow to draw attention to these for overwrite confirmation. A confirmation checkbox is displayed for each paper-datatype pair undergoing an overwrite.

  my $overwriteCount = 0;
  foreach my $lineRef (@duplicateData) {                # for data already in postgres, add option to overwrite
    my @line = @$lineRef;
    foreach (@line) { unless ($_) { $_ = ''; } }        # initialize values if none are there
    my ( $joinkey, $datatype, $twonum, $donposneg, $selcomment, $txtcomment ) = @line;
    my ( $curatorPg, $curatorPgName, $donposnegPg, $selcommentPg, $selcommentPgText, $txtcommentPg, $timestampPg ) = ( ' ', ' ', ' ', ' ', ' ', ' ', ' ' );
    my ( $curatorFm, $curatorFmName, $donposnegFm, $selcommentFm, $selcommentFmText, $txtcommentFm, $timestampFm ) = ( ' ', ' ', ' ', ' ', ' ', ' ', '<td> </td>' );
    if ( $pgData{$joinkey}{$datatype}{curator}    ) { $curatorPg    = $pgData{$joinkey}{$datatype}{curator};    $curatorPgName = $curators{$curatorPg}; }
    if ( $pgData{$joinkey}{$datatype}{donposneg}  ) { $donposnegPg  = $pgData{$joinkey}{$datatype}{donposneg};  }
    if ( $pgData{$joinkey}{$datatype}{selcomment} ) { $selcommentPg = $pgData{$joinkey}{$datatype}{selcomment}; $selcommentPgText = $premadeComments{$selcommentPg}; }
    if ( $pgData{$joinkey}{$datatype}{txtcomment} ) { $txtcommentPg = $pgData{$joinkey}{$datatype}{txtcomment}; }
    if ( $pgData{$joinkey}{$datatype}{timestamp}  ) { $timestampPg  = "<td>$pgData{$joinkey}{$datatype}{timestamp};</td>"  }
    if ( $twonum ) { $curatorFm = $twonum;
      if ( $curators{$curatorFm} ) { $curatorFmName   = $curators{$curatorFm}; } }
    if ( $donposneg )         { $donposnegFm = $donposneg; }
    if ( $selcomment ) { $selcommentFm = $selcomment;
      if ( $premadeComments{$selcommentFm} ) { $selcommentFmText = $premadeComments{$selcommentFm}; } }
    if ( $txtcomment ) { $txtcommentFm = $txtcomment; }
    my $isDifferent = 0;                                # if any of the non-key values has changed, show option to overwrite
    if ($curatorFmName    ne $curatorPgName) {
        $isDifferent++;
        $curatorFmName = '<td style="background-color:yellow">' . $curatorFmName . '</td>';
        $curatorPgName = '<td style="background-color:yellow">' . $curatorPgName . '</td>'; }
      else {
        $curatorFmName = '<td>' . $curatorFmName . '</td>';
        $curatorPgName = '<td>' . $curatorPgName . '</td>'; }
    if ($donposnegFm  ne $donposnegPg) {
        $isDifferent++;
        $donposnegFm = '<td style="background-color:yellow">' . $donposnegFm . '</td>';
        $donposnegPg = '<td style="background-color:yellow">' . $donposnegPg . '</td>'; }
      else {
        $donposnegFm = '<td>' . $donposnegFm . '</td>';
        $donposnegPg = '<td>' . $donposnegPg . '</td>'; }
    if ($selcommentFmText ne $selcommentPgText) {
        $isDifferent++;
        $selcommentFmText = '<td style="background-color:yellow">' . $selcommentFmText . '</td>';
        $selcommentPgText = '<td style="background-color:yellow">' . $selcommentPgText . '</td>'; }
      else {
        $selcommentFmText = '<td>' . $selcommentFmText . '</td>';
        $selcommentPgText = '<td>' . $selcommentPgText . '</td>'; }
    if ($txtcommentFm ne $txtcommentPg) {
        $isDifferent++;
        $txtcommentFm = '<td style="background-color:yellow">' . $txtcommentFm . '</td>';
        $txtcommentPg = '<td style="background-color:yellow">' . $txtcommentPg . '</td>'; }
      else {
        $txtcommentFm = '<td>' . $txtcommentFm . '</td>';
        $txtcommentPg = '<td>' . $txtcommentPg . '</td>'; }
    next unless ($isDifferent > 0);
    $overwriteCount++;
    print qq(<input type="hidden" name="joinkey_$overwriteCount"       value="$joinkey"  >);
    print qq(<input type="hidden" name="datatype_$overwriteCount"      value="$datatype" >);
    print qq(<input type="hidden" name="twonum_$overwriteCount"        value="$twonum"   >);
    print qq(<input type="hidden" name="donposneg_$overwriteCount"     value="$donposneg"   >);
    print qq(<input type="hidden" name="selcomment_$overwriteCount"    value="$selcomment"  >);
    print qq(<input type="hidden" name="txtcomment_$overwriteCount"    value="$txtcomment"  >);
    print qq(WBPaper$joinkey $datatype : <br/>\n);
    print qq(<table border="1">\n);
    print qq(<tr><th> </th><th>curator</th><th>value</th><th>selcomment</th><th>txtcomment</th><th>timestamp</th></tr>);
    print qq(<tr><td>old</td>${curatorPgName}${donposnegPg}${selcommentPgText}${txtcommentPg}${timestampPg}</tr>\n);
    print qq(<tr><td>new</td>${curatorFmName}${donposnegFm}${selcommentFmText}${txtcommentFm}${timestampFm}</tr>\n);
    print qq(</table>\n);
    print qq(Confirm change <input type="checkbox" name="checkbox_$overwriteCount" value="overwrite"><br/><br/>\n);
  } # foreach my $lineRef (@duplicateData)
  if ($overwriteCount > 0) {
    print qq(<input type="hidden" name="overwrite_count" value="$overwriteCount">);
    print qq(<input type="submit" name="action" value="Overwrite Selected Results"><br/>\n); }
} # sub processResultDataDuplicateData
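The old-versus-new comparison in the subroutine above repeats the same block for each field. A condensed sketch of the idea follows. diff_rows is a hypothetical helper, the field values are examples only, and unlike the original it compares raw field values rather than the looked-up curator and comment names.

```perl
use strict;
use warnings;

# Hypothetical helper: compare stored (old) and submitted (new) values field by
# field and return the HTML cells for both rows plus the number of differences.
sub diff_rows {
    my ($old, $new, @fields) = @_;
    my (@oldCells, @newCells);
    my $differences = 0;
    foreach my $field (@fields) {
        my $oldValue = defined $old->{$field} ? $old->{$field} : '';
        my $newValue = defined $new->{$field} ? $new->{$field} : '';
        my $style = '';
        if ($oldValue ne $newValue) {               # highlight both cells when they differ
            $differences++;
            $style = ' style="background-color:yellow"';
        }
        push @oldCells, "<td$style>$oldValue</td>";
        push @newCells, "<td$style>$newValue</td>";
    }
    return (\@oldCells, \@newCells, $differences);
}

my ($oldCells, $newCells, $count) = diff_rows(
    { donposneg => 'positive', txtcomment => '' },  # example stored values
    { donposneg => 'negative', txtcomment => '' },  # example submitted values
    qw(donposneg txtcomment),
);
print "differences: $count\n";
print "<tr><td>old</td>@$oldCells</tr>\n";
print "<tr><td>new</td>@$newCells</tr>\n";
```

A pair with zero differences would be skipped, mirroring the "next unless ($isDifferent > 0)" test in the subroutine.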


Once the curator has ticked the confirmation checkbox for each result to be overwritten and clicked the "Overwrite Selected Results" button, the overwriteSelectedResults subroutine (below) runs to overwrite the data in the Postgres cur_curdata table and record the new values in cur_curdata_hst.

sub overwriteSelectedResults {
  ($oop, my $overwriteCount) = &getHtmlVar($query, "overwrite_count");
  my @pgcommands;
  for my $i (1 .. $overwriteCount) {
    ($oop, my $overwrite) = &getHtmlVar($query, "checkbox_$i");
    next unless ($overwrite eq 'overwrite');
    ($oop, my $joinkey    ) = &getHtmlVar($query, "joinkey_$i"    );
    ($oop, my $datatype   ) = &getHtmlVar($query, "datatype_$i"   );
    ($oop, my $twonum     ) = &getHtmlVar($query, "twonum_$i"     );
    ($oop, my $donposneg  ) = &getHtmlVar($query, "donposneg_$i"  );
    ($oop, my $selcomment ) = &getHtmlVar($query, "selcomment_$i" );
    ($oop, my $txtcomment ) = &getHtmlVar($query, "txtcomment_$i" );
    unless ($donposneg) { $donposneg = ''; } unless ($selcomment) { $selcomment = ''; } unless ($txtcomment) { $txtcomment = ''; }
    push @pgcommands, qq(DELETE FROM cur_curdata WHERE cur_paper = '$joinkey' AND cur_datatype = '$datatype' AND cur_curator = '$twonum');
    push @pgcommands, qq(INSERT INTO cur_curdata VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));
    push @pgcommands, qq(INSERT INTO cur_curdata_hst VALUES ('$joinkey', '$datatype', '$twonum', '$donposneg', '$selcomment', '$txtcomment'));
  } # for my $i (1 .. $overwriteCount)
  foreach my $pgcommand (@pgcommands) {
    print "$pgcommand<br />\n";
# UNCOMMENT TO POPULATE
    $dbh->do( $pgcommand );
  } # foreach my $pgcommand (@pgcommands)
} # sub overwriteSelectedResults
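The selection step in the subroutine above (keep only the numbered rows whose checkbox was ticked) can be illustrated without CGI or Postgres. In this sketch a plain hash stands in for the submitted form parameters, and confirmed_overwrites is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: given the submitted form parameters as a hash reference,
# return one record per row whose confirmation checkbox was ticked.
sub confirmed_overwrites {
    my ($params) = @_;
    my @records;
    my $count = $params->{overwrite_count} || 0;
    for my $i (1 .. $count) {
        my $checkbox = $params->{"checkbox_$i"};
        next unless defined $checkbox && $checkbox eq 'overwrite';
        my %record;
        foreach my $field (qw(joinkey datatype twonum donposneg selcomment txtcomment)) {
            my $value = $params->{"${field}_$i"};
            $record{$field} = defined $value ? $value : '';     # default missing values to ''
        }
        push @records, \%record;
    }
    return @records;
}

# Example parameters: two candidate rows, only the second one confirmed.
my %params = (
    overwrite_count => 2,
    joinkey_1 => '00001234', datatype_1 => 'rnai',
    joinkey_2 => '00005678', datatype_2 => 'rnai', checkbox_2 => 'overwrite',
);
print "$_->{joinkey} $_->{datatype}\n" for confirmed_overwrites(\%params);
```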


Detailed Results of Papers Page: Loading Data (the getResults Subroutine)

The following code is for the getResults subroutine which displays the Detailed Results of Papers Page after receiving input from the curator about what to display (from a Specific Paper Page or Prepopulated Specific Papers Page submission).

The first section of code collects all curatable papers, processes the checkbox input for the various datatypes, and processes the paper IDs that were submitted in the paper field.

sub getResults {
  &printFormOpen();
  &printHiddenCurator();
  &populateCuratablePapers();                   # assume for now that we only care about curatable papers

  ($oop, my $all_datatypes_checkbox) = &getHtmlVar($query, "checkbox_all_datatypes");
  unless ($all_datatypes_checkbox) { $all_datatypes_checkbox = ''; }
  foreach my $datatype (keys %datatypes) {
    ($oop, my $chosen) = &getHtmlVar($query, "checkbox_$datatype");
    unless ($chosen) { $chosen = ''; }
    if ($all_datatypes_checkbox eq 'all') { $chosen = $datatype; }      # if all datatypes checkbox was selected, set that datatype's chosen to that datatype
    print qq(<input type="hidden" name="checkbox_$datatype" value="$chosen">\n);
    if ($chosen) { $chosenDatatypes{$chosen}++; }
  } # foreach my $datatype (keys %datatypes)

  ($oop, my $specificPapers) = &getHtmlVar($query, "specific_papers");
  my %filterPapers; my %specificPapers; my %topicPapers;
  if ($specificPapers) { my (@joinkeys) = $specificPapers =~ m/(\d+)/g; foreach (@joinkeys) { $specificPapers{$_}++; } }
  ($oop, my $topic)          = &getHtmlVar($query, "select_topic");     # if there's a selected topic replace specific papers with those from topic
  unless ($topic) { $topic = 'none'; }
  if ($topic ne 'none') {
    print "using topic $topic<br/>\n";
    my ($topicID) = $topic =~ m/(WBbiopr:\d+)/;                         # get the WBProcessID from the topic which includes the name
    print qq(<input type="hidden" name="select_topic" value="$topic">\n);
    $result = $dbh->prepare( "SELECT DISTINCT(pro_paper.pro_paper) FROM pro_process, pro_paper, pro_topicpaperstatus WHERE pro_process.joinkey = pro_paper.joinkey AND pro_process.joinkey = pro_topicpaperstatus.joinkey AND (pro_topicpaperstatus.pro_topicpaperstatus = 'relevant') AND pro_process.pro_process = '$topicID'" );
    $result->execute() or die "Cannot prepare statement: $DBI::errstr\n";
    while (my @row = $result->fetchrow) { $row[0] =~ s/WBPaper//; $topicPapers{$row[0]}++; }
  } # if ($topic ne 'none')
  if ($specificPapers && ($topic ne 'none')) {
      foreach (sort keys %specificPapers) { if ($topicPapers{$_}) { $chosenPapers{$_}++; } } }
    elsif ($specificPapers) {
      foreach (sort keys %specificPapers) { $chosenPapers{$_}++; } }
    elsif ($topic ne 'none') {
      foreach (sort keys %topicPapers) { $chosenPapers{$_}++; } }
    else { $chosenPapers{'all'}++; }  
  print qq(<input type="hidden" name="specific_papers" value="$specificPapers">\n);

The above code queries the Topic Curation OA tables for rows where the process matches the selected topic and the topic-paper status is 'relevant', and collects the associated WBPaper IDs.

How filtering works: There are two lists of papers: (1) papers entered in the text area box, and (2) papers for the selected topic (from the Topic Curation OA). When both are provided, the filtering code keeps only the papers that appear in both lists and displays results for that filtered list. If no papers are entered in the text area, the resulting list is the set of papers affiliated with the topic. If no topic is selected, the resulting list is the set of papers entered in the text area. If no papers are entered and no topic is selected, the form returns all papers relevant to this form.
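The four filtering cases can be summarized in a short standalone sketch. choose_papers is a hypothetical function written for illustration; it mirrors the branching in getResults but takes plain array references instead of reading form input and querying Postgres.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which papers to display.
#   $specific      - array ref of joinkeys typed into the text area
#   $topicSelected - true if a topic was chosen in the dropdown
#   $topicPapers   - array ref of joinkeys marked 'relevant' for that topic
sub choose_papers {
    my ($specific, $topicSelected, $topicPapers) = @_;
    my %chosen;
    if (@$specific && $topicSelected) {             # both given: keep the intersection
        my %inTopic = map { $_ => 1 } @$topicPapers;
        $chosen{$_}++ for grep { $inTopic{$_} } @$specific;
    }
    elsif (@$specific)     { $chosen{$_}++ for @$specific; }      # text area only
    elsif ($topicSelected) { $chosen{$_}++ for @$topicPapers; }   # topic only
    else                   { $chosen{all}++; }                    # neither: all papers
    return \%chosen;
}

my $chosen = choose_papers(['00001234', '00005678'], 1, ['00005678', '00009999']);
print join(',', sort keys %$chosen), "\n";          # papers present in both lists
```

Note that when a topic is selected but none of the entered papers belong to it, the intersection is empty and no papers are displayed.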


The next section of code populates cur_curdata for the respective paper-datatype pairs given the input from above. The code takes into account whether to show OA data and which flagging methods (SVM, STR, CFP, AFP) to display for each paper-datatype pair. It also reads how many papers to display per page (default 10), sets the selected page to "0" when no page has been chosen, and determines whether the curator wishes to see the journal, PMID, and/or PDF links for each paper.

  &populateCurCurData();                                # always show curator values since they have to be editable

  ($oop, my $displayOa)  = &getHtmlVar($query, "checkbox_oa");    unless ($displayOa) {  $displayOa  = ''; }
  ($oop, my $displayCfp) = &getHtmlVar($query, "checkbox_cfp");   unless ($displayCfp) { $displayCfp = ''; }
  ($oop, my $displayAfp) = &getHtmlVar($query, "checkbox_afp");   unless ($displayAfp) { $displayAfp = ''; }
  ($oop, my $displaySvm) = &getHtmlVar($query, "checkbox_svm");   unless ($displaySvm) { $displaySvm = ''; }
  ($oop, my $displayStr) = &getHtmlVar($query, "checkbox_str");   unless ($displayStr) { $displayStr = ''; }
  print qq(<input type="hidden" name="checkbox_oa"  value="$displayOa" >\n);
  print qq(<input type="hidden" name="checkbox_cfp" value="$displayCfp">\n);
  print qq(<input type="hidden" name="checkbox_afp" value="$displayAfp">\n);
  print qq(<input type="hidden" name="checkbox_svm" value="$displaySvm">\n);
  print qq(<input type="hidden" name="checkbox_str" value="$displayStr">\n);
  if ($displayOa) {  &populateOaData();  }
  if ($displayCfp) { &populateCfpData(); }
  if ($displayAfp) { &populateAfpData(); }
  if ($displaySvm) { &populateSvmData(); }
  if ($displayStr) { &populateStrData(); }

  ($oop, my $showJournal) = &getHtmlVar($query, "checkbox_journal");   unless ($showJournal) { $showJournal = ''; }
  ($oop, my $showPmid)    = &getHtmlVar($query, "checkbox_pmid");      unless ($showPmid) {    $showPmid = '';    }
  ($oop, my $showPdf)     = &getHtmlVar($query, "checkbox_pdf");       unless ($showPdf) {     $showPdf = '';     }
  print qq(<input type="hidden" name="checkbox_journal" value="$showJournal">\n);
  print qq(<input type="hidden" name="checkbox_pmid"    value="$showPmid">\n);
  print qq(<input type="hidden" name="checkbox_pdf"     value="$showPdf">\n);

  ($oop, my $papersPerPage) = &getHtmlVar($query, "papers_per_page");
  ($oop, my $pageSelected)  = &getHtmlVar($query, "select_page");
  unless ($papersPerPage) { $papersPerPage = 10; }
  unless ($pageSelected) {  $pageSelected  = 0;  }
  print qq(<input type="hidden" name="papers_per_page" value="$papersPerPage">\n);

  my @headerRow = qw( paperID );
  if ($showJournal) { push @headerRow, "journal"; &populateJournal(); }
  if ($showPmid)    { push @headerRow, "pmid";    &populatePmid();    }
  if ($showPdf)     { push @headerRow, "pdf";     &populatePdf();     }
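The code above only reads papers_per_page and select_page; the part of the script that uses them is not shown in this section. As a rough sketch of how zero-based paging with these two values might work, under that assumption, consider the hypothetical page_slice helper below.

```perl
use strict;
use warnings;

# Hypothetical helper: return the items belonging to a zero-based page.
# Falls back to 10 items per page and page 0, matching the form defaults.
sub page_slice {
    my ($items, $perPage, $page) = @_;
    $perPage = 10 unless $perPage && $perPage > 0;
    $page    = 0  unless $page && $page > 0;
    my $first = $page * $perPage;
    return () if $first > $#$items;                 # page is past the end of the list
    my $last = $first + $perPage - 1;
    $last = $#$items if $last > $#$items;           # final page may be shorter
    return @{$items}[$first .. $last];
}

my @papers = map { sprintf '%08d', $_ } 1 .. 25;    # example joinkeys
my @page   = page_slice(\@papers, 10, 2);           # third page holds the remainder
print scalar(@page), " papers on page 2\n";
```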


The code now generates a hash (%trs) of rows of results (not necessarily in the order submitted). First, the code records in the %allPaperData hash every paper-datatype pair for which flagging results, cur_curdata, or OA (curation) data exist. Then, for each paper queried and each datatype requested, any data found for the relevant datatype and flagging method is loaded into the appropriate column of the row.

The first column for any paper displays the WBPaper ID. If the journal, PMID, and/or PDF link were requested, they are displayed in the next columns for that paper. Next is the datatype column. In the next column (if SVM data was requested), the SVM result for each paper-datatype pair is shown, with "high", "medium", and "low" SVM results highlighted in decreasing intensities of red, respectively. The next columns are populated with STR, CFP, AFP, and OA data (if requested) for each paper-datatype pair. Note that an empty result for STR, CFP, or AFP is simply blank ("") whereas an empty result for OA data is represented by "oa_blank". Otherwise, the STR, CFP, and AFP fields are populated with free-text entries from the STR, CFP, and AFP results, and the OA field indicates "curated" if data was found in the OA for the paper-datatype pair.

The following columns hold the curator drop-down menu, the validation status drop-down menu, the premade comment drop-down menu, and the free-text comment field. Any new results submitted via this page for a paper-datatype pair will automatically be attributed to the curator that is logged in, unless there is already a curator listed in the curator field or the curator explicitly selects a curator from the curator drop-down list.

In the free-text comment field, if there are more than 20 characters stored, only the first 20 characters will be displayed followed by an ellipsis ("..."). Clicking inside the free-text field opens the full view of the text, which remains in full view while it is being edited. Clicking outside of the text field reverts it to the truncated, ellipsis view to conserve screen space. Each cell in the table is outlined with a dotted line.
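Two of the display rules described above are easy to show in isolation: the 20-character truncation of the free-text comment and the red shading for SVM results. The helpers below are hypothetical and written only for illustration. In particular, the truncation in the real form responds to clicks in the browser, so it is presumably handled on the client side; the Perl version here only demonstrates the rule itself. The colour values are the ones used in the getResults code.

```perl
use strict;
use warnings;

# Hypothetical helper: show at most $limit characters, followed by "..."
# when the stored text is longer than the limit.
sub truncate_comment {
    my ($text, $limit) = @_;
    $limit = 20 unless defined $limit;
    return $text if length($text) <= $limit;
    return substr($text, 0, $limit) . '...';
}

# Background colours for SVM results, from strongest to weakest red.
my %svmColor = (high => '#FFA0A0', medium => '#FFC8C8', low => '#FFE0E0');

# Hypothetical helper: wrap an SVM result in a span with the matching colour;
# anything else (including an empty result) gets a white background.
sub svm_cell {
    my ($result) = @_;
    my $bgcolor = $svmColor{$result} || 'white';
    return qq(<span style="background-color: $bgcolor">$result</span>);
}

print truncate_comment('This comment is longer than twenty characters'), "\n";
print svm_cell('medium'), "\n";
```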


  my %trs;                              # td data for each table row
  my %paperPosNegOkay;                  # papers that have positive-negative data okay, so show all svm results for that paper even if a given row isn't positive-negative okay
  my %paperInfo;                        # for a joinkey, all the paper information about it to show in a big rowspan for that table row

  my %allPaperData;                     # hash of datatype - joinkey  for all posible queried data structures, to key off from this when there are no svm results for a data structure with data.
  foreach my $datatype (keys %svmData) { foreach my $joinkey (keys %{ $svmData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
  foreach my $datatype (keys %strData) { foreach my $joinkey (keys %{ $strData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
  foreach my $datatype (keys %curData) { foreach my $joinkey (keys %{ $curData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
  foreach my $datatype (keys %oaData)  { foreach my $joinkey (keys %{  $oaData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
  foreach my $datatype (keys %cfpData) { foreach my $joinkey (keys %{ $cfpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }
  foreach my $datatype (keys %afpData) { foreach my $joinkey (keys %{ $afpData{$datatype} }) { $allPaperData{$joinkey}{$datatype}++; } }

  my $trCounter = 0;
  foreach my $joinkey (sort keys %curatablePapers) {                    # TODO curatablePapers or allPaperData that have some flag ?
    next unless ($chosenPapers{$joinkey} || $chosenPapers{all});

    push @{ $paperInfo{$joinkey} }, $joinkey;
    my $journal = ''; my $pmid = ''; my $pdf = ''; my $primaryData = '';
    if ($showJournal) {
      if ($journal{$joinkey}) { $journal = $journal{$joinkey}; }
      push @{ $paperInfo{$joinkey} }, $journal; }
    if ($showPmid) {
      if ($pmid{$joinkey}) { $pmid = $pmid{$joinkey}; }
      push @{ $paperInfo{$joinkey} }, $pmid; }
    if ($showPdf) {
      if ($pdf{$joinkey}) { $pdf = $pdf{$joinkey}; }
      push @{ $paperInfo{$joinkey} }, $pdf; }

    foreach my $datatype (sort keys %{ $allPaperData{$joinkey} }) {
      next unless ($chosenDatatypes{$datatype});                        # show only results for selected datatype
      my @dataRow = ( "$datatype" );
      $trCounter++;
      if ($displaySvm) {
        my $svmResult = '';
        if ($svmData{$datatype}{$joinkey}) { $svmResult = $svmData{$datatype}{$joinkey}; }
        my $bgcolor = 'white';
        if ($svmResult eq 'high')      { $bgcolor = '#FFA0A0'; }
        elsif ($svmResult eq 'medium') { $bgcolor = '#FFC8C8'; }
        elsif ($svmResult eq 'low')    { $bgcolor = '#FFE0E0'; }
        $svmResult = qq(<span style="background-color: $bgcolor">$svmResult</span>);
        push @dataRow, $svmResult;
      } # if ($displaySvm)
      
      if ($displayStr) {
        my $strResult = '';
        if ($strData{$datatype}{$joinkey}) { $strResult = $strData{$datatype}{$joinkey}; }
        push @dataRow, $strResult;
      }

      if ($displayCfp) {
        my $cfpResult = '';
        if ($cfpData{$datatype}{$joinkey}) { $cfpResult = $cfpData{$datatype}{$joinkey}; }
        push @dataRow, $cfpResult;
      }

      if ($displayAfp) {
        my $afpResult = '';
        if ($afpData{$datatype}{$joinkey}) { $afpResult = $afpData{$datatype}{$joinkey}; }
        push @dataRow, $afpResult;
      }

      if ($displayOa) {
        my $oaResult = 'oa_blank';
        if ($oaData{$datatype}{$joinkey}) { $oaResult = $oaData{$datatype}{$joinkey}; }
        push @dataRow, $oaResult;
      }

      my $thisCurator = '';                                                     # curator in cur_curdata for this paper-datatype if it has a value
      if ( $curData{$datatype}{$joinkey}{curator} ) { $thisCurator = $curData{$datatype}{$joinkey}{curator}; }
      my $curatorSelectCurator = qq(<select name="select_curator_curator_$trCounter" size="1">\n<option value=""></option>\n);
      foreach my $curator_two (keys %curators) {        # display curators in alphabetical (tied hash) order; select the curator already stored for this paper-datatype pair
        if ($thisCurator eq $curator_two) { $curatorSelectCurator .= qq(<option value="$curator_two" selected="selected">$curators{$curator_two}</option>\n); }
          else {                            $curatorSelectCurator .= qq(<option value="$curator_two">$curators{$curator_two}</option>\n); } }
      $curatorSelectCurator .= qq(</select>);

      $curatorSelectCurator .= qq(<input type="hidden" name="joinkey_$trCounter"  value="$joinkey" >);  # these are required, arbitrarily added here
      $curatorSelectCurator .= qq(<input type="hidden" name="datatype_$trCounter" value="$datatype">);  # these are required, arbitrarily added here
      push @dataRow, $curatorSelectCurator;

      my $thisDonPosNeg = ''; if ( $curData{$datatype}{$joinkey}{donposneg} ) { $thisDonPosNeg = $curData{$datatype}{$joinkey}{donposneg}; }
      my $curatorSelectDonposneg = qq(<select name="select_curator_donposneg_$trCounter">);
      foreach my $donposneg (keys %donPosNegOptions) {        # display validation status options; select the value already stored for this paper-datatype pair
        if ($thisDonPosNeg eq $donposneg) { $curatorSelectDonposneg .= qq(<option value="$donposneg" selected="selected">$donPosNegOptions{$donposneg}</option>\n); }
          else {                            $curatorSelectDonposneg .= qq(<option value="$donposneg"                    >$donPosNegOptions{$donposneg}</option>\n); } }
      $curatorS