Processing Data Workflow

  1. Visit Site & Download Data with Toughbook
    1. Data should go in ~/Dropbox/logger_data/$YEAR/$SITE/
    2. The general naming convention for data files is: $SITE_$TABLE-$YEAR_$MONTH_$DAY.dat
  2. Back in camp/town/anywhere with internet access, put the Toughbook online so Dropbox can mirror the data to other computers.
  3. Next, out of the field and back in the office, copy this data from Dropbox to /var/site/$AREA/$SITE/raw or similar (e.g. /var/site/utq/utq_A/raw/); see the copy sketch after this list.
  4. Most processing scripts are located in /var/site/bin/ with a name like process_$SITE.sh (e.g. process_teller_bottom.sh)
  5. Add lines to this script in each appropriate section so that datapro can process the new data.
  6. In the initial cut (I'm just starting to do this, 10/2018) the script has two related bash functions: 1) delete all the processed data in /var/site/$AREA/$SITE/outputs and 2) apply the manual QA. On the first run I comment out the calls to those two functions so that the processing is quicker, so edit the bash file to comment out the two function calls, too (see the script skeleton after this list).
  7. Next, run the bash script.
  8. Once it's done, the data will appear in /var/site/$AREA/$SITE/outputs/. There is a web page that can be used for visualization: /var/site/$AREA/$SITE/index.html. After the automated processing has fixed everything, I review this page to catch things it misses (for instance, animal damage knocking a radiometer off level, or other oddities where the data appears right but isn't quite).
  9. Any manual corrections go in an Excel spreadsheet: /var/site/$AREA/$SITE/qc/$SENSORNAME_fixes.xlsx
  10. There should be a line in the process_$SITE.sh subroutine that applies the manual fixes for this file (most don't have this line until there is manual correcting to do).
  11. After manual corrections are complete and the script is updated, re-run /var/site/bin/process_$SITE.sh with everything uncommented so that the final dataset is created from scratch and the manual edits are applied.
  12. Upload the final product from the local computer to the main server or data archive, etc. (ngeedata / ocotal / project data portal); see the upload sketch after this list.
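
A minimal sketch of the copy step (item 3), assuming the Dropbox layout and /var/site paths above. The site, area, table name, and date values are hypothetical examples, not taken from any real script:

  # Copy a field visit's downloads from Dropbox into the site's raw directory.
  # utq/utq_A and the date are hypothetical; adjust to the actual area/site.
  YEAR=2018
  AREA=utq
  SITE=utq_A
  # Raw files follow the $SITE_$TABLE-$YEAR_$MONTH_$DAY.dat convention,
  # e.g. utq_A_CR1000-2018_10_17.dat (table name here is made up).
  rsync -av "$HOME/Dropbox/logger_data/$YEAR/$SITE/" "/var/site/$AREA/$SITE/raw/"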
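
A rough skeleton of how a process_$SITE.sh could be laid out (items 4-7 and 11), based only on the description above. The function names, the placeholder QA body, and the commented datapro line are assumptions, not the contents of any actual script:

  #!/bin/bash
  # process_utq_A.sh -- hypothetical name following the process_$SITE.sh pattern
  AREA=utq
  SITE=utq_A

  # Function 1: delete all previously processed data so the dataset is rebuilt from scratch.
  clear_outputs() {
      rm -rf "/var/site/$AREA/$SITE/outputs/"*
  }

  # Function 2: apply the manual QA recorded in the qc/ spreadsheets.
  # The real mechanism isn't shown here; this is only a placeholder.
  apply_manual_qa() {
      echo "applying manual fixes from /var/site/$AREA/$SITE/qc/"
  }

  # Comment this call out on the first run so processing is quicker;
  # uncomment it for the final run so outputs are rebuilt from scratch.
  clear_outputs

  # datapro section: add a line here for each new raw file/table so datapro can process it.
  # The exact datapro command line is site-specific and not reproduced here.
  # datapro ... "/var/site/$AREA/$SITE/raw/utq_A_CR1000-2018_10_17.dat" ...

  # Also commented out on the first run; uncomment for the final run so the manual edits are applied.
  apply_manual_qa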
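
One way the final upload (item 12) could look, assuming rsync over ssh. The server name and destination path are placeholders, since the actual ngeedata / ocotal / data portal destinations are not specified here:

  # Push the finished outputs from the local machine to a remote server or archive.
  # user@dataserver.example.org and /archive/... are placeholders only.
  AREA=utq
  SITE=utq_A
  rsync -av "/var/site/$AREA/$SITE/outputs/" "user@dataserver.example.org:/archive/$AREA/$SITE/"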