DataPro setup and workflow

From IARC 207 Wiki
Revision as of 18:31, 3 November 2015 by 137.229.29.173 (talk)

Initial setup

It's good to settle on a standard directory structure if you can. On Linux, most things go in a bin directory, and you can do the same on Windows. I recommend not putting any space characters (' ') in folder names, as I don't know whether there are ill consequences. Be conservative: use descriptive names, keep things well organized, and use the underscore character in place of spaces. Logical places for the bin directory: /var/data/bin/, /home/site/bin/, or c:\data\bin\


Then, a suggested directory structure for the sites themselves:

  /var/data/$site_name - root directory of the site
  /var/data/$research_area/$site_name - another logical location for the root directory of the site
  /var/data/$site_name/config/ - location of the configuration files used by datapro and related utilities for processing the data
  /var/data/$site_name/raw/ - location for the raw data (I personally think a good practice is to have the original data from LoggerNet / loggerdata etc.)
  /var/data/$site_name/outputs/ - location for the automatically processed data: raw data that has been analyzed by datapro
  /var/data/$site_name/web/ - location for a shorter time series, like the last 3 weeks, suitable for a web page
  /var/data/$site_name/qc/ - location for the automated qa/qc log
  /var/data/$site_name/error/ - location for datapro error logs
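The layout above can be created by hand, but a short script keeps new sites consistent. The following is a minimal sketch, not part of datapro itself: the function name create_site_dirs and the site name in the usage note are made up for illustration, and the whitespace check simply enforces the "no spaces in folder names" advice from earlier on this page.

```python
from pathlib import Path

# Subdirectory names taken from the suggested layout on this page.
SITE_SUBDIRS = ("config", "raw", "outputs", "web", "qc", "error")


def create_site_dirs(data_root, site_name):
    """Create the per-site directory tree and return the site root path.

    Hypothetical helper for illustration; datapro does not ship this.
    """
    # Refuse names containing whitespace; use underscores instead.
    if any(ch.isspace() for ch in site_name):
        raise ValueError(f"site name contains whitespace: {site_name!r}")
    site_root = Path(data_root) / site_name
    for sub in SITE_SUBDIRS:
        # exist_ok=True makes re-running on an existing site harmless.
        (site_root / sub).mkdir(parents=True, exist_ok=True)
    return site_root
```

For example, create_site_dirs("/var/data", "example_site") would create /var/data/example_site/config/ and the other five subdirectories, assuming you have write permission under /var/data.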

To run: python /var/data/bin/csv_utilities/datapro.py --key_file=/var/data/$site_name/config/keyfile.txt

Then, watch for error messages. Depending on your system, some Python libraries may need to be installed. If you use Windows, the Enthought Python Distribution is a nice all-encompassing choice, or you can use Cygwin, to make sure you have all of the right things on the computer.
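If you run datapro from another script (for example, on a schedule), the same command can be wrapped so that any error messages are captured rather than lost. This is only a sketch under a few assumptions: the function names are made up, the script path is the one used in the run command on this page, and I am assuming that error messages show up on stderr. Whether datapro returns a nonzero exit code on failure is not something this sketch relies on.

```python
import subprocess

# Path from the run command on this page; adjust to your own bin directory.
DEFAULT_DATAPRO = "/var/data/bin/csv_utilities/datapro.py"


def build_datapro_command(key_file, datapro_script=DEFAULT_DATAPRO):
    """Return the datapro command line as an argument list."""
    return ["python", datapro_script, f"--key_file={key_file}"]


def run_datapro(key_file, datapro_script=DEFAULT_DATAPRO):
    """Run datapro once and return (exit_code, stderr_text).

    Hypothetical wrapper for illustration; inspect stderr_text for
    messages such as missing Python libraries.
    """
    cmd = build_datapro_command(key_file, datapro_script)
    # capture_output requires Python 3.7 or newer.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a path ever does contain an awkward character.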

If you have any questions just ask Bob Busey.

Initial site setup workflow

  1. Create the initial directory structure listed above.
  2. I usually don't use the site creator, though I certainly could. More often I start with an existing site's configuration files: copy them to a new config directory and edit them to reflect the new site.
  3. The file that controls qa/qc parameters, filenames, any functions to be applied, etc. is the parameter csv file. It's often easiest to edit this on a computer with a graphical spreadsheet program like Excel or LibreOffice. The header information can often be copied and pasted straight from the raw .dat file.
  4. Next, edit the .txt key file to reflect where the files and directories are located.
  5. Run datapro as shown above.
  6. If it works, awesome. If it doesn't, make note of the errors and try to address them.
  7. Once it's running I do a couple more things. I use ln -s to create symlinks for a few other items. I maintain an automatically generated diagnostics web page at http://ngeedata.iarc.uaf.edu/data/data.html. It expects battery voltage to be in 'battery.csv' and internal panel temperature to be in 'paneltemp.csv'; if you don't use those names, you can use ln -s to create symlinks with those names. I also make symlinks so that the processed data is available over the internet without making a second full copy of the data.
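The symlink step can also be done from Python instead of typing ln -s by hand. The sketch below is an illustration only: link_as is a made-up helper, and the source file name in the usage note is hypothetical. Only 'battery.csv' and 'paneltemp.csv' come from this page. Note that creating symlinks on Windows may need extra privileges.

```python
import os


def link_as(source_file, link_name):
    """Make link_name a symlink to source_file, roughly like `ln -s`.

    An existing symlink at link_name is replaced; a real file there is
    left alone and os.symlink raises FileExistsError instead.
    """
    if os.path.islink(link_name):
        os.remove(link_name)  # drop the stale link before re-creating it
    # Store an absolute target so the link resolves from any directory.
    os.symlink(os.path.abspath(source_file), link_name)
```

For example, if a site writes battery voltage to a file called batt_volt.csv (a hypothetical name), then link_as("outputs/batt_volt.csv", "outputs/battery.csv") gives the diagnostics page the file name it expects without copying the data.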