This assignment called for creating a new R package with at least one unique function.
My R package "nerrsclean" provides one function, clean(), which prepares data collected under NOAA's National Estuarine Research Reserve System's (NERRS) National Monitoring Program for analysis. The function accepts two arguments:
- df: dataframe from CSV file
- normalize: user option to add normalized columns or not (TRUE / FALSE)
The function returns a cleaned dataframe with or without additional columns of normalized values. The data sets appropriate for this tool can be found here: http://cdmo.baruch.sc.edu/dges/
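To make the two-argument interface concrete, here is a minimal sketch of a function with the same shape. It is not the code inside nerrsclean: the name clean_sketch(), the cleaning rule (dropping rows with missing numeric values), the min-max normalization, and the toy column names are all assumptions made for illustration.

```r
# Hypothetical stand-in for nerrsclean::clean(); the real cleaning rules are
# not shown in this post, so every rule below is an assumption.
clean_sketch <- function(df, normalize = FALSE) {
  stopifnot(is.data.frame(df), is.logical(normalize), length(normalize) == 1)

  # Assumed cleaning step: drop rows with a missing value in any numeric column.
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  if (length(num_cols) > 0) {
    keep <- stats::complete.cases(df[, num_cols, drop = FALSE])
    df <- df[keep, , drop = FALSE]
  }

  # Optional step: append a min-max normalized copy of each numeric column.
  if (normalize && nrow(df) > 0) {
    for (col in num_cols) {
      rng <- range(df[[col]])
      span <- rng[2] - rng[1]
      df[[paste0(col, "_norm")]] <- if (span == 0) {
        rep(0, nrow(df))  # constant column: avoid dividing by zero
      } else {
        (df[[col]] - rng[1]) / span
      }
    }
  }
  df
}

# Toy data with made-up column names and values.
toy <- data.frame(
  DateTimeStamp = c("t1", "t2", "t3", "t4"),
  Temp = c(20, NA, 25, 30),
  Sal = c(10, 12, 14, 18),
  stringsAsFactors = FALSE
)
cleaned <- clean_sketch(toy, normalize = TRUE)
```

With normalize = FALSE the same call would return only the original columns, minus the row that has a missing Temp value.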
The purpose of this package is to serve as a preliminary data cleaning tool for any of the .CSV files downloadable from the above site. I found value in creating a package for this specific task because anyone who regularly handles this type of environmental data would benefit from a simple tool for what is often the most time-consuming part of analysis: the data cleaning.
When setting up the package, I was confused at first, mostly because I had not read all of the documentation for "roxygen2". For example, as I was tweaking the DESCRIPTION file and other details, I noticed that roxygen2 would not overwrite .Rd and NAMESPACE files that were already present unless it had generated them itself, so the files from the initial project template had to be removed before my changes showed up.
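For readers new to roxygen2, the sketch below shows the general pattern: documentation lives in specially marked comments above the function, and the .Rd and NAMESPACE files are generated from them. The wording of the tags and the placeholder function body are illustrative assumptions, not the actual nerrsclean source.

```r
#' Clean NERRS monitoring data
#'
#' @param df A data frame read from a NERRS CSV export.
#' @param normalize Logical; if TRUE, add normalized columns.
#' @return A cleaned data frame.
#' @export
clean <- function(df, normalize = FALSE) {
  # Placeholder body for this sketch; the real cleaning logic is omitted.
  df
}

# From the package root, regenerate man/*.Rd and NAMESPACE from the comments
# above (requires the devtools package, so it is left commented out here):
# devtools::document()
```

Files generated this way start with a "Generated by roxygen2" comment, which is how roxygen2 recognizes the files it is allowed to overwrite on the next run.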
Knowing what I know now, I would make these two changes before creating my next R package:
- Decide on the name and objective of the package before creating anything in RStudio.
- Use a smaller dataset when testing the functions. My test file had over 500,000 records and took a few minutes to load with each pass.
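One way to act on the second point is to read only the first rows of a large file while iterating. The sketch below uses the nrows argument of base R's read.csv(); the temporary file, its columns, and the choice of 100 rows are stand-ins for a real CDMO download.

```r
# Build a throwaway CSV so the example runs on its own; in practice this would
# be a file downloaded from the CDMO site.
path <- tempfile(fileext = ".csv")
full <- data.frame(id = 1:1000, value = (1:1000) / 10)
write.csv(full, path, row.names = FALSE)

# Read only the first 100 records instead of the whole file.
small <- read.csv(path, nrows = 100)
```

The first 100 rows of a time series are not a representative sample, but they are usually enough to check that a cleaning function runs without errors before trying it on the full file.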
It was exciting to create a new package and know that I have the ability to share content with the R community. I had a great feeling after opening a new project and seamlessly installing my package via GitHub into RStudio.
Below is the link to the package on GitHub and the DESCRIPTION file:
Package: nerrsclean
Type: Package
Title: National Estuarine Research Reserve System Data Cleaning
Version: 0.1.0
Author: "Kevin Hitt [aut, cre]"
Description: Package for easily cleaning data collected under NOAA's National
    Estuarine Research Reserve System's (NERRS) National Monitoring Program.
    Data from http://cdmo.baruch.sc.edu/dges/
Depends: R (>= 3.1.2), scales
License: CC0
Encoding: UTF-8
LazyData: true
RoxygenNote: 126.96.36.19900