Methods of Data Compilation and Analysis

Figure: Stream sampling. Water-quality sampling on the Rappahannock River near Fredericksburg, Virginia.

A complete description of the methods used by the USGS to analyze water-quality data collected in the Chesapeake Bay watershed is provided in Hirsch and others (2010), Moyer and others (2012), Hirsch and De Cicco (2014), Hirsch and others (2015), and Chanat and others (2015). The following summary describes how scientists construct the dataset and use the water-quality model to determine nutrient and suspended-sediment loads and trends.

Dataset Construction

Updated streamflow and water-quality data are compiled each spring to prepare for the annual computation of loads and trends. Daily mean streamflow data for all sites to be analyzed are retrieved directly from the USGS National Water Information System (NWIS). For water-quality analysis, the USGS has compiled a database of historical (1972 through 2018) observed water-quality data collected at each of the nontidal network stations. Since 2011, all observed water-quality results collected by the multiple monitoring agencies that operate the nontidal network have been reported to the U.S. Environmental Protection Agency (EPA) Chesapeake Bay Program and stored in the Chesapeake Environmental Data Repository (CEDR).

Acquiring and Incorporating Data From Other Agencies

Annually, the USGS receives water-quality records from EPA and the CEDR database for the previous water year, defined as the 12-month period from October 1 through September 30. These new water-quality observations are combined with the historical observations to create a complete record of water-quality observations for each nontidal network station.
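The water-year definition above can be sketched in a few lines of Python. This is a minimal illustration, not the USGS implementation; the function name `water_year` is hypothetical, and the sketch assumes the common convention that a water year is labeled by the calendar year in which it ends.

```python
from datetime import date


def water_year(d: date) -> int:
    """Return the water year that contains the date d.

    Assumption: a water year runs from October 1 through September 30
    and is labeled by the calendar year in which it ends, so
    October 1, 2017 falls in water year 2018.
    """
    return d.year + 1 if d.month >= 10 else d.year


# Both ends of water year 2018 map to the same label.
print(water_year(date(2017, 10, 1)), water_year(date(2018, 9, 30)))
```

Grouping observations by this label is one simple way to select "the previous water year" from a combined record.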

The primary water-quality constituents considered are total nitrogen, dissolved inorganic nitrogen, total phosphorus, orthophosphate, and suspended sediment. All records of water-quality observations are reviewed by the collecting agency to ensure data completeness and accuracy.

Accounting for Changes in Laboratory Procedures

Over the long-term history of the program, laboratory analyses and methodologies have changed slightly, and these changes must be accounted for before the data are analyzed. These changes include:

Water-Quality Model Used for Load and Trend Determination

Concentration data retrieved from the nontidal database and daily streamflow data from NWIS are used for load and trend analyses. Recent advances in the statistical tools available to compute loads and trends have led to the use of revised data-analysis methodologies. For water year 2018, all load and trend estimates were made using a multiple linear regression model known as Weighted Regressions on Time, Discharge, and Season (WRTDS; Hirsch and others, 2010; Hirsch and De Cicco, 2014).
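To make the link between concentration, streamflow, and load concrete, the following Python sketch multiplies a daily concentration by a daily mean streamflow and sums the result by water year. The units (milligrams per liter and cubic meters per second), the function names, and the input layout are assumptions chosen for illustration; they are not taken from the USGS workflow.

```python
from collections import defaultdict
from datetime import date

SECONDS_PER_DAY = 86_400


def daily_load_kg(conc_mg_per_l: float, flow_m3_per_s: float) -> float:
    """Daily load in kilograms from concentration and mean streamflow.

    1 mg/L equals 1 g/m^3, so concentration * flow * 86,400 s/day gives
    grams per day; dividing by 1,000 converts grams to kilograms.
    """
    return conc_mg_per_l * flow_m3_per_s * SECONDS_PER_DAY / 1000.0


def annual_loads_kg(daily):
    """Sum daily loads by water year.

    daily: iterable of (date, concentration in mg/L, streamflow in m^3/s).
    Water years are assumed to be labeled by the year in which they end.
    """
    totals = defaultdict(float)
    for d, conc, flow in daily:
        wy = d.year + 1 if d.month >= 10 else d.year
        totals[wy] += daily_load_kg(conc, flow)
    return dict(totals)


# Two hypothetical days on either side of a water-year boundary.
sample = [(date(2017, 9, 30), 1.0, 10.0), (date(2017, 10, 1), 2.0, 10.0)]
print(annual_loads_kg(sample))
```

In practice the daily concentrations would come from the model estimates rather than from direct observation on every day.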

How Does the WRTDS Model Work?

The WRTDS model uses a sparse set of discrete water-quality observations combined with a continuous daily discharge record to estimate concentration on days for which no water-quality data are available. Daily concentration and load estimates are then aggregated to monthly and annual time scales. An algorithm is then applied to estimate the trend in "flow-normalized load," namely a trend that minimizes the confounding effect of any concurrent trend in discharge. Confidence in the flow-normalized trend is assigned through application of likelihood analyses using bootstrapped replicates (Hirsch and others, 2015). Detailed comparative studies by Chesapeake Bay River Input Monitoring (RIM) team staff (Moyer and others, 2012; Chanat and others, 2015) have documented that WRTDS performs better than regression-based approaches used historically.
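The estimation step can be illustrated with a simplified Python sketch. As the method is commonly described (Hirsch and others, 2010), the log of concentration is modeled as a linear function of time, log discharge, and a sine and cosine pair for season, and the regression is refitted for each estimation point with weights that favor observations close to it in time, discharge, and season. Everything below is an assumption made for illustration: the function names, the tricube weight function, and the half-window widths are not taken from this summary, and the sketch omits censored-data handling, retransformation bias correction, and any widening of windows when too few samples carry weight.

```python
import numpy as np


def tricube(distance, half_width):
    """Tricube weight: 1 at zero distance, falling to 0 at half_width."""
    u = np.clip(np.abs(distance) / half_width, 0.0, 1.0)
    return (1.0 - u**3) ** 3


def design_matrix(t, log_q, t_ref):
    """Columns: intercept, centered time, log discharge, sine, cosine."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    log_q = np.atleast_1d(np.asarray(log_q, dtype=float))
    return np.column_stack([
        np.ones_like(t),
        t - t_ref,                  # centered to keep the fit well conditioned
        log_q,
        np.sin(2.0 * np.pi * t),    # t is in decimal years
        np.cos(2.0 * np.pi * t),
    ])


def estimate_log_conc(t0, q0, t_obs, q_obs, c_obs,
                      h_time=7.0, h_flow=2.0, h_season=0.5):
    """Estimate ln(concentration) at decimal year t0 and discharge q0.

    The half-window widths (years, log-discharge units, fraction of a
    year) are illustrative assumptions, not values from this document.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    log_q = np.log(np.asarray(q_obs, dtype=float))
    log_c = np.log(np.asarray(c_obs, dtype=float))

    # Seasonal distance wraps around the year, so it never exceeds 0.5.
    frac = np.abs(t_obs - t0) % 1.0
    season_dist = np.minimum(frac, 1.0 - frac)

    weights = (tricube(t_obs - t0, h_time)
               * tricube(log_q - np.log(q0), h_flow)
               * tricube(season_dist, h_season))

    # Weighted least squares: scale rows by the square root of the weights.
    x = design_matrix(t_obs, log_q, t0)
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(x * sw[:, None], log_c * sw, rcond=None)

    x0 = design_matrix(t0, np.log(q0), t0)
    return float((x0 @ beta)[0])
```

Calling `estimate_log_conc` once for every day in the discharge record, then exponentiating, would give a daily concentration series of the kind the paragraph describes.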

Why Are Trends Flow-Normalized?

Observed water-quality loads are highly influenced by streamflow and season. Trends are adjusted for flow and season to minimize the influence of these potentially confounding factors. This process is referred to as "flow-normalization," and is described further in Hirsch and others (2010). Flow-normalized trends help scientists evaluate changes in load resulting from changing sources, delays associated with storage or transport of historical inputs, and (or) implemented management actions.
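One way to picture flow-normalization is as an average over the range of discharges that could have occurred on a given calendar day. The Python sketch below assumes, following the general description in Hirsch and others (2010), that the flow-normalized value for a day is the mean of the model estimates evaluated at every discharge recorded on that same day of the year across the record. The function name, the `conc_model` callable, and the units are hypothetical and serve only to illustrate the idea.

```python
from statistics import fmean
from typing import Callable, Sequence, Tuple


def flow_normalized(t: float,
                    same_day_flows: Sequence[float],
                    conc_model: Callable[[float, float], float]
                    ) -> Tuple[float, float]:
    """Return (flow-normalized concentration, flow-normalized daily load).

    t: decimal year of the day being estimated.
    same_day_flows: discharges (assumed m^3/s) observed on this calendar
        day in each year of the record.
    conc_model: hypothetical callable giving estimated concentration
        (assumed mg/L) at time t and discharge q.
    """
    concs = [conc_model(t, q) for q in same_day_flows]
    # mg/L * m^3/s * 86.4 gives kg/day (86,400 s/day divided by 1,000 g/kg).
    loads = [c * q * 86.4 for c, q in zip(concs, same_day_flows)]
    return fmean(concs), fmean(loads)


# Toy model in which concentration rises with the square root of discharge.
fn_conc, fn_load = flow_normalized(2018.5, [4.0, 9.0, 16.0],
                                   lambda t, q: q ** 0.5)
print(fn_conc, fn_load)
```

Because the same set of discharges is used for that calendar day in every year, year-to-year swings in streamflow do not drive the resulting series, which is why a trend in it is less confounded by a trend in discharge.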

How Are Trends in Loads Identified?

Identified trends are based on the results of likelihood analyses using bootstrapped replicates (Hirsch and others, 2015). For example, for a given site and constituent, a reported positive (or negative) trend with a likelihood estimate of at least 0.67 means that a positive (or negative) trend was evident in about two-thirds or more of the bootstrapped replicates for that site and constituent.
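The interpretation of a likelihood estimate can be sketched as a simple proportion. The Python example below treats the likelihood of a positive (or negative) trend as the fraction of bootstrapped replicates whose estimated change is positive (or negative). This is a simplification for illustration: the function name and input format are hypothetical, and the published method (Hirsch and others, 2015) may apply adjustments that are not shown here.

```python
from typing import Dict, Sequence


def trend_likelihood(replicate_changes: Sequence[float]) -> Dict[str, float]:
    """Fraction of bootstrap replicates showing positive and negative change.

    replicate_changes: estimated change in flow-normalized load for each
    bootstrapped replicate (hypothetical input; any consistent unit).
    """
    n = len(replicate_changes)
    if n == 0:
        raise ValueError("at least one replicate is required")
    positive = sum(1 for x in replicate_changes if x > 0) / n
    negative = sum(1 for x in replicate_changes if x < 0) / n
    return {"positive": positive, "negative": negative}


# Seven of ten hypothetical replicates show an increase.
changes = [1.2, 0.4, 0.9, -0.3, 2.1, 0.7, -1.0, 0.2, -0.5, 1.5]
print(trend_likelihood(changes))
```

With these made-up replicates the positive fraction exceeds 0.67, so under the rule described above the trend would be reported as positive.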

What Trends Are Computed?

For stations having water-quality records beginning prior to 1990, trends in load are computed for both the period of record and for the most recent 10 years (2009-18). For stations having records beginning after 1990, only 10-year trends (2009-18) are computed. All data available, including data collected prior to 2009, are used to estimate 10-year trends.
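The rule for which trend periods are computed can be written as a short Python sketch. The function name is hypothetical, and two assumptions are made because the text does not settle them: a record that begins in 1990 exactly is treated as receiving only the 10-year trend, and the period of record is taken to start in the station's first year of data.

```python
from typing import List, Tuple


def trend_periods(record_start_year: int,
                  end_year: int = 2018,
                  short_span: int = 10) -> List[Tuple[int, int]]:
    """Return the (start, end) water-year periods for which trends are computed.

    Assumptions: records beginning in 1990 exactly get only the short-term
    trend, and the period of record starts at record_start_year.
    """
    short_term = (end_year - short_span + 1, end_year)
    if record_start_year < 1990:
        return [(record_start_year, end_year), short_term]
    return [short_term]


print(trend_periods(1985))  # period of record plus the 10-year period
print(trend_periods(2005))  # 10-year period only
```

The periods only set the span over which a trend is reported; as noted above, model fitting still uses all available data, including observations from before 2009.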