Acquisition Activity, Statistical Quality Control, and Spatial Quality Control for 1997 Annual Water Level Data Acquired by the Kansas Geological Survey
The primary variable measured in the water well observation program is depth to water in an observation well. This primary variable is associated with three secondary variables: the ground elevation, east-west coordinate, and north-south coordinate of the well. The secondary variables serve to locate the primary variable in space, and make it possible to determine spatial relationships between observation wells, including mapping the water table and calculating changes in aquifer volume. Historically, the three location variables were determined initially by the U.S. Geological Survey for each well and not re-determined unless a serious error in the original coordinates was suspected. In the 1997 ground water observation measurement program conducted by the Kansas Geological Survey, the geographic (latitude and longitude) coordinates of all wells were re-determined by GPS techniques. In addition, several characteristics of the observation wells and of the measurement procedure were noted in order to determine if they might influence the quality of the measurements being made (in statistical parlance, these extra measurements are called exogenous variables). As part of the quality assurance program, water level measurements were repeated two or more times on 48 wells, yielding a collection of 141 quality control observations. Because these data include replicates, they can be used to estimate the reproducibility of water level measurements and to determine the extent to which well conditions or measurement procedures may influence the results.
The primary variable, depth to water, varies with geographic location and differences in topography so much that these factors will overwhelm all other sources of variation. This means that any errors in location may have a profound effect on the water table elevation. To avoid the complications of simultaneously considering uncertainties in the secondary variables, this statistical quality control study is based on first differences (specifically, the difference between 1997 and 1996 depth-to-water measurements). The secondary variables cancel out, leaving only the difference in depth, which is numerically identical to the year's change in water level. In this statistical quality control study, the difference between 1997 and 1996 corrected depth measurements is abbreviated "'97-'96." If the water table is lower this year, the variable "'97-'96" will be a positive number.
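The cancellation of the secondary variables can be illustrated with a minimal sketch. The depths and ground elevations below are hypothetical values chosen only for illustration, and the function names are not part of the KGS procedure; the point is that the year-to-year change in water table elevation equals "'97-'96" regardless of which ground elevation is assumed.

```python
def water_table_elevation(ground_elev_ft, depth_to_water_ft):
    """Elevation of the water table at a well, in feet."""
    return ground_elev_ft - depth_to_water_ft


def first_difference(depth_1997_ft, depth_1996_ft):
    """The '97-'96 variable: positive when the water table is lower in 1997."""
    return depth_1997_ft - depth_1996_ft


# Hypothetical corrected depth-to-water measurements for one well (feet).
depth_1996_ft = 112.4
depth_1997_ft = 113.9
diff_97_96 = first_difference(depth_1997_ft, depth_1996_ft)

# Two hypothetical ground elevations, e.g. a recorded value and a corrected one.
# The decline in water table elevation is the same for both, because the
# ground elevation appears in both years and cancels in the difference.
declines = []
for ground_elev_ft in (2950.0, 2975.0):
    elev_1996 = water_table_elevation(ground_elev_ft, depth_1996_ft)
    elev_1997 = water_table_elevation(ground_elev_ft, depth_1997_ft)
    declines.append(elev_1996 - elev_1997)

print(round(diff_97_96, 2), [round(d, 2) for d in declines])
```

An error in the recorded elevation would shift both yearly water table elevations by the same amount, so it has no effect on the differenced variable.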
The objective in our quality control study is to identify and assess possible sources of unwanted variation in water level measurements made by the KGS. The purpose of the analysis is to provide guidance to the KGS field measurement program, to suggest ways in which field measurements might be improved, and to provide information necessary to identify past or current measurements that are suspect. The statistical quality control and field measurement programs have been intimately intertwined from the outset when the KGS assumed responsibility for measuring observation wells formerly measured by the USGS. The results of this quality control study have implications beyond the 1997 sampling program, but these concerns are outside the scope of this report.
Statistical Procedures
An analysis of variance (ANOVA) procedure was used to estimate the significance of the effects of different well and procedural characteristics on "'97-'96" in the set of 141 quality control observations. The quality control data are a much smaller set than the complete set of 1997 well measurements, but they are appropriate for statistical quality control because replication permits testing for lack of fit. A subsequent analysis of the complete set of 1997 observation well measurements shows that they are much less responsive to differences in the exogenous variables, primarily because of the lack of replication and the larger number of states in some variables.
The earlier discussion of field procedures includes an overview of data recording and processing techniques. The measurement procedure includes a correction for the distance from the measurement point on the well to the ground surface, so "Depth to Water" is a true depth. The well and procedural variables recorded for each well and tested in the statistical models below are "Measurer," "Well Access," "Downhole Access," "Weighted Tape," "Oil on Water," and "Chalk Cut Quality."
In addition, each well has a unique USGS ID number and KGS ID designation, a surface elevation, a legal description of the well location, a decimal latitude and longitude (obtained by LEO conversion of the legal description), and the purpose for which the well is used. The records for some wells include a USGS "principal aquifer" designation and a KGS "geologic unit" code. Unfortunately, about 150 of the wells originally had no aquifer designation and these had to be provisionally assigned by Don Whittemore. This particular variable, here called "Aquifer Code," is critical for statistical analyses, because differences between aquifers are believed to be a major source of variation in water level. The additional variables taken from the historical record and used in the statistical models are "Well Use" and "Aquifer Code."
The initial statistical model includes all exogenous variables recorded during the quality control study that may contribute to the variability in the response, '97-'96, plus the variables "Well Use" and "Aquifer Code." This initial model shows that the quality of downhole access is not a significant contributor to total variance, nor is the presence of oil on the water. "Measurer" and "Chalk Cut Quality" are marginal contributors to total variance.
Analysis of Variance Table for initial model | |||||
---|---|---|---|---|---|
Source | DF | Sum of Squares | Mean Square | F Ratio | Prob>F |
Model | 21 | 585.5386 | 27.8828 | 5.8494 | <.0001 |
Measurer | 5 | 37.87167 | 7.5743 | 1.5890 | 0.1687 |
Well Access | 1 | 45.26345 | 45.2635 | 9.4956 | 0.0026 |
Downhole Access | 1 | 0.00089 | 0.0009 | 0.0002 | 0.9891ns |
Weighted Tape | 1 | 66.29430 | 66.2943 | 13.9075 | 0.0003 |
Well Use | 3 | 49.12570 | 16.3752 | 3.4353 | 0.0193 |
Oil on Water | 1 | 2.42226 | 2.4223 | 0.5082 | 0.4774ns |
Chalk Cut Quality | 2 | 17.22951 | 8.6148 | 1.8072 | 0.1687 |
Aquifer Code | 7 | 138.14482 | 19.7350 | 4.1401 | 0.0004 |
Error | 115 | 548.1825 | 4.7668 | ||
Total | 136 | 1133.7211 | |||
RSquare 0.516475 |
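The derived columns of the table follow from the sums of squares and degrees of freedom. The sketch below recomputes a few of them from values transcribed from the table above, assuming the usual definitions (mean square = sum of squares / DF, F ratio = effect mean square / error mean square, RSquare = 1 - error SS / total SS). The function names are illustrative, and no statistical package is assumed.

```python
# Values transcribed from the initial-model ANOVA table.
ERROR_SS, ERROR_DF = 548.1825, 115
TOTAL_SS = 1133.7211

# effect name -> (degrees of freedom, sum of squares)
effects = {
    "Well Access": (1, 45.26345),
    "Weighted Tape": (1, 66.29430),
    "Aquifer Code": (7, 138.14482),
}


def mean_square(sum_sq, df):
    """Mean square: sum of squares divided by its degrees of freedom."""
    return sum_sq / df


def f_ratio(effect_ss, effect_df, error_ss=ERROR_SS, error_df=ERROR_DF):
    """F ratio: effect mean square over error mean square."""
    return mean_square(effect_ss, effect_df) / mean_square(error_ss, error_df)


def r_square(error_ss=ERROR_SS, total_ss=TOTAL_SS):
    """Fraction of total variation accounted for by the model."""
    return 1.0 - error_ss / total_ss


f_ratios = {name: f_ratio(ss, df) for name, (df, ss) in effects.items()}
for name, f in f_ratios.items():
    print(f"{name}: F = {f:.4f}")
print(f"RSquare = {r_square():.6f}")
```

The recomputed values agree with the tabulated F ratios and RSquare to within rounding.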
A revised model was run that omits the two non-significant exogenous variables and tests the effects of alternative classifications of aquifers. The three classifications tested are those suggested by Whittemore (memo, Feb. 11, 1997): a 5-part classification that distinguishes between alluvial aquifers, alluvial aquifers plus other unconsolidated aquifers, the High Plains aquifer, bedrock aquifers, and bedrock plus unconsolidated aquifers; a 3-part classification that emphasizes the importance of the regional, bedrock, and alluvial aquifers; and a 3-part classification that emphasizes alluvium in multiple aquifers and unconsolidated aquifers combined with bedrock aquifers.
Analysis of Variance Table for aquifer groups | |||||
---|---|---|---|---|---|
Source | DF | Sum of Squares | Mean Square | F Ratio | Prob>F |
Model | 19 | 583.0588 | 30.6873 | 6.5202 | <.0001 |
Measurer | 5 | 36.10056 | 7.22011 | 1.5341 | 0.1845ns |
Well Access | 1 | 57.12575 | 57.12575 | 12.1376 | 0.0007 |
Weighted Tape | 1 | 105.46585 | 105.46585 | 22.4085 | <.0001 |
Well Use | 3 | 51.83722 | 17.27907 | 3.6713 | 0.0143 |
Chalk Cut Quality | 2 | 38.11030 | 19.05515 | 4.0487 | 0.0199 |
Aquifer Code | 7 | 137.81088 | 19.68727 | 4.1830 | 0.0004 |
Grouping 1 | 4 | 67.413120 | 16.85328 | 3.2564 | 0.0142 |
Difference | 3 | 70.39776 | 23.46592 | 4.5260 | 0.0095 |
Grouping 2 | 2 | 50.906600 | 25.45330 | 4.8706 | 0.0092 |
Difference | 5 | 86.90428 | 17.38086 | 3.3524 | 0.0104 |
Grouping 3 | 2 | 55.942921 | 27.97146 | 5.3950 | 0.0057 |
Difference | 5 | 81.867959 | 16.37359 | 3.1581 | 0.0123 |
Error | 122 | 632.5302 | 5.18467 | ||
Total | 136 | 1133.7211 | |||
RSquare 0.514288 |
All components in the model are highly significant except "Measurer." The most significant source of variance is the use or non-use of a weighted tape. The second most significant source of variance is the aquifer code. Grouping aquifer units into smaller numbers of categories (Groupings 1, 2, or 3) results in a statistically significant loss of explained variance, shown in the "Difference" row beneath each grouping. The significance of this loss is greatest for Grouping 1 and least for Grouping 3, but the loss of significant amounts of explained variance for any grouping suggests that the aquifers should not be grouped for the purpose of statistical analysis.
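The "Difference" rows in the table appear to follow an extra-sum-of-squares comparison: the sum of squares and degrees of freedom of each grouping are subtracted from those of the full 8-category "Aquifer Code." The sketch below reproduces that arithmetic from the tabulated values. The function name is illustrative, and the F ratios are not recomputed because the error term used for them is not stated explicitly.

```python
# Full aquifer classification, from the "Aquifer Code" row of the table.
AQUIFER_SS, AQUIFER_DF = 137.81088, 7

# grouping name -> (sum of squares, degrees of freedom), from the table.
groupings = {
    "Grouping 1": (67.413120, 4),
    "Grouping 2": (50.906600, 2),
    "Grouping 3": (55.942921, 2),
}


def lost_variance(grouped_ss, grouped_df, full_ss=AQUIFER_SS, full_df=AQUIFER_DF):
    """Sum of squares, DF, and mean square no longer explained after grouping."""
    diff_ss = full_ss - grouped_ss
    diff_df = full_df - grouped_df
    return diff_ss, diff_df, diff_ss / diff_df


differences = {name: lost_variance(ss, df) for name, (ss, df) in groupings.items()}
for name, (diff_ss, diff_df, diff_ms) in differences.items():
    print(f"{name}: difference SS = {diff_ss:.5f}, DF = {diff_df}, MS = {diff_ms:.5f}")
```

Each result matches the corresponding "Difference" row: the variation attributed to "Aquifer Code" that a coarser grouping fails to capture.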
The final Analysis of Variance includes the 5 exogenous variables identified as contributing significantly to the total variance in the QC data set. The aquifer measured, and the presence or absence of a weight on the tape, are the most important sources of variance in the data. The next source is the quality of access to the well, followed by the well use and the quality of the chalk cut.
Analysis of Variance Table for final model | |||||
---|---|---|---|---|---|
Source | DF | Sum of Squares | Mean Square | F Ratio | Prob>F |
Model | 19 | 583.0588 | 30.6873 | 6.5202 | <.0001 |
Well Access | 1 | 57.02018 | 57.0202 | 11.8557 | 0.0008 |
Weighted Tape | 1 | 124.12291 | 124.1229 | 25.8077 | <.0001 |
Well Use | 3 | 42.86839 | 14.2895 | 2.9711 | 0.0345 |
Chalk Cut Quality | 2 | 32.54747 | 16.2737 | 3.3836 | 0.0371 |
Aquifer Code | 7 | 160.97546 | 22.9965 | 4.7814 | <.0001 |
Error | 122 | 586.7629 | 4.8095 | ||
Total | 136 | 1133.7211 | |||
RSquare 0.482445 |
Importance of contributing variables
The relative contributions of each category of the contributing variables can be determined by examining the least-squares means (averages) of '97-'96 for a specified state of a variable, while holding all other variables at their average value. (In statistical parlance, these averages are referred to as the expected values of the variables.)
Well Access | |
---|---|
Level | Least Sq Mean |
0 | 1.9654 |
1 | -0.6525 |

Weighted Tape | |
---|---|
Level | Least Sq Mean |
0 | -1.0408 |
1 | 2.3537 |

Well Use | |
---|---|
Level | Least Sq Mean |
H | 1.1044 |
I | 1.1828 |
S | 0.9703 |
U | -0.6317 |

Chalk Cut Quality | |
---|---|
Level | Least Sq Mean |
0 | 2.0023 |
1 | 0.0914 |
2 | -0.1244 |

Aquifer Code | |
---|---|
Level | Least Sq Mean |
KD | 4.5243 |
KJ | 3.1498 |
QA | -4.7334 |
QAQU | -0.8720 |
QU | -0.8855 |
QUTO | 1.7959 |
QUTOKJ | 1.9163 |
TO | 0.3562 |
Summary of the Analyses of Variance
A well with good access tends to have a water level that is slightly shallower than in 1996; the water level in a well where access is poor tends to be almost 2 feet deeper than recorded in 1996. A difference in quality of access is related to a difference in water level of more than 2.6 feet.
The water level measured without a weight on the tape tends to be almost 2 feet deeper than recorded in 1996. Water levels measured in wells where the tape is weighted tend to be over a half-foot shallower than in 1996. The difference between using or not using a weight on the tape is related to a difference in water level of more than 2.6 feet.
Water levels in wells used for household water supply ("H"), irrigation ("I"), and stock watering ("S") tend to be about 1 foot deeper than in 1996. Water levels in unused wells ("U") tend to be about 0.6 foot shallower.
If the quality of the chalk cut on the measuring tape is "poor," the reading tends to be about 2 feet deeper than in 1996; if the cut is "good" or "excellent," the reading tends to be nearly the same as the preceding year.
The most dramatic differences in water level change are related to the aquifer being measured. Water levels in the Dakota aquifer ("KD") tend to be over 4.5 feet deeper than in 1996. Undifferentiated Lower Cretaceous/Upper Jurassic aquifers ("KJ") tend to have water tables over 3 feet deeper than in 1996. Quaternary alluvium aquifers ("QA") tend to be over 4.7 feet shallower than in 1996. Undifferentiated Quaternary aquifers ("QU" and "QAQU") tend to be about a foot shallower than in 1996. Ogallala aquifer wells ("TO") tend to be slightly deeper than in 1996, while the water tables in wells tapping Ogallala aquifers in combination with other aquifers ("QUTO" and "QUTOKJ") tend to be almost 2 feet deeper.
Changes from preliminary analysis
In the preliminary analysis of quality control data (Davis memo, Feb. 13, 1997), significant "Measurer" effects were noted, as were significant "Downhole" effects. "Well Access" was not significant. With the additional QC measurements made in March, these components of the model changed. One measurer, DRL, made additional re-measurements from all parts of the data collection area, in effect randomizing at least one component of "Measurer" over all other variables. This randomization is sufficient to cancel out possible operator bias in the QC measurement data. Among the re-measured wells, there are significant correlations between poor access, poor downhole quality, and the use of an unweighted tape. In a sense, these variables are redundant, and one of them can proxy for another. With the additional wells re-measured in March, "Well Access" has become a proxy for variation previously accounted for by "Downhole" effects.
Variation due to secondary (locational) variables
An analysis of locational accuracy is not necessary for a Quality Control study, because locational and ground elevation errors are canceled out by differencing. However, it became apparent during the water level measurement program that discrepancies between GPS measurements and well locations recorded in the historical data could signal important problems. Incorrect locations in the historical record of observation wells can result in mistakenly measuring the wrong well, or at the least, wasting valuable field time searching for the correct well. Errors in GPS measurements could cause the same problems in the future when field crews will be guided to the wells by navigational aids based on GPS.
Consequently, USGS decimal latitude and longitude locations, KGS legal description locations, and GPS decimal latitude and longitude measurements were converted by David Collins to metric UTM zone 14 X- and Y-coordinates using LEO II software. There were a few large discrepancies between USGS latitudes and longitudes and KGS legal locations, but most differences were less than 8 meters. This is much less than the inherent uncertainty of legal coordinate notation, suggesting that the USGS latitudes and longitudes were derived mathematically from the legal descriptions, a supposition confirmed by Don Whittemore (memo, April 7, 1997). Fewer than 30 wells showed serious discrepancies between the two historical measures of location, and these were resolved as transcription blunders or conversion errors. However, there are much greater differences between new GPS measurements of well locations and the recorded locations of observation wells.
Based on the manufacturer's stated GPS positional accuracy for the Garmin 45XL global positioning device when used in Kansas, GPS coordinates and legal descriptions of a point (or derived USGS latitudes and longitudes) should fall within 125 meters of each other. Approximately 40% of the measurements exceed this limit; two wells had location discrepancies of over two kilometers. These were resolved as blunders in their legal descriptions, but approximately 165 wells remain whose locations differ by more than is expected.
Differences, in meters, between UTM zone 14 coordinates derived from legal descriptions of wells and GPS measurements of well locations. Shaded circle indicates maximum error expected from legal description and GPS coordinate uncertainties.
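The tolerance check described above can be sketched as a comparison of the horizontal distance between two UTM coordinate pairs against the 125-meter limit. The well identifiers and coordinates below are hypothetical, and the function names are illustrative rather than part of the LEO II software or the KGS workflow.

```python
import math

GPS_TOLERANCE_M = 125.0  # expected maximum discrepancy stated in the text


def discrepancy_m(legal_xy, gps_xy):
    """Horizontal distance in meters between two UTM (X, Y) positions."""
    return math.hypot(gps_xy[0] - legal_xy[0], gps_xy[1] - legal_xy[1])


def flag_wells(wells, tolerance_m=GPS_TOLERANCE_M):
    """Return IDs of wells whose legal and GPS locations differ by more than the tolerance."""
    return [
        well_id
        for well_id, (legal_xy, gps_xy) in wells.items()
        if discrepancy_m(legal_xy, gps_xy) > tolerance_m
    ]


# Hypothetical UTM zone 14 coordinates: well ID -> (legal (X, Y), GPS (X, Y)).
wells = {
    "well_A": ((400000.0, 4200000.0), (400030.0, 4200040.0)),  # 50 m apart
    "well_B": ((410000.0, 4210000.0), (410120.0, 4210002.0)),  # about 120 m apart
    "well_C": ((420000.0, 4220000.0), (422100.0, 4220000.0)),  # 2,100 m apart
}

flagged = flag_wells(wells)
print(flagged)
```

Only the wells returned by such a check would need a second field visit or a review of their legal descriptions.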
Wells showing the most extreme of these differences were investigated during a second cycle of field checking. Some discrepancies were resolved as historical errors in location, and others proved to be recording errors by the field crews (only one GPS device was equipped with automatic data recording; the others required manual transcription of the GPS latitude and longitude). Some of the smaller differences may be attributable to an operational error that occurred during the initial days of field measurement: some GPS devices were set to record using WGS 84, the current internationally accepted datum, while others were set to use NAD 27, the datum historically used by the USGS for topographic maps. When this problem was discovered, all GPS units were reset to NAD 27, but no notation was made of which observations used one setting or the other. The effect of using the 1984 datum is to translate GPS well locations 120 meters to the west and 2 meters to the south relative to equivalent positions on maps made with the 1927 datum. This offset is apparent on a plot of the differences in X and Y coordinates between legal locations and GPS locations, where measurements made with the wrong setting probably are responsible for the unusual density of points on the right-hand side of the diagram. This confusion cannot be untangled after the fact, but fortunately it will not recur because all GPS devices will be preprogrammed and will automatically record their data.
Conclusions
The Quality Control program identifies well and procedural conditions that may contribute significantly to the values of "Depth to Water" measured in observation wells. Gathering Quality Control information requires little additional effort by the field crews, emphasizes the importance of procedural consistency, and certifies performance. The QC program should be continued without change even though this year most exogenous variables were not significant.
As expected, "Aquifer Code" is the most important exogenous source of variance in measurements. The assignments of codes should be reviewed for all wells in the KGS measurement program, especially the 150 wells whose codes were provisionally assigned this year. The aquifer code assignments of all wells whose "'97-'96" value deviates significantly from the norm for the aquifer should be confirmed; the box-and-whisker plot below shows that some wells exhibit unexpected behavior and may be misclassified.
There are persistent differences between wells measured using a weighted tape and those measured without a weight on the tape. To avoid this source of extraneous variance, well access ports should be modified if possible so weights can be used on all wells, or alternative observation wells should be sought so that weighted tapes can be used consistently.
Well Access is a significant source of variation but is believed to be a proxy for downhole quality, as the two variables are highly correlated in the QC data set. It would be desirable to either improve or replace any well in the data set which has poor access, poor downhole conditions, or both.
The confusion in locational measurements caused by use of two different datums illustrates the value of careful pre-field preparation and the enforcement of procedural quality controls. The recording blunders that were detected in some GPS measurements emphasize the value of automating the recording and processing of data, and of the use of identical, pre-set equipment by all field crews. Presumably these sources of error will disappear in future field seasons, when all measuring parties are equipped with pre-programmed portable computers that have attached GPS devices.
The preliminary Quality Control analysis detected significant operator effects, or variance attributable to the individuals making the measurements. This variance became non-significant when a later round of re-measurements was added to the data set. We can interpret this as reflecting the value of experience in taking observation well measurements, as techniques became more consistent with time. Presumably, the desirable experience can be obtained by appropriate training prior to the next field season.
The Quality Control program has achieved its objectives of identifying and quantifying sources of unwanted variation in observation well data collection, and of flagging wells whose measurements required verification. It detected numerous spurious values, both in the measured data and in the historical data, resulting in a much "cleaner" data base than otherwise would have been the case. If the Quality Control process is routinely applied to KGS observation well measurements in the future, and particularly if it is applied to the entire Kansas observation well network, the quality of the data will progressively improve with time.