
Acquisition Activity, Statistical Quality Control, and Spatial Quality Control for 1997 Annual Water Level Data Acquired by the Kansas Geological Survey



III. Analysis of Spatial Quality Control

Ricardo A. Olea, Mathematical Geology Section

Introduction

The previous statistical analysis of quality control was done without directly taking into consideration the mutual distances among the points where the wells penetrate the water table.

The spatial location of the water table at any given well is given by two Cartesian coordinates and the elevation of the water table above sea level, none of which has traditionally been measured directly. Until last year, Cartesian coordinates were derived from latitude and longitude, which in turn are a numerical transformation of the legal description of the location of the well. The Kansas Geological Survey (KGS) has just started to employ the satellite Global Positioning System (GPS) to determine latitude and longitude at those wells previously measured by the United States Geological Survey.

A second source of error derives from the fact that what is actually measured is the depth to the water table. The elevation is calculated by subtracting depth to water from the surface elevation, which has never been measured directly. In a state with at most rolling hills, the conventional wisdom has been that one can obtain the elevation of any well with reasonable accuracy by locating the well on the appropriate 7 1/2 minute topographic map produced by the United States Geological Survey and reading the elevation by interpolation from the contour lines. The scale of these maps is 1:24,000 and the contouring interval is generally 10 ft.

Common sense and the physics of fluid flow indicate that the water table elevation of a given aquifer should vary continuously, so that nearby wells have similar water table elevations and the similarity deteriorates systematically with distance. Under that assumption, the probabilistic models of geostatistics help verify whether the data comply with it. The analysis will be limited to wells supposedly tapping the High Plains aquifer, the only aquifer in the state with a sampling density regular and high enough to apply geostatistics.

Methodology

Geostatistics has several estimation methods, generically known by the name of kriging, that are able to estimate the value of an attribute at a location $x_0$ without a measurement, such as the water table elevation at a site where there is no well. Kriging produces an estimate $\hat{z}(x_0)$ as a linear combination of $k$ observations $z(x_i)$ around $x_0$:

$$\hat{z}(x_0) = \sum_{i=1}^{k} w_i \, z(x_i) \qquad (1)$$

where the $w_i$ are weights that come from the solution of a system of equations (Journel and Huijbregts, 1978). The following figure illustrates the case of a sample of size 6 in which $\hat{z}(x_0) = 3149.6$.

Figure 1--Kriging estimation for a sample involving six wells. A solid dot denotes a well location and the labels are, from top to bottom, water table elevation, in feet, and the kriging weight. The question mark shows the estimation location.


Besides its optimality properties, kriging has the additional feature of producing an estimation variance, $\sigma^2(x_0)$, which one can use to assess the reliability of the estimate. The smaller the kriging variance, the more reliable the estimate.

Crossvalidation is an ingenious application of kriging for the verification of parameter selections necessary for the solution of the system of equations. Here, instead, crossvalidation is employed to check the spatial consistency of the data. Given a sample of size n, each observation is dropped in turn, and for each discarded observation, an estimate is computed at the location of the discarded observation by using at most the remaining (n - 1) measurements. By pretending that an observation was never taken, one can genuinely produce an estimate at the location, and by bringing it back, a kriging estimation error can be computed by comparison with the true value.

Kriging is the best linear unbiased estimator. Even so, like any other spatial estimation method, it is not perfect, in the sense that crossvalidation errors are not zero. Under normal circumstances, however, the discrepancies stay below two kriging standard deviations in absolute value. A larger discrepancy indicates that the measurement could be either a true anomaly or an observation in error. Close scrutiny is the only way to differentiate between the two, but because suspect observations are commonly a small fraction of the sample, it is always less time consuming to restrict attention to the observations flagged by crossvalidation than to re-examine the entire data set.

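Because a high estimation error can also be produced by poor sampling control around a location, it is more convenient to analyze the z-score $z_s(x)$, which is the kriging estimation error divided by its standard deviation $\sigma(x)$:

$$z_s(x) = \frac{z(x) - \hat{z}(x)}{\sigma(x)} \qquad (2)$$

Here $z(x) - \hat{z}(x)$ is the kriging estimation error, which is positive when the measurement exceeds its estimate.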
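The crossvalidation procedure and the z-score of equation (2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in the study: it uses ordinary kriging rather than the universal kriging applied later in this report, it takes all remaining wells as the estimation neighborhood, and the well coordinates, elevations, and function names (`gaussian_gamma`, `solve`, `ordinary_kriging`, `crossvalidate`) are hypothetical. The Gaussian model borrows the Figure 2 parameters and assumes the common convention in which the model reaches about 95% of its sill at the effective range.

```python
import math

def gaussian_gamma(h, nugget=426.3, partial_sill=13355.5, eff_range=76594.3):
    """Gaussian semivariogram (sq ft) at distance h (m); defaults copy Figure 2.

    The factor 3 is an assumed convention: about 95% of the sill at eff_range.
    """
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / eff_range) ** 2))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, kriging variance) at target from the given observations."""
    k = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # Kriging system in semivariogram form, with a Lagrange multiplier that
    # forces the weights to add up to one.
    a = [[gamma(dist(points[i], points[j])) for j in range(k)] + [1.0] for i in range(k)]
    a.append([1.0] * k + [0.0])
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, multiplier = sol[:k], sol[k]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:k])) + multiplier
    return estimate, variance

def crossvalidate(points, values, gamma):
    """Drop each observation in turn and return its z-score, as in equation (2)."""
    scores = []
    for i in range(len(points)):
        others = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        estimate, variance = ordinary_kriging(others, other_values, points[i], gamma)
        scores.append((values[i] - estimate) / math.sqrt(variance))
    return scores

# Hypothetical 3 x 3 grid of wells, 5 km apart (coordinates in meters).
wells = [(x, y) for y in (0.0, 5000.0, 10000.0) for x in (0.0, 5000.0, 10000.0)]
# Hypothetical water table elevations (ft); the center well carries a
# simulated 150 ft surface elevation error.
wte = [2803.0, 2801.0, 2799.0, 2802.0, 2950.0, 2798.0, 2801.0, 2800.0, 2797.0]

scores = crossvalidate(wells, wte, gaussian_gamma)
for number, score in enumerate(scores):
    flag = "  <-- inspect" if abs(score) > 2.0 else ""
    print(f"well {number}: z-score {score:6.2f}{flag}")
```

Only the flagged wells need a manual check. The neighbors of a bad well can also receive inflated z-scores, because the bad value enters their estimates, so flagged wells are best reviewed starting from the largest absolute z-score.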

Crossvalidation of a first version of the 1997 measurements

The Kansas Geological Survey measured 437 wells last January that were coded as being screened within the High Plains aquifer by the KGS Geohydrology Section. The KGS Exploration Services Section carefully inspected all measurements before sending them for spatial analysis.

Because the legal description of a well location has a resolution of only 10 acres, wells closer than 1/8 of a mile sometimes share the same legal description and therefore the same latitude, longitude, and Cartesian coordinates, even though the wells are close but not really at the same spot. Kriging cannot handle multiple observations of the same attribute at one site, so one well was discarded from each of three pairs of wells with the same coordinates. In all cases, the water table elevations within each pair were very similar, so the discarding took nothing away from the generality of the analysis.

Calculation of kriging weights and kriging variance depends on several parameters, but none is as crucial as the semivariogram, which is a function related to the degree of spatial continuity of the attribute. The prevailing practice is to estimate the semivariogram at discrete distances and then fit an admissible model, such as the Gaussian model shown in the figure. An "admissible" model is one that assures a unique solution to the kriging system of equations and a positive kriging variance.

Figure 2--Semivariogram model for the original sampling in the N2W average direction of the trend strike. The model is Gaussian, with a nugget of 426.3 sq ft, an effective range of 76594.3 meters and (sill-nugget) equal to 13355.5 sq ft.

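As an illustration of what the Figure 2 parameters mean, the following sketch evaluates a Gaussian semivariogram model with that nugget, effective range, and (sill-nugget). The report does not spell out the formula, so the sketch assumes the widely used convention in which the exponent carries a factor of 3 and the model reaches about 95% of its sill at the effective range; the function name `gaussian_semivariogram` is hypothetical.

```python
import math

def gaussian_semivariogram(h, nugget, partial_sill, effective_range):
    """Gaussian semivariogram model for a distance h.

    Assumed convention: gamma(h) = nugget + partial_sill * (1 - exp(-3 h^2 / a^2)),
    with a the effective range, and gamma(0) = 0 by definition.
    """
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / effective_range) ** 2))

# Parameters quoted in the Figure 2 caption (sq ft, meters, sq ft).
NUGGET = 426.3
EFFECTIVE_RANGE = 76594.3
PARTIAL_SILL = 13355.5  # the "(sill-nugget)" of the caption

for km in (0, 5, 20, 40, 76.5943, 150):
    h = km * 1000.0
    gamma = gaussian_semivariogram(h, NUGGET, PARTIAL_SILL, EFFECTIVE_RANGE)
    print(f"h = {km:8.2f} km   gamma = {gamma:9.1f} sq ft")
```

Under this convention the model jumps from zero to the nugget at very short distances, grows slowly at first (the parabolic behavior typical of smooth attributes), and levels off near nugget plus (sill-nugget) beyond the effective range.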

Multidirectional modeling of the semivariogram reflects a well-known systematic decrease in water table elevation toward the east. This peculiarity of the water table forces us to use universal kriging and to estimate and model the semivariogram along the strike of the trend, which is the trend-free direction.

The following table and map summarize the results for the crossvalidation of the 1997 original sampling. Both the USGS and the KGS well names are too long for posting, so the map and table use an arbitrary sequential number for labeling.

Original sampling z-scores larger than two and ordered by increasing z-score

No.  USGS ID          KGS ID             Easting (m)  Northing (m)  WTE (ft)  Z-score
320  373223099472101  29S 23W 12BAC 01   430253.6     4154832.8     2263.56   -4.48
284  374255101251501  27S 38W 12ADC 01   286599.6     4176727.2     2868.14   -2.82
308  373607100565301  28S 33W 20ACD 01   327859.2     4162901.2     2595.37   -2.27
309  373632101004301  28S 34W 14CCC 01   322281.1     4163957.8     2630.22   -2.25
127  380622101014001  22S 34W 26CCC 01   322072.7     4219233.0     2768.26   -2.24
373  372016101201201  31S 37W 22BCC 01   292780.8     4134592.2     2837.50   -2.15
101  382159098545001  19S 14W 30CDD 01   507439.3     4245866.5     1840.35   -2.10
370  372128101065001  31S 35W 15BAA 01   312836.0     4136575.2     2703.44   -2.07
406  370931101280201  33S 38W 20DDB 01   280806.4     4115182.5     3029.64   -2.04
328  372942100192201  29S 27W 30BCC 01   383145.6     4150575.8     2500.61   -2.03
206  375226100564601  25S 33W 16DCC 01   328753.6     4193353.0     2828.85    2.28
371  372227101121501  31S 36W 02CDD 01   304726.1     4138357.0     2827.18    2.55
377  371425100272002  32S 29W 27AAB 02   370766.7     4122452.8     2529.22    2.63
248  374638101495001  26S 41W 20BCD 01   250753.2     4184441.8     3270.62    2.75
202  375436100561301  25S 33W 03BCC 01   329640.9     4197381.5     2847.49    3.04
220  375149101341601  25S 39W 23BDD 01   273914.1     4193726.0     3197.25    3.16
247  374917101242701  26S 37W 06ACB 01   288193.9     4188306.0     3066.18    3.55
391  371654101244001  32S 38W 11ADA 01   300752.0     4115375.5     3014.83    4.79
178  375732100363002  24S 30W 15CCC 02   379636.2     4159130.8     2693.81    5.59
379  371733100402001  32S 31W 03DAA 01   351536.7     4128524.5     2729.13    7.08

Despite the careful handling of the data, the four highest z-scores in absolute value are due to errors in data preparation:

  1. The surface elevation of well 320 was wrong. The true surface elevation is 2545 ft instead of the 2447 ft taken from a USGS publication (Putnam et al. 1996, p. 424).
  2. The Cartesian coordinates of well 391 should have been (286112.6, 4128743.0) instead of (300752.0, 4115375.5). Cartesian coordinates based on GPS measurements were the preferred values for easting and northing, but Exploration Services did not report the GPS location for a few wells, including well 391. In those cases, easting and northing came from decoding the USGS well identification, which encodes the location in degrees, minutes, and seconds. Unfortunately, the reported value became (37.1654, 101.2440) because the latitude and longitude in the identification were incorrectly read by Exploration Services as degrees and decimal fraction of a degree. This understandable human mistake occurred in a few other wells, but there it was detected because of redundancy with the GPS coordinates.
  3. The Cartesian coordinates of well 178 should have been (358706.1, 4202267.3) instead of (379636.2, 4159130.8). As with well 391, the error resulted from a combination of missing GPS coordinates and incorrect decoding of the USGS well identification.
  4. The surface elevation of well 379 was wrong. The true surface elevation is 2787 ft instead of the 2951 ft used in the calculations of the water table elevation. The incorrect surface elevation was reported by the USGS.

Finally a well-by-well check of the 7 1/2 minute topographic maps of surface elevation for all wells with a z-score above 2.0 disclosed that well 101 was also in error. The surface elevation should be 1905 ft instead of 1875 ft.
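The decoding mistake described for wells 391 and 178 is easy to reproduce. The sketch below assumes the standard layout of a 15-digit USGS well identification (two digits of degrees, two of minutes, and two of seconds of latitude, then three, two, and two for longitude, then a two-digit sequence number) and uses the identification of well 391 as read from the table above. The function names are hypothetical.

```python
def decode_usgs_id(site_id):
    """Decode a 15-digit USGS well identification laid out as DDMMSSdddmmssNN.

    Returns latitude and longitude in decimal degrees (longitude as degrees
    west, the way this report quotes it) and the two-digit sequence number.
    """
    lat = int(site_id[0:2]) + int(site_id[2:4]) / 60.0 + int(site_id[4:6]) / 3600.0
    lon = int(site_id[6:9]) + int(site_id[9:11]) / 60.0 + int(site_id[11:13]) / 3600.0
    return lat, lon, site_id[13:15]

def misread_as_decimal_degrees(site_id):
    """Reproduce the mistake: treat the minutes and seconds digits as a decimal fraction."""
    lat = float(site_id[0:2] + "." + site_id[2:6])
    lon = float(site_id[6:9] + "." + site_id[9:13])
    return lat, lon

well_391 = "371654101244001"
correct_lat, correct_lon, sequence = decode_usgs_id(well_391)
wrong_lat, wrong_lon = misread_as_decimal_degrees(well_391)
print(f"correct decoding: {correct_lat:.4f} N, {correct_lon:.4f} W (sequence {sequence})")
print(f"misread value:    {wrong_lat:.4f} N, {wrong_lon:.4f} W")
```

The misreading shifts the well by roughly 0.12 degrees of latitude and 0.17 degrees of longitude, which is more than 10 km on the ground and large enough for crossvalidation to flag the well.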

Figure 3--Coded posting of z-scores for the original sampling. A cross denotes a well with a z-score below 2; a red dot, between 2 and 3; and a red triangle, larger than 3. The solid blue line is the actual boundary of the High Plains aquifer and the segmented line the boundary of its non-contributing portion. A larger version of this figure is available.


The influence that these errors had on the contouring of water table elevation is hardly noticeable on low-resolution maps, but some show as bull's-eyes on detailed maps. All maps were prepared using the same universal kriging method employed in the crossvalidation.

Figure 4--Universal kriging estimation of water table elevation employing the original sampling. A cross denotes a well with a z-score below 2; a red dot, between 2 and 3; and a red triangle, larger than 3. The solid blue line is the actual boundary of the High Plains aquifer and the segmented line the boundary of its non-contributing portion. A larger version of this figure is available.


Figure 5--Southwestern Kansas enlargement of a universal kriging estimation of water table elevation employing the original sampling. A cross denotes a well with a z-score below 2; a red dot, between 2 and 3; and a red triangle, larger than 3. The solid blue line is the actual boundary of the High Plains aquifer and the segmented line the boundary of its non-contributing portion. A larger version of this figure is available.


Edited version of the 1997 measurements

The semivariogram is half the mean-square difference in the attribute for locations separated by the same distance along the same direction. Modeling of the semivariogram is therefore sensitive to the errors detected by crossvalidation. Away from the wells in error, changes in the estimate $\hat{z}(x_0)$ are minor, but the kriging variance shows an important global reduction, in accordance with the better continuity implied by the disappearance of the sudden fluctuations associated with the wells in error. Such a reduction in the kriging standard deviation $\sigma(x)$ makes the z-scores in (2) increase even if the kriging error $z(x) - \hat{z}(x)$ remains the same.

Figure 6--Semivariogram model for the edited sampling in the NS average direction of the trend strike. The model is Gaussian, with a nugget of 240.6 sq ft, an effective range of 67868.4 meters and (sill-nugget) equal to 11102 sq ft.

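The definition used in this section, half the mean-square difference between pairs of observations a given distance apart, can be sketched as follows. This is a simplified illustration: it pools pairs in all directions into distance classes, whereas the report estimates the semivariogram along the strike of the trend only. The function name `experimental_semivariogram`, the lag parameters, and the four-well transect are hypothetical.

```python
import math

def experimental_semivariogram(points, values, lag_width, n_lags):
    """Return (mean distance, semivariogram, pair count) for each non-empty lag class."""
    dist_sum = [0.0] * n_lags
    sq_sum = [0.0] * n_lags
    pairs = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            lag = int(h // lag_width)  # distance class of this pair
            if lag < n_lags:
                dist_sum[lag] += h
                sq_sum[lag] += (values[i] - values[j]) ** 2
                pairs[lag] += 1
    # Half the mean-square difference per class; empty classes are skipped.
    return [(dist_sum[k] / pairs[k], sq_sum[k] / (2.0 * pairs[k]), pairs[k])
            for k in range(n_lags) if pairs[k] > 0]

# Hypothetical transect: four wells 1 km apart, water table dropping 2 ft per km.
transect = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0), (3000.0, 0.0)]
elevations = [2806.0, 2804.0, 2802.0, 2800.0]

for distance, gamma, count in experimental_semivariogram(transect, elevations, 1000.0, 4):
    print(f"h = {distance:6.0f} m   gamma = {gamma:5.1f} sq ft   pairs = {count}")
```

In this toy transect the values grow with the square of the distance because the elevations follow a pure linear trend. That is the kind of inflation that motivates modeling the semivariogram along the trend-free direction. A single erroneous elevation inflates every distance class in which that well takes part, which is consistent with the smaller nugget and (sill-nugget) obtained after editing.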

Edited sampling z-scores larger than three and ordered by increasing z-score

No.  USGS ID          KGS ID             Easting (m)  Northing (m)  WTE (ft)  Z-score
284  374255101251501  27S 38W 12ADC 01   286599.6     4176727.2     2868.14   -3.69
390  371420101185501  32S 37W 26BAC 01   294566.0     4123666.5     2985.44    3.07
391  371654101244001  32S 38W 11ADA 01   286112.6     4128743.0     3014.83    3.19
371  372227101121501  31S 36W 02CDD 01   304726.1     4138357.0     2827.18    3.48
220  375149101341601  25S 39W 23BDD 01   273914.1     4193726.0     3197.25    3.53
248  374638101495001  26S 41W 20BCD 01   250753.2     4184441.8     3270.62    3.54
206  375226100564601  25S 33W 16DCC 01   328753.6     4193353.0     2828.85    3.80
247  374917101242701  26S 37W 06ACB 01   288193.9     4188306.0     3066.18    4.54
202  375436100561301  25S 33W 03BCC 01   329640.9     4197381.5     2847.49    5.11

In this case one can see that correction of only five errors has an important influence on the estimation of the water table elevation. Coding of z-scores has been changed to compensate for the effect produced by the change in semivariogram model. At this stage it is safe to say that all wells with high z-scores have correct Cartesian coordinates, surface elevations, and depths to the water table. Further scrutiny of a more geohydrologic nature is recommended to rule out other, more natural but equally disturbing causes, such as tapping aquifers other than the High Plains aquifer.

Figure 7--Universal kriging estimation of water table elevation employing the edited sampling. A cross denotes a well with a z-score below 2; a red dot, between 2 and 3; and a red triangle, larger than 3. The solid blue line is the actual boundary of the High Plains aquifer and the segmented line the boundary of its non-contributing portion. A larger version of this figure is available.


Figure 8--Southwestern Kansas enlargement of a universal kriging estimation of water table elevation employing the edited sampling. A cross denotes a well with a z-score below 3; a red dot, between 3 and 4; and a red triangle, larger than 4. The solid blue line is the actual boundary of the High Plains aquifer and the segmented line the boundary of its non-contributing portion. A larger version of this figure is available.


A map of kriging standard deviation provides a continuous rendition of the reliability of an estimated surface. Notice that areas with high kriging standard deviation are associated with areas of poor control, which one could remedy with the addition of only 10 to 15 measurements. Such an expansion of the network requires further evaluation, but should be seriously considered for those spots with high kriging standard deviation away from the aquifer boundaries.

Figure 9--Universal kriging standard deviation for the edited sampling. A cross denotes a well with a z-score below 3; a red dot, between 3 and 4; and a red triangle, larger than 4. The solid blue line is the actual boundary of the High Plains aquifer and the segmented line the boundary of its non-contributing portion. A larger version of this figure is available.


Conclusions

Perfect editing of large data bases is tedious, time consuming, and laborious, thus expensive.

In the specific case of the High Plains Observation Network, five important errors escaped detection by the numerous and serious editing efforts of the United States Geological Survey, the KGS Geohydrology Section, and more recently by the KGS Exploration Services Section.

Crossvalidation is an inexpensive and fast spatial quality control tool to run. It works by restricting attention to measurements with high z-scores. Although true geological anomalies also lead to high z-scores, sometimes high z-scores are produced by inconsistencies in the sampling. Given that the proportion of observations with z-scores over 2.0 is small, a ranking of the observations by z-score is cost effective because of the significant reduction in the number of measurements, calculations, and transcriptions that must be reviewed.

The Kansas Geological Survey should further investigate wells still posting high z-scores to rule out anomalous sources of variation such as misclassification of wells into the High Plains aquifer category.

As a by-product, the kriging standard deviation map suggests that a few more wells in areas with high kriging standard deviation will add significant reliability to the estimation of the water table elevation.

References

Journel, A. G. and C. J. Huijbregts, 1978, Mining Geostatistics: Academic Press, London, England, 600 p.

Putnam, J. E., D. L. Lacock, D. R. Schneider, M. D. Carlson, and B. J. Dague, 1996, Water Resources Data-Kansas, Water Year 1995, 488 p.



Kansas Geological Survey, Water Level CD-ROM
Updated Feb. 25, 1997
Available online at URL = http://www.kgs.ku.edu/Magellan/WaterLevels/CD/Reports/OFR9733/rep04.htm