How Do You Determine Accuracy in Mapping?
Swimming in a sea of data, mapping professionals need to understand precision, accuracy and variance
All maps are wrong. Get used to it! Maps are an abstraction of the real world and as such only approximate reality. Each layer of orthos or LiDAR data comprises anywhere from dozens to millions of measurements about location. As with all measurements, uncertainty is certain. Ortho and LiDAR base maps may not overlay well, often because they were created for different purposes or because the allowable error in each map is different. What matters to any geospatial professional is quantifying the error. How close are the coordinates for a feature in the mapping to its actual place on earth? Which map is most accurate?
Today, mapping professionals are drowning in an ocean of accessible geodata that is easy to download and integrate into a map. Because geodata is so abundant and obtaining it is often effortless, accuracy is frequently not questioned until an anomaly or problem is discovered. Accuracy and error matter too much to be left to chance, so a thorough understanding of these concepts is essential.
Accuracy is a foundational consideration for any geospatial project. A professional is more confident that geospatial data will work well for its intended use if its accuracy has been well described. But this leads to another question: How confident can (or should) one be that the coordinates of a feature’s location are correct? What’s meant by “correct”? Were the coordinates “perfect” or just “close enough” to within five inches? Five feet? Five meters? Fifty meters? Was the associated error even quantified?
There is a basic universal truism: every measurement by any device contains error. Knowing this axiom, one can say with absolute confidence: all mapping is “wrong.” This “abstraction of reality” is modeled after the real world, and it is not perfect. The coordinates queried from the map will not be the true coordinates for that feature’s location on earth. It matters little if the user is an expert surveyor or mapmaker and using the best equipment. The measurement is wrong.
Knowing that the “true” value of a measured thing is never known and the quantity of error present is always unknown, it becomes imperative for a geospatial professional to know something about the accuracy and error associated with the mapping. The best one can do is measure and map using best practices and then describe the distribution of those errors. Enter the National Standard for Spatial Data Accuracy (NSSDA). It provides a statistically valid, consistent method to measure and report the accuracy and associated error of a geospatial dataset.
The real art and science of mapping and photogrammetry is estimating just how close to “perfect” those measurements are. There is little reason to talk about the accuracy of some geospatial product if one has no specific expectations of positional accuracy. Likewise, one cannot have any rational expectations of accuracy if the stakeholders have not defined the intended use of the geodata, or if it was not created to be consistent with that intended use.
People use Google and Bing maps every day for a plethora of applications. Most have no idea how accurate the maps are. In fact, there is no stated positional accuracy with these services. Users may not care because it just “works” or they know that the maps are “accurate enough.” But the wise professional understands that the accuracy is unspecified and uses these resources guardedly, knowing they should not be used for some purposes.
Because the threat to public welfare and safety is real, the geospatial professional must understand “intended use” and ensure the specifications and deliverables are compatible with it. This is pertinent today because geospatial datasets are so easy to come by and many have no metadata that describe accuracy or intended use. The stakeholders should have a good idea how they intend to use any Ortho or LiDAR mapping requested. They should also know what it is not intended for. Its “intended use” should dictate the accuracy requirements. For example, mapping used for land use planning may not need accuracy better than plus or minus 3 feet, whereas mapping used for engineering purposes may need accuracy better than plus or minus 3 inches. The cost of mapping rises with accuracy: higher accuracy means more cost. Therefore, defining the intended use can save money by not buying more accuracy than necessary.
How not to ask for accuracy. Solicitations written for geospatial services are proffered by a variety of organizations. It is not uncommon to find lubberly accuracy specifications. For example:
“Spot elevations will be shown to provide complete and accurate vertical information.”
“Complete and accurate” is not measurable. Every firm responding to this will likely define “complete and accurate” in disparate ways, making it impossible to compare prices and scope.
“The 100 scale digital orthos shall be compiled to meet 2.0 feet horizontal accuracy at 95 percent confidence level.”
This is getting better because they are trying to express an accuracy requirement using NSSDA prose. But just what is a “100-scale” digital ortho? Referencing “scale” for digital imagery when it can be displayed at any scale is meaningless. Leave “scale” out unless asking for paper maps or referencing the less pertinent 1990 ASPRS accuracy standards for large-scale maps.
A dataset should include a statement about its inherent positional accuracy. Was it “tested”? Or was it only “compiled” (untested) to meet some accuracy level? The role of the NSSDA is to prescribe standardized ways accuracy and error are tested and reported. Users may then readily compare the accuracy of one dataset with another and make more informed decisions about the “fitness” of the dataset with the intended use.
Precision. Accuracy. Variance. To understand how to quantify accuracy the professional needs to understand error: precision and variance. The “precision” of a dataset describes the closeness of one measurement to another. This is affected by random errors. “Accuracy,” on the other hand, describes the degree of perfection obtained in a measurement, that is, how close to the “true” value the measurements are. Accuracy is affected by both random and systematic errors. In the absence of systematic errors (and blunders), precision and accuracy are equal. As precision worsens so does accuracy.
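The relationship between these terms can be illustrated numerically. The sketch below is only an illustration, not a procedure taken from the NSSDA: the function name, the sample readings and the 0.5-unit shift are all hypothetical. It treats the scatter of repeated measurements about their own mean as precision, the mean offset from the reference value as systematic error, and the root mean square error against the reference as accuracy.

```python
import math
import statistics


def precision_and_accuracy(measurements, true_value):
    """Return (bias, spread, rmse) for repeated measurements of one quantity.

    bias   : mean error, an estimate of the systematic error
    spread : population standard deviation, a measure of precision
    rmse   : root mean square error against the reference, a measure of accuracy
    """
    errors = [m - true_value for m in measurements]
    bias = statistics.fmean(errors)
    spread = statistics.pstdev(measurements)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    return bias, spread, rmse


# Hypothetical readings of a distance whose reference value is 100.0 units.
unbiased = [99.8, 100.2, 99.9, 100.1]
# The same readings shifted by a hypothetical systematic error of 0.5 units.
biased = [m + 0.5 for m in unbiased]

for label, data in (("unbiased", unbiased), ("biased", biased)):
    bias, spread, rmse = precision_and_accuracy(data, 100.0)
    print(f"{label}: bias={bias:.3f} spread={spread:.3f} rmse={rmse:.3f}")
```

With no systematic error the spread and the RMSE coincide. Once the shift is added the spread is unchanged but the RMSE grows, because the squared RMSE equals the squared bias plus the squared spread.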
A LiDAR (or Ortho) dataset comprises millions of “measurements.” Each point (or pixel) is assigned an X and Y coordinate, and each LiDAR point (but not each pixel) a Z coordinate as well. “Uncertainty is certain.” Nothing can be measured without error. This error must be quantified. If one can’t quantify this error, the “confidence” in the data will be low, at best, and unfounded, at worst.
To estimate the accuracy of a dataset, a field survey is performed to measure the location of at least 20 well-defined features (ground control points, or GCPs) that are visible in the LiDAR dataset. (The specific procedures for doing this in each of the different point classifications are important but left to another article.) These 20 GCPs represent the best descriptions of position in hand; best practice says they should be at least three times more accurate than the data being tested. The GCPs are certainly erroneous, but because better instruments with greater precision were used, these mere 20 measurements will have markedly better accuracy and precision than the LiDAR. The “confidence” that these 20 points accurately represent “location” is high.
By measuring the difference between the GCPs’ coordinates (“truth”) and those of the LiDAR points that strike the GCPs, the error in X, Y and Z at each of the GCPs can be estimated. If this exercise is done for each of the 20 GCPs, an error estimate for the entire “population” of LiDAR points can be calculated. Statistics says that a meager 20 measurements are sufficient to estimate the error in the dataset as long as the error is “normally” distributed (“trust but verify” this assumption, an important subject for another article) and the GCPs’ locations are well dispersed throughout the area.
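The differencing step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Point` type, the `checkpoint_errors` function and the coordinates are hypothetical, units are whatever the survey uses, and the sign convention (mapped minus surveyed) is a choice rather than something the article prescribes. A real test would use at least 20 checkpoints, not the two shown.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    z: float


def checkpoint_errors(surveyed, mapped):
    """Return one (dx, dy, dz) per checkpoint, computed as mapped minus surveyed."""
    if len(surveyed) != len(mapped):
        raise ValueError("each checkpoint needs one surveyed and one mapped position")
    return [
        Point(m.x - s.x, m.y - s.y, m.z - s.z)
        for s, m in zip(surveyed, mapped)
    ]


# Hypothetical coordinates for two checkpoints.
surveyed = [Point(1000.0, 2000.0, 300.0), Point(1500.0, 2500.0, 310.0)]
mapped = [Point(1000.5, 1999.75, 300.25), Point(1499.75, 2500.5, 309.5)]

for number, err in enumerate(checkpoint_errors(surveyed, mapped), start=1):
    print(f"GCP {number}: dx={err.x:+.2f} dy={err.y:+.2f} dz={err.z:+.2f}")
```

Keeping the signed differences, rather than only their magnitudes, preserves the information needed later to spot a systematic shift in one direction.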
By averaging the error in X, Y and Z associated with each LiDAR point reflected from the GCPs, the mapper has a good idea of how “big” the error is on average. But we need more information than a simple average. For example, assume the average error in X and Y is 8 inches. If a 21st GCP is measured elsewhere in the project, how close to this average error would the measurement be? Put another way, how confident can the mapper be that this average error is a reliable estimate of the next measurement? How is this error reported so it is comparable with other datasets? Happily, statisticians have described the procedure.
“Confidence” is quantified by studying the frequency and spread of the observations of error. Confidence varies inversely with the range of GCP errors. Were the measured differences between the LiDAR points and GCPs all very close to 8 inches (XY) or were some way out at 36 inches? In all likelihood the 20 GCPs varied considerably: some were considerably closer to “true,” some considerably more distant. In fact, if the observations were sorted into three buckets (the measurements within 2 inches of “true,” those between 2 and 4 inches, and those between 4 and 6 inches) and the number in each bucket were counted, the data would look “normally” distributed. That is, there are far more error measurements in the 2-inch bucket than in the 6-inch bucket, and the number in the 4-inch bucket is somewhere in between. The statistic that summarizes the size and dispersion of these errors is the Root Mean Square Error (RMSE): the square root of the average of the squared errors. This is a key statistic used in NSSDA reporting.
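Both the bucket count and the RMSE are simple to compute once the per-checkpoint errors are in hand. The sketch below is illustrative only: the function names and the sample errors (in inches) are hypothetical. The radial RMSE follows the usual definition, the square root of the sum of the squared X and Y RMSE values.

```python
import math
from collections import Counter


def rmse(errors):
    """Root mean square error: square root of the mean of the squared errors."""
    if not errors:
        raise ValueError("at least one error value is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def radial_rmse(dx, dy):
    """Horizontal (radial) RMSE combined from the X and Y components."""
    return math.hypot(rmse(dx), rmse(dy))


def bucket_counts(errors, width=2.0):
    """Count errors by absolute size; each key is the upper edge of its bucket.

    An error exactly on an edge (for example 2.0) falls in the next bucket up.
    """
    counts = Counter()
    for e in errors:
        upper_edge = (math.floor(abs(e) / width) + 1) * width
        counts[upper_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical errors in inches at ten checkpoints.
dx = [0.5, -1.0, 1.5, -0.5, 1.0, 2.5, -3.0, 3.5, 5.0, -4.5]
dy = [1.0, 0.5, -1.5, -2.5, 0.5, -1.0, 3.0, -0.5, 1.5, 4.5]

print("buckets for dx:", bucket_counts(dx))
print(f"RMSE x = {rmse(dx):.2f} in, RMSE y = {rmse(dy):.2f} in")
print(f"radial RMSE = {radial_rmse(dx, dy):.2f} in")
```

Because the errors are squared before averaging, a few large misses raise the RMSE far more than they raise a simple average, which is why it is a better summary of spread.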
Once the spread (and frequency) of errors (RMSE) is known, the likelihood that the 21st (or even 100th) GCP will fall within some distance from the mean can be estimated. For example, suppose 95 percent of all our measurements are within 5 inches high or low of the mean error of 8 inches, that is, somewhere between 3 inches and 13 inches. The best the mapper can say is that there is a 95 percent confidence based on the observed average error of 8 inches that the actual error of the LiDAR is somewhere between 3 inches and 13 inches. To phrase this in terms of the NSSDA specifications:
“The LiDAR was tested to meet the horizontal accuracy of 10.0 inches for well-defined features at 95 percent confidence level.”
It’s not possible to pin down precisely how far “off” any subsequent GCP measurement will be, but there is only a 1 in 20 chance (1/20 = 5 percent) that it will be in error by more than 5 inches above or below the mean.
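The NSSDA publication cited at the end of this article turns RMSE into a 95 percent accuracy value with fixed multipliers: 1.7308 times the radial RMSE for horizontal accuracy when the X and Y RMSE values are equal, an approximation of 2.4477 times their average when they differ moderately, and 1.9600 times the vertical RMSE. These multipliers assume normally distributed errors with systematic error removed. The sketch below is a minimal illustration of that arithmetic; the function names and sample RMSE values are hypothetical, and the last function simply checks that 1.96 standard deviations leaves about a 1 in 20 chance in the tails of a normal distribution.

```python
import math


def nssda_horizontal_accuracy(rmse_x, rmse_y):
    """Horizontal accuracy at the 95 percent confidence level."""
    if rmse_x < 0 or rmse_y < 0:
        raise ValueError("RMSE values cannot be negative")
    if math.isclose(rmse_x, rmse_y):
        # Equal X and Y error: 1.7308 times the radial RMSE.
        return 1.7308 * math.hypot(rmse_x, rmse_y)
    ratio = min(rmse_x, rmse_y) / max(rmse_x, rmse_y)
    if ratio >= 0.6:
        # Approximation for unequal X and Y error (ratio between 0.6 and 1.0).
        return 2.4477 * 0.5 * (rmse_x + rmse_y)
    raise ValueError("X and Y RMSE differ too much for this approximation")


def nssda_vertical_accuracy(rmse_z):
    """Vertical accuracy at the 95 percent confidence level."""
    return 1.9600 * rmse_z


def chance_beyond(k_sigma):
    """Probability that a normally distributed error exceeds k standard deviations."""
    return 1.0 - math.erf(k_sigma / math.sqrt(2.0))


# Hypothetical RMSE values in inches.
print(f"horizontal: {nssda_horizontal_accuracy(4.0, 4.0):.1f} in at 95 percent")
print(f"vertical:   {nssda_vertical_accuracy(2.0):.1f} in at 95 percent")
print(f"chance beyond 1.96 sigma: {chance_beyond(1.96):.3f}")
```

The value returned is the number that goes into the reporting statement, so two datasets tested this way can be compared directly.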
Fit and accurate. After the error associated with a geospatial dataset has been described and quantified, it can be compared with other geodata that have error quantified. Further, the stakeholders have a reasonable expectation of how far off (or wrong) the location of features will be in the mapping and they can determine if these are consistent with their intended purposes. Then, the abstraction of the real world will serve stakeholders and other informed users well. [Detailed procedures on how to calculate RMSE and confidence of geospatial datasets can be found in the Geospatial Positioning Accuracy Standards: Part 3: National Standard for Spatial Data Accuracy published by the Federal Geographic Data Committee, 1998.]
Mike Tully is the president and CEO of Aerial Services, Inc. He is a Certified Photogrammetrist (CP) and Geographic Information Systems Professional (GISP), as well as a member of the American Society for Photogrammetry & Remote Sensing (ASPRS), Management Association for Private Photogrammetric Surveyors (MAPPS), Society of American Foresters (SAF), and National States Geographic Information Council (NSGIC).