What to Do With All That Geospatial Data
The amount of digital data is growing quickly and organizations are struggling to keep up. This is of special concern to firms that work with geospatial data, which consumes much more storage than traditional transactional data.
“Using and managing geospatial data is challenging because the data is usually larger than other standard business data,” says Joe Cantz, associate and geospatial discipline leader at Woolpert, an architecture, engineering and geospatial (AEG) solutions firm. “Most businesses deal with digital documentation or pictures that are small in size, with storage in the single terabyte range or less. Geospatial data is much larger in size, with many geospatial data companies managing and using hundreds to thousands of terabytes of data in a year. Processing, moving and storing terabytes of data presents its own unique challenges that many software and hardware manufacturers are still working on solving.”
Storing and Organizing Data for Access
Finding enough physical or cloud-based storage is only part of the problem. Because geospatial data files can be huge, someone in the organization has to decide which files are needed often enough to justify faster, more expensive media, and which are needed so rarely that they can be archived to slower, cheaper storage.
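The fast-versus-archive decision described above could be sketched as a simple rule based on how often a file is read. This is a minimal illustration, not a recommended policy: the `GeoFile` fields, the file names, and the five-reads cutoff are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class GeoFile:
    """One stored file; all fields are illustrative."""
    path: str
    size_tb: float
    reads_last_90_days: int


def choose_tier(f: GeoFile, min_reads: int = 5) -> str:
    """Send frequently read files to fast storage and the rest to archive.

    min_reads is an assumed cutoff, not a recommended value.
    """
    return "fast" if f.reads_last_90_days >= min_reads else "archive"


def plan_storage(files, min_reads: int = 5):
    """Group file paths by chosen tier and total the terabytes in each tier."""
    paths = {"fast": [], "archive": []}
    totals_tb = {"fast": 0.0, "archive": 0.0}
    for f in files:
        tier = choose_tier(f, min_reads)
        paths[tier].append(f.path)
        totals_tb[tier] += f.size_tb
    return paths, totals_tb


# Hypothetical inventory: a recent, heavily used orthophoto and an old LiDAR set.
files = [
    GeoFile("ortho_county_2023.tif", 0.8, reads_last_90_days=42),
    GeoFile("lidar_county_2015.laz", 1.2, reads_last_90_days=0),
]
paths, totals_tb = plan_storage(files)
print(paths)
print(totals_tb)
```

A real policy would likely weigh more than read counts, such as contractual retention periods or the cost of retrieving data from archive.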
“The companies with the most extreme data challenges are those that collect, process and publish terrain and imagery data,” says Layton Hobbs, Woolpert vice president and research and development director. “Those of us in this line of work are now talking in terms of petabytes and even exabytes. These data volumes are more likely to be seen at a cloud data center, at a major film studio or in federal government digital archives. Data producers must invest each year to keep up, and they must invest not only in storage but also in high-performance storage, which can handle the rigors of big data processing.”
From the data manager’s standpoint, data volumes also multiply depending on how the data is used. A raw, collected pixel or LiDAR point may be rewritten two, three or more times throughout its production life cycle. The data manager must decide how and when to store those changes, and keeping multiple versions adds to storage and access requirements.
Cornerstones of Geospatial Data Management
Data managers responsible for geospatial data are concerned with several objectives:
- Getting the most out of your data
- Finding ways to share data
- Ensuring data is secure and protected
- Ensuring database interoperability
- Facilitating meaningful searches of data so data is easy to use
Getting the Most Out of Your Data
“Most geospatial data is created for one specific reason or need, such as orthos for GIS applications, but there is so much more information in geospatial data that is underutilized or not recognized,” Cantz says. “Particularly with the newer technologies, the data-rich information is growing exponentially, but we are using only a small percentage at this point.”
One example is the tendency of companies to use imagery only as a backdrop for basic GIS functions, even though that imagery is actually four-band multispectral data rich in many other types of information.
“This information can be used for various other remote sensing applications such as land use, forestry and impervious surface delineation, and can be used to extract various other feature classes,” says Jeff Lovin, senior vice president and director of government solutions for Woolpert.
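As one illustration of what multispectral imagery can yield beyond a backdrop, the widely used Normalized Difference Vegetation Index (NDVI) is computed from the red and near-infrared bands as (NIR - Red) / (NIR + Red). The sketch below assumes the fourth band is near-infrared, which is common for four-band imagery but is not stated in the source, and the pixel values are invented for the example.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.

    Returns a value between -1 and 1 for non-negative inputs; higher values
    generally indicate denser, healthier vegetation. Returns 0.0 when both
    bands are zero to avoid dividing by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


# Hypothetical 12-bit pixel values in (red, green, blue, near-infrared) order.
pixels = {
    "forest canopy": (500, 700, 400, 3000),
    "parking lot": (1800, 1750, 1700, 1900),
}
for label, (red, green, blue, nir) in pixels.items():
    print(f"{label}: NDVI = {ndvi(red, nir):.2f}")
```

In this made-up sample the vegetated pixel scores far higher than the paved one, which is the kind of contrast that land use and impervious surface analyses build on.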
Companies can also make better use of their geospatial data by looking deeper into the values stored in the data itself.
“The complex sensors we use are capable of storing more information in the pixel or point than most users are even aware of,” Hobbs says. “Not only can a pixel display the visible color of a point, but also the infrared value, which can be used to measure vegetative health. And those pixels are capable of storing a much wider range of values than the traditional 256 values of an 8-bit image. These modern systems, for example, often store four bands of data (red, green, blue and infrared) at up to 12 bits or around 4,000 values for each band. Combining those four bands for image interpretation creates 256 trillion possible combinations at one spatial location. This is definitely overkill for most applications, but shows the potential for big-data applications of imagery.”
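The arithmetic behind those figures can be checked directly. A 12-bit band holds exactly 4,096 values; the 256 trillion figure follows from rounding that to 4,000 per band, while the exact count for four 12-bit bands is about 281 trillion. The variable names below are only for illustration.

```python
BITS_PER_BAND = 12
BANDS = 4  # red, green, blue and infrared

values_per_band = 2 ** BITS_PER_BAND          # 4,096 exact values per band
exact_combinations = values_per_band ** BANDS  # 2**48 = 281,474,976,710,656
rounded_combinations = 4_000 ** BANDS          # 256,000,000,000,000

# For comparison: four bands at the traditional 8 bits (256 values) each.
eight_bit_combinations = 256 ** BANDS          # 4,294,967,296

print(f"Values per 12-bit band:     {values_per_band:,}")
print(f"Exact 4-band combinations:  {exact_combinations:,}")
print(f"Using ~4,000 per band:      {rounded_combinations:,}")
print(f"Four 8-bit bands:           {eight_bit_combinations:,}")
```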
Finding Ways to Share Data
Organizations must also determine better ways to share data with others, especially when it involves different government agencies and jurisdictions. This can be both a technical and a political problem.
“Technically, sharing data can be demanding whether through a Web-based solution or through more traditional means like hard-drives and DVDs,” Hobbs says. “Someone needs to administer the sharing process and most project administrators don’t have the time or resources to also serve as the data brokers. Politically, data sharing can also be a challenge when overcoming privacy concerns or when dealing with sharing costs across different political entities.”
In some cases, project consortiums at state or federal government levels can help to surmount these data sharing challenges. Organizations like the U.S. Geological Survey (USGS) have many years of experience with imagery and terrain collection. They can help facilitate consortiums of multiple organizations that share costs, data specifications and data collection, and that ultimately promote effective data sharing.
Securing and Protecting Geospatial Data
One of the most difficult tasks for organizations is to determine internally who should have access to which types of data, and what level of access each individual should have.
These are internal data access policy decisions that the company’s CIO or data manager must work out with individual departments, and the meetings can run long when there are disputes over access to resolve.
Nevertheless, to guard against internal security breaches and unauthorized access, companies should review data access policies and permissions at least once a year to verify whether data access requirements and authorization levels for individuals throughout the organization have changed.
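An annual review like this can be supported by a simple check that flags permissions nobody has looked at within the past year. The sketch below is one possible approach under stated assumptions: the `AccessGrant` record, the user and dataset names, and the 365-day window are hypothetical and do not reflect any particular access-control product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class AccessGrant:
    """One user's permission on one dataset; all fields are illustrative."""
    user: str
    dataset: str
    level: str          # e.g. "read" or "edit"
    last_reviewed: date


def overdue_reviews(grants, today: date, max_age_days: int = 365):
    """Return grants whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [g for g in grants if g.last_reviewed < cutoff]


# Hypothetical grants, checked against a fixed date so the result is repeatable.
grants = [
    AccessGrant("avery", "parcel_demographics", "edit", date(2022, 11, 3)),
    AccessGrant("jordan", "county_orthos", "read", date(2024, 1, 15)),
]
overdue = overdue_reviews(grants, today=date(2024, 6, 1))
for g in overdue:
    print(f"Review needed: {g.user} has {g.level} access to {g.dataset}")
```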
The other element of data security and safekeeping is protecting data against outside access or breach.
“As we know, geospatial data can tell a lot about a place, event or even a person, depending on what type of data we are dealing with,” Hobbs says. “Demographic geospatial data can be very sensitive, perhaps even confidential. Safeguarding this type of data really should be no different than how we treat similar, sensitive financial or human resources data. Few companies are paying enough attention to the real and immediate risks posed to their data.”
Ensuring Database Interoperability
Database interoperability has long been a problem when sharing geospatial data. The good news is that progress is being made as the industry adopts more open-source databases that use open standards for geospatial data. For this reason, any organization using geospatial data should strongly consider an open-source database.
“Though not always the best solution, open database standards are a great starting place,” Hobbs says. “The Open Geospatial Consortium (OGC) and their consensus-based, open-source data standards are helpful when deciding on a data storage format/system. Depending on the data type and application, the OGC provides a wide array of open standards for imagery, GIS, terrain and even some emerging formats for smart cities and intelligent transportation. The key characteristics we look for, either in an open or proprietary standard, are efficient use of indexing and architectures which minimize computational overhead.”
Effective Data Searches
Making data searchable starts with making it discoverable, and data discovery depends on metadata, which is data that describes other data.
“Often these metadata requirements are an afterthought or are so obscure to a program administrator that they don’t even require it,” Hobbs says. “When we work with geospatial data, we know that it is absolutely essential to start with good metadata and then attach keywords that best describe the type, location, date range and authorship of the data we are delivering. With accurate, imbedded descriptors and keywords, internet and GIS search engines can better search and discover applicable geospatial data.”
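The descriptors named in the quote (type, location, date range, authorship and keywords) can be pictured as a small metadata record attached to each dataset, which a search tool then filters. This is a minimal sketch with hypothetical field names and sample records; it does not follow any formal metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetMetadata:
    """Descriptive record for one dataset; field names are illustrative."""
    title: str
    data_type: str
    location: str
    start_date: date
    end_date: date
    author: str
    keywords: set = field(default_factory=set)


def search(catalog, keyword=None, on_date=None):
    """Return records matching a keyword and/or covering a given date."""
    results = []
    for m in catalog:
        if keyword is not None:
            # Case-insensitive keyword match.
            if keyword.lower() not in {k.lower() for k in m.keywords}:
                continue
        if on_date is not None and not (m.start_date <= on_date <= m.end_date):
            continue
        results.append(m)
    return results


# Hypothetical catalog entries.
catalog = [
    DatasetMetadata("County orthoimagery", "imagery", "Example County",
                    date(2023, 3, 1), date(2023, 4, 15), "Example Mapping Co.",
                    {"Orthophoto", "four-band", "leaf-off"}),
    DatasetMetadata("County LiDAR", "terrain", "Example County",
                    date(2015, 11, 1), date(2015, 12, 20), "Example Mapping Co.",
                    {"LiDAR", "elevation"}),
]
hits = search(catalog, keyword="lidar")
print([m.title for m in hits])
```

Without the keywords and date range filled in, neither record could be found by this kind of search, which is the point the quote makes about metadata being treated as an afterthought.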
Finding cost-effective and secure ways of storing, managing and getting the most out of geospatial data is a key focus for organizations that rely on it.
While there is no “one size fits all” approach for every organization, there are several best practices that have emerged and seem universally applicable. These include:
- Consider off-premises, cloud-based storage of your data
Commercial cloud offerings for storage, security, data protection and ancillary services, such as help with organizing your data, have become very sophisticated. Some cloud providers specialize in geospatial data, so they can be a great resource.
- Review your internal governance practices and security permissions with your users
Too many organizations fail to pay attention to who is getting access to what information. Their security guidelines and protections may be lacking as well. Organizations should define the security, privacy and data safekeeping levels that they need. If they lack the internal resources to do this work, they should consider hiring a security consultant to help them through this task.
- Look for cost and data sharing opportunities
The Open Geospatial Consortium is a good place to start. Its open standards can also help when you move to an open-source database for your geospatial data, which will give you more opportunities to share data. Additional geospatial data-sharing consortiums exist at the state and federal government levels.
- Get the most out of your data
You can do this by taking the time to assess the hidden value in your geospatial data and identifying the information your organization has not yet put to use, so you can take advantage of it in the future. It is equally important to make your data easy for users to search by developing effective metadata that describes the elements of your data. The more your users use the data, the more your organization is going to get out of it.