Managing Geospatial Data
It is increasingly important to get a handle on volumes of data associated with surveying and geospatial work.
Data volumes are exploding, with no end in sight. This growth poses quite a challenge for businesses of all types, especially those that need to gain insights from the vast quantities of geospatial data being created by new, advanced high-resolution imaging technologies.
Before we look closely at the impact of the data explosion in the geospatial sector, let’s consider how much data is being generated overall. About 90 percent of all data today was created in the last two years, according to IBM, and that breaks down to 2.5 quintillion bytes of data every day. Putting it into perspective is a recent infographic from Domo that broke down how much data is created every minute of every day. Every minute, Google conducts 3.6 million searches and Americans use almost 2.7 million GB of internet data. In just a few years, our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes (that’s 44 trillion gigabytes) by 2020, according to Analytics Week’s Big Data Facts.
Extracting Value from Today’s Data
These numbers shouldn’t be that big of a surprise, since everywhere you look Internet of Things (IoT) technologies are being marketed, data visualization tools are displaying content in amazing new ways, and big data analytics is driving business intelligence across every sector.
These new technologies, tools and analytics assume there is an underlying infrastructure that can safely handle massive volumes of data and render this data so it can be quickly consumed for a vast array of end user needs. The topic of data management and infrastructure may not be exciting for business leaders, but they need to pay attention. “Data is the new oil,” Shivon Zilis, a partner with the venture capital firm Bloomberg Beta, said about data’s increasing value during Fortune’s 2016 Brainstorm Tech conference. So, business leaders need the right tools to drill through that data and extract the value.
The average person probably doesn’t pay much attention to the issue of data management because, if you are like me, you regularly hear talk about how cheap data storage is now. It is hard to disagree when a quick search on Amazon turns up 4 TB hard drives for little more than $100. But storage is just one component of data management.
Data also must be accessible to developers and end users, which means serving it in a consumable format over a highly available infrastructure. Once access, performance and availability are added to the data management equation, it becomes far more expensive and complex than simple data storage.
Geospatial Data Challenges
Identifying data as a critical business asset and managing it as such is something every organization needs to embrace to succeed. Within the geospatial services community, we continue to primarily leverage structured data, but formats can vary from normalized database structures to massive static files or dynamic temporal data.
At Quantum Spatial Inc. (QSI), we are continually challenged to share our geospatial data securely across multiple time zones, both during development and once it is operational. Provisioning data to end users via web-based services forces us to continually weigh usability against quality.
While all companies with large geospatial data collections consider their data an important asset, to realize the full benefits of this asset, they must organize and provision it in a way that it can be found and used to solve pressing challenges. However, our clients often tell us that their full collection of geospatial data is not easily accessible for a variety of reasons:
- Different file formats – Vector data fits nicely into relational database management systems (RDBMS), but because large raster datasets and LiDAR points are rarely stored in an RDBMS, they are usually kept in a different part of an organization’s network. This results in geospatial data scattered across multiple locations on the network to accommodate different formats.
- Data sharing – In large organizations, a team might acquire data without considering how it could be leveraged beyond their specific needs. As a result, this data can remain buried within that team’s own file management structure or access can be restricted to only privileged staff and not be transparent or accessible to the organization as a whole. Sometimes, however, this can be intentional, if data contains sensitive content that only a subset of staff is authorized to access.
- Slow network connections – Many times network connections are not fast enough to support use of massive files, like LiDAR point clouds, so individual users download data to their workstation to perform development and analysis because they cannot access just the subset of data they need.
All of these problems reduce the access, performance and availability of data, and keep it from being fully utilized.
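One common way to let users reach only the subset of data they need is to split large collections into tiles and keep a small index of each tile's extent. The sketch below illustrates the idea with a plain bounding-box test. It is a minimal example under stated assumptions: the `Tile` class, the file names and the coordinates are all hypothetical, and it does not describe any particular organization's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One entry in a tile index: a file name plus its bounding box."""
    name: str      # hypothetical file name
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(tile, min_x, min_y, max_x, max_y):
    """Return True if the tile's extent overlaps or touches the query box."""
    return not (
        tile.max_x < min_x or tile.min_x > max_x
        or tile.max_y < min_y or tile.min_y > max_y
    )


def tiles_for_area(index, min_x, min_y, max_x, max_y):
    """List the names of the tiles a user must fetch for an area of interest."""
    return [t.name for t in index if intersects(t, min_x, min_y, max_x, max_y)]


# Hypothetical 2 x 2 grid of 1,000-unit LiDAR tiles.
index = [
    Tile("tile_0_0.laz", 0, 0, 1000, 1000),
    Tile("tile_1_0.laz", 1000, 0, 2000, 1000),
    Tile("tile_0_1.laz", 0, 1000, 1000, 2000),
    Tile("tile_1_1.laz", 1000, 1000, 2000, 2000),
]

needed = tiles_for_area(index, 1200, 200, 1800, 800)
print(needed)
```

With an index like this, a user working in one corner of a project area downloads a single tile rather than the whole point cloud.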
Data Management Lessons Learned
At QSI, we currently manage more than 12 petabytes of data, and are adding 1 petabyte of data each year on average. With so much geospatial data in our hands, we have learned a number of lessons that enable us to mitigate the common problems identified above and help users fully realize the return on investment (ROI) in their data assets.
Lesson #1: Data Stewardship – The first step in ensuring your vast amounts of data can be transformed into a valuable asset is creating “data stewards” whose responsibility is curating data for the entire organization. This internal team archives datasets, organizing them in a structure that can be searched and retrieved for future use. This process enables the organization to not only offer historic data to internal users, but also make it available to customers. These data stewards act as librarians, who build tools and provide consulting services that help guide consumers of the data to the right location.
The federal government has nurtured the role of data stewards to make great strides in provisioning their wealth of data in usable formats. One great example is what the National Oceanic and Atmospheric Administration (NOAA) has done with its Digital Coast initiative, which pools data sets from a variety of sources and provides visualization and predictive tools to meet the needs of the coastal management community. The Digital Coast catalog enables users to search easily for data, including vector, raster and LiDAR, and access the data in a variety of formats, including web services, APIs and through downloads. The Digital Coast project would not be possible if NOAA had not identified data stewards to be accountable for these data collections, with the mission of making this tax-dollar-funded data available to the public.
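To make the idea of a searchable, steward-maintained archive concrete, here is a minimal sketch of a dataset catalog. It is an illustration only, not a description of how QSI or NOAA built their catalogs. The class names, fields, storage paths and sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata a data steward might record for each archived dataset."""
    title: str
    data_type: str                      # "vector", "raster" or "lidar"
    location: str                       # hypothetical storage path or URL
    year: int
    keywords: frozenset = frozenset()   # assumed to be stored in lowercase


class Catalog:
    """A tiny in-memory catalog that can be filtered by type, keyword or year."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def search(self, data_type=None, keyword=None, year=None):
        """Return the records that match every filter that was supplied."""
        results = []
        for record in self._records:
            if data_type is not None and record.data_type != data_type:
                continue
            if keyword is not None and keyword.lower() not in record.keywords:
                continue
            if year is not None and record.year != year:
                continue
            results.append(record)
        return results


# Hypothetical sample records.
catalog = Catalog()
catalog.register(DatasetRecord(
    "County parcels", "vector", "/archive/vector/parcels", 2016,
    frozenset({"parcels", "cadastral"})))
catalog.register(DatasetRecord(
    "Shoreline orthoimagery", "raster", "/archive/raster/shoreline", 2017,
    frozenset({"coastal", "imagery"})))
catalog.register(DatasetRecord(
    "Estuary point cloud", "lidar", "/archive/lidar/estuary", 2017,
    frozenset({"coastal", "elevation"})))

for record in catalog.search(keyword="Coastal"):
    print(record.title, "->", record.location)
```

Even a simple structure like this gives every dataset one discoverable entry, which is the point of the librarian role. A production catalog would add spatial extents, access controls and persistent storage.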
Lesson #2: Data Management in the Software and Analytics Development Lifecycle – Throughout my career, I have repeatedly seen geospatial data development and integration cause significant drag on development processes. When users need a problem solved, technical teams jump into designing the solution using analytical tools and developing custom applications. But, in the end, any decision-making tool is only as good as the data that goes into it. Too often the data that drives these tools cannot be easily discovered, is not in a format the solution can fully leverage, or cannot be served fast enough to support the tool, especially when web-based tools pull data in from different sources.
At Quantum Spatial, I lead a team of geospatial application developers, primarily building highly customized products for a variety of clients. My colleagues perform advanced data analytics on multispectral, hyperspectral and thermal imaging, and LiDAR. To mitigate this drag on development, we use an agile development process that emphasizes data requirements and data development as the highest priority in early sprints. We find that having data stewards accessible for consultation during these early sprints is critical to getting out in front of data management in both the software and analytics development lifecycles.
Lesson #3: New Technologies – In addition to data stewardship and early assimilation of process-based best practices, we also look for how new technologies can help us realize a better return on investment.
One such technology is cloud services, which are driving cost savings and improving accessibility. When performing data acquisition and development for clients, cloud services enable us to host and provision secure, scalable, highly available data. Building infrastructure in the cloud enables lean operation and just-in-time processes that reduce overhead and generate value to end users.
To support data hosting in the cloud, we had to re-evaluate our internal network and invest in increased bandwidth. Increased bandwidth enabled us to push data quickly to the cloud, which was instrumental in QSI’s ability to provide geospatial data in response to Hurricanes Harvey and Irma in 2017. We were contracted by Vexcel Imaging through the National Insurance Crime Bureau to collect orthoimagery immediately after the storms passed, with the goal of making data available online within 24 hours of our planes completing data acquisition. To meet this urgent turnaround time, we had to be able to pump data from our processing machines to an Amazon Web Services (AWS) S3 bucket. In addition to leveraging improved bandwidth, we also converted the data from GeoTIFFs to Meta Raster Format (MRF), a web-enabled file format built at NASA’s Jet Propulsion Laboratory. The MRF files decreased file size tenfold while still meeting the quality standards desired by our client.
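A rough calculation shows why both bandwidth and file size matter for a 24-hour turnaround. The sketch below estimates upload time for a given data volume and link speed. Every number in it is hypothetical, including the 5 TB collection size, the link speeds and the 80 percent link efficiency; none of them are figures from the hurricane response.

```python
def transfer_hours(size_gb, bandwidth_mbps, efficiency=0.8):
    """Estimate hours needed to move size_gb gigabytes over a network link.

    bandwidth_mbps is the nominal link speed in megabits per second, and
    efficiency is the assumed fraction of that speed actually achieved.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    size_megabits = size_gb * 8 * 1000   # 1 GB = 8,000 megabits (decimal)
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# Hypothetical collection: 5 TB of imagery, and the same data at one tenth the size.
raw_gb = 5000
compressed_gb = raw_gb / 10

for link_mbps in (100, 1000):
    raw = transfer_hours(raw_gb, link_mbps)
    small = transfer_hours(compressed_gb, link_mbps)
    print(f"{link_mbps:>5} Mbps: original {raw:6.1f} h, tenfold smaller {small:5.1f} h")
```

Under these assumptions, the slower link cannot deliver the original files in anything close to a day, while the faster link combined with tenfold smaller files finishes in a small fraction of the 24-hour window.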
Cloud services not only reduce costs, they also enable users to respond dynamically to performance constraints. For example, based on our experience, most consumers of LiDAR data were converting it to 2D products, such as hillshades or DEMs. Improvements in web-based rendering, such as WebGL-based products like Potree, now enable LiDAR data users to step into a 3D point cloud to visualize their surroundings. To push billions of points into a client’s web browser, we need the ability to monitor performance and scale up the CPU capacity of our servers as needed. Cloud services enable us to spin up small demo servers for LiDAR viewers or adjust production servers to handle additional point clouds, all without going through hardware procurement processes.
These are only a few examples of steps that can be taken to improve data management within the organization. While there is a price tag associated with implementing data governance, ensuring continuity of service and making data accessible, these costs need to be factored into the overall investment in data as an asset within an organization. With a clear plan for geospatial data stewardship, utilization of new technologies, and proper data management, companies can substantially increase the value of this already important asset.