By 2020, it is estimated that there will be over 40 trillion gigabytes of data globally.

Managing all of this data is a growing problem. It hits hardest at organizations that store and use exceptionally large data files, such as CAD files, photos, videos, and drone and IoT (Internet of Things) sensor readings. Examples include government agencies, surveyors, maritime operators and port managers, oil and gas companies, agricultural businesses, logistics providers, mining companies, and construction firms.

The key data management challenges are:

  • Non-digital data;
  • Data retention;
  • Data integration and aggregation; and
  • Data access.

When companies are inundated with data, day-to-day data management becomes overwhelming, so they store the data “anywhere.” This makes it difficult to find all of the data later and to determine which version is the most recent. Combining different types of data into new data “mixes” for analytics becomes a formidable task when you don’t know where all of the data is, and it is almost impossible to track how much you are spending on excess processing and storage for data you might not really need.

To address these data management issues, companies often throw staff at data management projects. Unfortunately, this takes time away from other important work that the core business needs.

How can companies “get ahead” of the avalanche of data that they’re being tasked to manage?


#1 Set your data priorities

What data is absolutely critical to your company’s operations and strategy? Is the data digitized so that you can store and access it electronically? How do you want to use your data over the long term? Might you have new data needs two or three years from now that you don’t have today? Answering these questions and getting consensus from key stakeholders in your organization are both essential, because you need a data strategy to manage your data well.


#2 Review your non-digitized data and determine an approach

If you’ve accumulated CAD drawings, photos, videos, and other documents in hardcopy over the years, you have to choose what to digitize, because digitization is expensive. Many companies decide to digitize the last two or three years of documents and move the older documents to office closets or offsite storage.


#3 Establish a data retention policy—and execute it

While you decide what to digitize and what to leave on paper, it’s also a good time for you and your business stakeholders to determine which data you want to keep and which data is useless and should be discarded.

This is important because if you collect everything, you’ll be overwhelmed with data, and the cost of managing it (e.g., storage and staffing costs) will eat into your operating budget.

There are several ways to evaluate which data to keep and which to discard.

First, review your data storage obligations to your company, your customers, and your regulators. Store data so that it meets the short- and long-term retention requirements of all of these parties.

Second, look at your data to see which of it is frequently accessed, which is occasionally or seldom accessed, and which is never accessed. At a minimum, consider discarding data that is never accessed, provided no storage obligation requires you to keep it.
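One rough way to start this review on a file share is to bucket files by how long ago they were last read. The Python sketch below is a minimal illustration, assuming the data lives on a filesystem that records access times (`st_atime`). Many Linux systems mount disks with `relatime` or `noatime`, which makes those timestamps approximate or frozen, so access logs from your storage platform may be a better source. The 30- and 365-day thresholds and the bucket names are hypothetical. Note that a filesystem alone cannot prove a file was never read; the `stale` bucket only means "not read within the last year."

```python
import os
import time
from pathlib import Path

# Hypothetical thresholds; tune them to your own access patterns.
FREQUENT_DAYS = 30
OCCASIONAL_DAYS = 365
SECONDS_PER_DAY = 86400


def classify_by_last_access(root, now=None):
    """Group every file under `root` into 'frequent', 'occasional', or 'stale'
    based on how many days ago it was last accessed (st_atime)."""
    now = time.time() if now is None else now
    buckets = {"frequent": [], "occasional": [], "stale": []}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories, broken links, etc.
        age_days = (now - os.stat(path).st_atime) / SECONDS_PER_DAY
        if age_days <= FREQUENT_DAYS:
            buckets["frequent"].append(path)
        elif age_days <= OCCASIONAL_DAYS:
            buckets["occasional"].append(path)
        else:
            buckets["stale"].append(path)  # not read in over a year
    return buckets
```

For example, `len(classify_by_last_access("/srv/projects")["stale"])` would count the files worth a closer look; the path is a placeholder.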

Third, ensure that you have buy-in and agreement from your company’s stakeholders on the data you’re going to retain.  Perform your data purges annually, based upon the criteria you have set.  At the time of the annual data purge, you can also sit down again with your stakeholders to see if any data retention needs have changed.
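The three criteria can be combined into a simple rule for the annual purge. This is a minimal Python sketch under stated assumptions: each record carries a hypothetical `retention_years` field holding its longest storage obligation (company, customer, or regulator), a `last_accessed` date (`None` if no access was ever recorded), and a `stakeholder_hold` flag for data your stakeholders have asked to keep. The result is a list of candidates to review with stakeholders, not something to delete automatically.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical record layout for illustration only.
    name: str
    created: date
    last_accessed: Optional[date]   # None means no access was ever recorded
    retention_years: int            # longest applicable storage obligation
    stakeholder_hold: bool = False  # stakeholders asked to keep it regardless


def full_years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end (handles Feb 29 correctly)."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def is_purge_candidate(rec: Record, today: date, idle_years: int = 1) -> bool:
    if rec.stakeholder_hold:
        return False  # criterion 3: stakeholder agreement overrides the rest
    if full_years_between(rec.created, today) < rec.retention_years:
        return False  # criterion 1: still inside a storage obligation
    if rec.last_accessed is None:
        return True   # criterion 2: never accessed
    return full_years_between(rec.last_accessed, today) >= idle_years


def annual_purge_list(records: List[Record], today: date) -> List[str]:
    return [r.name for r in records if is_purge_candidate(r, today)]
```

The `idle_years` parameter lets you widen the rule beyond never-accessed data once stakeholders agree on a threshold.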


#4 Make sure your most frequently accessed data is easily accessible

To get the most out of your investment in computer processing and storage, develop a data management strategy that places the most frequently accessed data on fast storage and processing, and seldom-accessed data on dependable but slower, cheaper storage. This approach, often called storage tiering, optimizes your IT spend for data management.
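A quick back-of-the-envelope calculation shows why tiering pays off. The per-GB prices and data volumes below are hypothetical placeholders, not any provider's actual rates, and the sketch ignores retrieval fees and the extra latency that slow tiers often carry; plug in your own figures. `Decimal` is used so the currency arithmetic is exact.

```python
from decimal import Decimal

# Hypothetical monthly prices per GB; substitute your provider's actual rates.
PRICE_PER_GB_MONTH = {
    "fast": Decimal("0.10"),
    "slow": Decimal("0.01"),
}


def monthly_cost(placement):
    """placement maps a tier name to the gigabytes stored on that tier."""
    return sum(
        (PRICE_PER_GB_MONTH[tier] * gb for tier, gb in placement.items()),
        Decimal("0"),
    )


frequent_gb = 2_000   # frequently accessed data (hypothetical volume)
seldom_gb = 18_000    # seldom-accessed data (hypothetical volume)

all_fast = monthly_cost({"fast": frequent_gb + seldom_gb})
tiered = monthly_cost({"fast": frequent_gb, "slow": seldom_gb})

print(f"All on fast storage: ${all_fast}")
print(f"Tiered:              ${tiered}")
print(f"Monthly savings:     ${all_fast - tiered}")
```

With these made-up numbers, moving the seldom-accessed 18,000 GB to the slow tier cuts the monthly bill from $2,000.00 to $380.00.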


#5 Determine where you want to aggregate data

With the growth of IoT and data now flowing in from many different sources, most companies want a way to pool all of this data for analytics. To do this, they build analytics data repositories that collect data from many different sources.

You can “jump-start” this process by determining which types of data you want to combine into a single data pool for analytics. For example, to pool government-furnished weather data with photographs and IoT readings collected from drones, identify the sources for the data, extract the data from those sources, and then load and combine it in a single analytics repository. This process, commonly known as ETL (extract, transform, load), can even be automated, which eases the data management load on your own staff.
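The weather-plus-drone example can be sketched with Python's built-in `sqlite3` module, using an in-memory SQLite database as a stand-in for the analytics repository. The table names, column names, and sample rows are made up for illustration; a production pipeline would pull from the real sources on a schedule and would likely target a dedicated analytics store.

```python
import sqlite3

# Extract: hypothetical rows standing in for a weather feed export and a
# drone survey catalog (photo file, date, site, soil-moisture reading).
weather_rows = [
    ("2024-06-01", "site-A", 22.5, 0.0),   # date, site, temp (C), rain (mm)
    ("2024-06-02", "site-A", 18.1, 12.4),
]
drone_rows = [
    ("IMG_0001.jpg", "2024-06-01", "site-A", 41.2),
    ("IMG_0002.jpg", "2024-06-02", "site-A", 55.8),
]


def build_repository(conn):
    """Load: create one table per source and insert the extracted rows."""
    conn.executescript("""
        CREATE TABLE weather (obs_date TEXT, site TEXT, temp_c REAL, rain_mm REAL);
        CREATE TABLE drone (photo TEXT, obs_date TEXT, site TEXT, moisture_pct REAL);
    """)
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", weather_rows)
    conn.executemany("INSERT INTO drone VALUES (?, ?, ?, ?)", drone_rows)
    conn.commit()


def combined_view(conn):
    """Combine: join each drone photo with the weather for the same day and site."""
    return conn.execute("""
        SELECT d.obs_date, d.site, d.photo, d.moisture_pct, w.temp_c, w.rain_mm
        FROM drone AS d
        JOIN weather AS w ON w.obs_date = d.obs_date AND w.site = d.site
        ORDER BY d.obs_date
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_repository(conn)
for row in combined_view(conn):
    print(row)
```

Joining on date and site is an assumption about how the sources line up; real data often needs extra transformation, such as unit conversions or matching sensor timestamps to the nearest weather observation.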


Final remarks

Data management is a major organizational challenge – and it’s going to get worse as more and more data flows in.

To deal with the data avalanche, organizations must develop sound data management strategies that help automate elements of data management, establish clear data management policies, and deliver data to key stakeholders when and where they need it.

In many cases, organizations have skilled individuals in-house who can help with data management.  In other cases, companies lack resident experts and must seek data management services from an outside vendor or consultant.