Solutions for Skyrocketing Data Storage
November 1, 2006
Data is the lifeblood of today's professional firms. Data and file-storage needs for surveying, engineering and environmental consulting firms continue to grow at an impressive rate. According to a recent survey by ZweigWhite, an information services company located in Natick, Mass., technical firms today have a median of 12.3 gigabytes of online disk storage per employee. This represents an increase of 52 percent over the 2005 median of 8.1 gigabytes and a whopping 273 percent increase over the 2003 median of 3.3 gigabytes, or nearly four times as much storage. Moreover, survey respondents indicated they are planning to increase data storage spending by about 15 percent this year, and they further indicated that file management and data storage are among their biggest technology challenges today.
How times have changed. I remember clearly the days of the 5¼-inch floppy drives that held 360 kilobytes of data. Those were very common when I first entered the workforce with a surveying and engineering firm in 1983. I can even remember WordStar, which was a very popular word processing package long before the days of Microsoft Word. In the early '80s, I could carry around two or three months' worth of word processing files and the WordStar software on a single floppy. Not long after that, IBM introduced the PC XT, which came with a 10 megabyte hard drive. Many technical firms were able to use the PC XT for a year or two before outgrowing its amount of disk space.
Today, many of us have individual Word or PowerPoint files that are larger than 10 megabytes, and we can carry around 16 gigabytes of data on a USB jump drive the size of our index finger. There are a number of forces driving this exponential growth of data, and a number of technological solutions that firms can explore to address the challenge of securely storing their data.
Forces Behind the Growth of Data
The reasons for the high growth of data storage for surveying, mapping and engineering firms are numerous. These include the popularity of digital imagery, the high resolution of data sets created by airborne or ground-based LiDAR, and an overall thirst for more digital information related to everything that we do.
Just a few years ago, digital orthophotos, or accurate image-based maps, were created for only a small percentage of mapping projects. The software packages that mapping firms used to create orthophotos were not as efficient as the tools we have today. The hardware capabilities and speed of processing have grown considerably over the last few years, making it much more efficient for a mapping firm to process these image maps and for the end user to manipulate and view these large files. And finally, almost all CADD and GIS software packages have the ability to easily ingest and display the orthophotos today.
Because of these changes, the number of projects that include digital orthophotos in the list of deliverables has grown significantly--now most projects include them. As a result, the end clients or public users find a significant benefit in looking at imagery, which they can easily relate to and understand, as opposed to the sometimes confusing vector-based planimetric and topographic maps that we as professionals are comfortable with.
But the story is even more complex. In addition to the increased number of projects with digital imagery requirements, the desire for higher resolutions also plays into the mix. Today, it is easy to move around and display high-resolution images on high-end desktop machines. At these higher resolutions, smaller features such as paint stripes, utility poles and manholes become clearly visible in the digital imagery, which can be very valuable for certain types of projects. Resolutions continue to increase every year, which translates to increased data storage needs.
Imagery is not the only culprit behind growing data storage. Other technologies are significant in this discussion. Airborne LiDAR is often used today to develop an accurate elevation model for design projects and GIS base maps. Today's top-of-the-line LiDAR units have increased significantly in their ability to capture rapid-fire data points. In fact, units today have the ability to capture data at 150 kHz, or 150,000 points per second. Consider that, at this rate, it only takes 6.7 seconds to capture one million elevation points! Data postings of 5 or 10 meters with LiDAR were common just a few years ago. Today, our clients rarely call for a posting of more than 2 meters on the ground. That, too, represents significant data growth: over the same area, a 2-meter posting contains 25 times as much data as a 10-meter posting.
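The LiDAR figures above can be checked with a little arithmetic. The following is a minimal sketch, not a tool from any LiDAR vendor: the function names are hypothetical, and it assumes a regular square grid in which each elevation point covers one posting-by-posting cell.

```python
def seconds_to_capture(points, pulse_rate_hz):
    """Time needed to collect a number of points at a given capture rate."""
    return points / pulse_rate_hz


def points_in_area(area_m2, posting_m):
    """Approximate point count for an area, assuming one point per
    posting-by-posting grid cell (a simplifying assumption)."""
    return area_m2 / (posting_m ** 2)


def density_ratio(coarse_posting_m, fine_posting_m):
    """How many times more points the finer posting needs over the same area."""
    return (coarse_posting_m / fine_posting_m) ** 2


# One million points at 150 kHz (150,000 points per second).
print(round(seconds_to_capture(1_000_000, 150_000), 1))  # 6.7

# A 2-meter posting compared with a 10-meter posting.
print(density_ratio(10, 2))  # 25.0

# One square kilometer (1,000,000 square meters) at each posting.
print(points_in_area(1_000_000, 10))  # 10000.0
print(points_in_area(1_000_000, 2))   # 250000.0
```

Because the point count grows with the square of the posting ratio, each halving of the posting roughly quadruples the data that must be stored.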
Dr. Kirk Waters of NOAA's Coastal Services Center (CSC), which works to bring information, services and technology to the nation's coastal resource managers, leads the agency's efforts for LiDAR data acquisition in coastal areas. He comments that he has seen many changes over the last few years, specifically noting, "The ability to acquire higher density elevation models is important to the needs of our professionals and those of our partners, but these higher densities have had a significant impact on the amount of data that we must manage."
LiDAR is typically the cheapest, most efficient way of generating an elevation model for a large project. However, LiDAR is indiscriminate in that the density of ground points is typically the same throughout the project area. Data postings in flat areas, which would normally require a small number of elevation points to create an accurate model, will be the same as areas with significant elevation change. Most projects are therefore planned with a data density based on the worst case scenario or the toughest areas to model, resulting in a higher than needed density in flat to moderately sloping areas.
Moreover, ground-based LiDAR is finding its way into more traditional surveying firms, and consequently the technology is being used for more clients and more project types. This too adds to the growing data storage needs.
These are some relatively large technology changes, but there are other data storage dynamics in play. For example, many professionals today make use of handheld digital cameras. Perhaps you include color photos in your survey reports to illustrate new control points that were set as part of a project or have snapshots of buildings or other improvements at the project site. Maybe you also add detailed CADD sketches of the point and project vicinity that will aid in finding the point in the future. All of these images offer valuable information, but can add considerably to the file size of the digital report and, therefore, the information we must store in the office.
And finally, even the proposals we put together today are much more elaborate and typically include numerous digital photos, diagrams and graphs. Compare the average file sizes of your proposals today to the ones you created just a few years ago. If your firm is like most other professional firms, you will see significant growth in the size of these documents.
Technologies for Data Storage
Technology provides a number of options for data storage today. The right solution for your firm depends upon many factors, including the total amount of storage required, the number of users that will access the data, the cost of any downtime or lost production, and the level of sophistication of the person maintaining the storage.
The simplest technology solutions may include single or multiple hard drives in your desktop system or a redundant array of independent disks (RAID) attached to a network server. RAID has many advantages over single drives in terms of fault tolerance, increased data integrity and storage capacity. A RAID option takes advantage of multiple drives to store data, but the computer's operating system sees the RAID as a single entity. Certain RAID configurations provide continued uptime with no data loss, even if one of the individual hard drives within the RAID fails.
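To see how an array can survive the loss of one drive, consider the parity approach used by some RAID levels (RAID 5 is the best-known example). The sketch below is an illustration only and rests on several assumptions: the article does not name a RAID level, the function names and sample data are hypothetical, and a real controller stripes data and rotates parity across drives rather than keeping it in one block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block: the XOR of every data block."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing block by XORing the survivors with the parity."""
    return xor_blocks(surviving_blocks + [parity])


# Three hypothetical data drives, each holding one 8-byte block.
drives = [b"SURVEY01", b"LIDAR_02", b"ORTHO_03"]
parity = make_parity(drives)

# Simulate the failure of the second drive.
lost_index = 1
survivors = [block for i, block in enumerate(drives) if i != lost_index]

recovered = rebuild_missing(survivors, parity)
print(recovered == drives[lost_index])  # True
```

The trade-off in this scheme is capacity: one drive's worth of space holds parity rather than project data, and a second simultaneous failure cannot be recovered this way.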
Direct attached storage (DAS) is a common option for network storage. DAS is exactly what its name implies; it is a storage system directly attached to the server and as such, all requests for data must go through the server. During times of peak data serving, this can tax the server's ability to perform other tasks. RAID is often used in a DAS solution. While there are practical limitations as to the amount of storage you can have in a single DAS (around 16 terabytes), DAS can be right for many small- to medium-sized professional firms.
Network attached storage (NAS) is a storage solution that contains both hard drives and data management software to provide dedicated file services over a network. It is simply storage attached to your network and since it includes data management software, the task of managing storage and file sharing is removed from the server's responsibilities, improving network performance.
A storage area network (SAN) provides many advantages over the DAS and NAS solutions. It also comes at a significantly higher price tag as compared to other options. A SAN is a dedicated, high-performance storage solution that allows for the efficient management and storage of data between networks and storage devices. Think of it as a hub that any number of networks or servers can attach to. Requests for data go directly to the SAN.
While SANs are expensive, they also have significant advantages for firms with large data requirements. SANs are almost infinitely scalable and can range from one to hundreds of terabytes of storage. SANs typically have very little downtime and, as a result, are often used for data that must be available 24/7. Additionally, most SAN appliances do not have a single point of failure. You can replace failed drives or make upgrades to the system without interrupting its operation. SANs can easily be shared since they are not directly attached to any one network or server.
Issues Beyond Online Data Storage
The issues that we must consider go well beyond the somewhat simplistic story of data storage alone. The growth in data storage adds importance to data backup, project archiving and disaster recovery.
Today there are several interesting options regarding data backup and disaster recovery at our disposal. Many firms use tape systems to make periodic backups and store the tapes at a remote location. If something catastrophic occurs at the office, such as a fire or natural disaster, the tapes are then used to restore the necessary data and get things up and running again. Reasonably priced tape backup systems have the ability to store 400 gigabytes native, and up to 800 gigabytes of compressed data.
One of the highest growth areas in IT today is offsite data backup. I was amazed to find seven and a half million search results from Google under the subject "offsite data backup." With offsite data backup, your in-house data is pushed over your Internet connection to an offsite location, where it is stored at large data storage facilities. If you inadvertently delete a file or project folder or have a significant data storage disaster, you can easily retrieve the missing data from the offsite storage facilities. The monthly charges for this option depend on both the amount of information that you store offsite and the bandwidth you use sending data to the facility and retrieving it. Costs start at around $100 per month.
Most firms have numerous options for archiving data. CDs and DVDs can be used for many small to medium-sized projects. A CD can hold up to 700 megabytes of data, making it ideal for many small projects. A DVD, which is the same physical size as a CD, uses a higher-density laser pattern and can hold up to 4.7 gigabytes of data. Be careful, however, of the type of media that you purchase for archival purposes. While the inexpensive consumer-grade CDs and DVDs may be fine for delivery of project files to your clients, more expensive archival-grade media should be used for permanent archives. You should expect some data loss and file retrieval problems after only a few years with consumer-grade media. Archival-grade media, which stores data on a reflective layer of gold, is predicted to last hundreds of years. Archival-grade media is significantly more expensive, around one dollar per CD and a little more than two dollars per DVD, but the additional cost is worth it to preserve important data.
When project information greatly exceeds the capacities of either CDs or DVDs, you may want to look into using external hard drives for your project archives. External hard drives that use either a USB or FireWire connection to your desktop or laptop come in many sizes and are very efficient in the transfer of information. Prices for these external hard drives generally run in the range of 50 to 60 cents per gigabyte. According to the CSC's Dr. Waters, "Internet speeds aren't always adequate for the transfer of much of the data we acquire. We often rely on shipping hard drives to state agencies and other partners."
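A quick calculation can show when a project outgrows optical discs. The sketch below uses the capacities and the per-gigabyte price range quoted above. The 120-gigabyte project size and the function names are hypothetical, and the sketch assumes 1 gigabyte equals 1,000 megabytes and ignores file-system overhead.

```python
import math

CD_CAPACITY_MB = 700      # up to 700 megabytes per CD
DVD_CAPACITY_MB = 4_700   # up to 4.7 gigabytes per DVD


def discs_needed(project_mb, disc_capacity_mb):
    """Whole discs required to hold a project, rounding up."""
    return math.ceil(project_mb / disc_capacity_mb)


def external_drive_cost_range(project_gb, low_per_gb=0.50, high_per_gb=0.60):
    """Estimated cost range in dollars at 50 to 60 cents per gigabyte."""
    return project_gb * low_per_gb, project_gb * high_per_gb


# Hypothetical project archive of 120 gigabytes.
project_gb = 120
project_mb = project_gb * 1_000

print(discs_needed(project_mb, CD_CAPACITY_MB))   # 172
print(discs_needed(project_mb, DVD_CAPACITY_MB))  # 26

low, high = external_drive_cost_range(project_gb)
print(f"${low:.2f} to ${high:.2f}")  # $60.00 to $72.00
```

For an archive of this hypothetical size, a single external drive replaces dozens of discs, which is the situation the paragraph above describes.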
In the Future
Our digital data and processing abilities are the backbone of our professional organizations. As our digital capabilities expand, the increasing amounts of imagery and other files we create present an ever-growing challenge of storage. Yet by carefully considering the storage options available today, surveying and engineering firms can implement solutions that best fit their needs. In the future, we will continue to see impressive growth in our data storage needs. This growth will create new challenges, but technology will continue to provide us with new tools and storage solutions.
Sidebar: Decoding Data Storage Terms
RAID: Redundant Array of Independent Disks that is attached to a network server.
DAS: Direct Attached Storage that sends all requests for data through the server.
NAS: Network Attached Storage that contains both hard drives and data management software to provide dedicated file services over a network.
SAN: Storage Area Network that provides a dedicated, high-performance storage hub that any number of networks or servers can attach to; requests for data go directly to the SAN.