"They give me all kinds of advice, designed to enlighten me." -John Lennon

Create, manage and share. These were the three activities associated with GIS data we discussed in the preceding installment of this series ("Getting Data Into a GIS (And Out)" POB, August 2005). Just as it is in surveying, data is the "coin of the realm" in GIS. It is far and away our most important product. It's more valuable than either our hardware or our software. Hardware and software come and go, but data is unique. It is expensive to collect. And it is even more expensive to replace. Storing and keeping track of data over time is also not without its challenges.

Surveyors have struggled for years with managing their data. There are many approaches one can take to address the issue, ranging from the elegant to the absurd. There are data users, and there are data custodians. Surveyors are more often than not both. And if we retain data, then we must maintain data.

Once we decide that the management and storage of our data is a business imperative, we need to formulate a plan and develop some business rules about that data. The alternative is to risk losing the most valuable asset in our inventory.

Why do I say most valuable? Let's take a moment to think about that. If a vehicle, or an instrument, gets lost, stolen or otherwise destroyed, and your insurance is current, it is usually possible to be up and running again within a few days or even sooner. That is because vehicles and instruments are standard tools, and there are more just like them available. But what about a fire that destroys your office with your computers and hard copy file cabinets?

Insurance can replace your office, your computer hardware and your software. But what about your records and your data? How are you going to replace those? The answer is, unless you have a plan in place to store and secure your valuable data, you may not be able to.

A plan to store and manage data involves a process. So, we need to make some decisions about what kind of data we are going to manage. Most organizations today, small and large, typically generate and receive a large volume of both paper and digital documents. And many of them need to be retained for the long term.

A popular term for the process of handling documents and data is Life Cycle Management (LCM). LCM approaches document management project by project. It assumes that there is a beginning and an end to each project, and when a project is finished it is filed away. This solution is fine for documents, but what about data? Data is a living thing. It can be queried and consulted for a variety of purposes not necessarily associated with the original impetus for its creation.

A data set is a collection of data with a common theme. It is often constructed from many documents, whereas a document is a single-purpose, stand-alone record. But a document can contain data and often must be retained to support the data set. Another word to remember as you start to manage data is "interoperability." You will be hearing it more and more as the technologies we use in our daily business activities begin to merge.

The finished product.

How Can GIS Help?

Good question! The catch phrase "Think Globally, Act Locally" coined by environmentalists is a mandate in the GIS community. Generally speaking, it means one is wise to evaluate how his actions relate to and impact the larger community. However, all too often people approach problems by thinking only locally and acting only locally. In data management, this is a prescription for disaster. A more diverse solution is much better. Geographic information systems provide a framework for data storage by focusing on the relationship of data elements to geospatial coordinate systems. That way you can always have the information available locally, but it spans your entire area of interest in scope.

Who is Going To Use The Data?
This is a question that needs to be addressed early on. Traditionally the startup costs of constructing a GIS have been prohibitive to small organizations, but that is rapidly changing. It is now possible to construct a GIS-based data management system for a modest investment. But managed data frequently lends itself to becoming shared data. And that brings up the twin issues of scalability and the Heterogeneous Distributed World (HDW). Scalability is techie jargon for selecting a system designed to be expanded for possible future needs. HDW is the latest buzzword euphemism for the web. In this instance it means that someone building a database may wish to consider whether to "web-enable" it. This doesn't necessarily mean posting it to the Internet; it is becoming more and more common for large companies with offices in several cities to want to share data.

Leveraging Existing Data
One of the advantages of managing survey data in a GIS environment is that it isn't necessary to store and maintain framework and base map data. There are various ways to leverage existing spatial data to use as a framework for your system. The user's task is to register his own data to that existing system.

There are many sources of base map data available on the web. And quite a bit of it is free. The ESRI website has links to base map data. And the Bureau of Land Management (BLM) has most of the data of the Public Land Survey System (PLSS) available for free downloading. The list of cities, counties and states making their data available is expanding every day.

We have discussed differences between CADD and GIS packages in previous articles. At the risk of being redundant, we need to cover some of the nuances again here. We need to keep in mind that CADD drawings are documents. Yes, they often contain data. But they are generally "snapshots" in time and are stored by "revisions." We will take a more detailed look at how CADD documents "interoperate" in a GIS in my next column in April.

Data view in ArcCatalog.

How Do I Organize My Data in a GIS?

This is a simple question with a somewhat complicated answer. Many of us started out keeping our job files in manila folders. (And yes, some of us still do.) And then we took the manila folders and placed them in file cabinets organized by name, date, etc.

When desktop computers gained popularity many of us were taught that digital computer files were just like paper files. Indeed, computer software developers even represent their virtual folders and file cabinets with icons resembling their paper and metal counterparts.

Any computer files are easily organized by this simple system. Yet surprisingly, many users are derelict in setting up orderly electronic folders. The more organized the files, the more easily they are linked to a GIS "front end," also known as a platform or graphical user interface (GUI). Developing a naming convention with unique identifiers is the most critical part of this process.
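As a rough illustration of what a naming convention with unique identifiers might look like, here is a minimal Python sketch. The pattern itself (job number, document type, date) and the names `build_name` and `parse_name` are assumptions made up for this example, not a standard; any firm would substitute its own scheme.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention (an assumption for illustration only):
#   <job number>_<document type>_<YYYYMMDD>.<extension>
#   e.g. 2005-0142_PLAT_20050815.pdf
PATTERN = re.compile(
    r"^(?P<job>\d{4}-\d{4})_(?P<doctype>[A-Z]+)_(?P<date>\d{8})\.(?P<ext>[a-z0-9]+)$"
)


class DocName(NamedTuple):
    job: str      # unique job identifier, the key a GIS would link on
    doctype: str  # short code for the kind of document
    date: str     # date as YYYYMMDD so names sort chronologically
    ext: str      # file extension


def parse_name(filename: str) -> Optional[DocName]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return DocName(**match.groupdict())


def build_name(job: str, doctype: str, date: str, ext: str) -> str:
    """Assemble a file name and refuse to return one that breaks the convention."""
    name = f"{job}_{doctype.upper()}_{date}.{ext.lower()}"
    if parse_name(name) is None:
        raise ValueError(f"name does not follow the convention: {name}")
    return name


if __name__ == "__main__":
    print(build_name("2005-0142", "plat", "20050815", "PDF"))
    print(parse_name("plat final v2.pdf"))
```

Running a check like `parse_name` over an existing folder tree is one way to find the files that would not link cleanly before any GIS work begins.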

Scanning and Registering
Hard copy paper documents need to be evaluated carefully. Scanning a large number of documents can be an expensive and time-consuming process. In order to access these documents from a GIS "front end," they need to be registered, or linked by georeferencing. The simplest method is to link a unique identifier common to a particular document or group of documents.
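The idea of linking by a common unique identifier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field name `parcel_id`, the sample records and the function `link_documents` are all hypothetical, and a real GIS package would do this with its own join or hyperlink tools.

```python
from collections import defaultdict

# Hypothetical sample data: features already in the GIS and scanned documents,
# both carrying the same identifier field ("parcel_id" is an assumed name).
FEATURES = [
    {"parcel_id": "P-001", "x": 512300.0, "y": 4187650.0},
    {"parcel_id": "P-002", "x": 512410.0, "y": 4187702.0},
]
DOCUMENTS = [
    {"file": "P-001_PLAT.pdf", "parcel_id": "P-001"},
    {"file": "P-001_FIELDNOTES.pdf", "parcel_id": "P-001"},
    {"file": "P-009_DEED.pdf", "parcel_id": "P-009"},
]


def link_documents(features, documents, key="parcel_id"):
    """Attach document file names to features that share the same identifier.

    Returns (linked_features, orphan_files). Orphans are documents whose
    identifier matches no feature and so cannot be reached from the map.
    """
    docs_by_id = defaultdict(list)
    for doc in documents:
        docs_by_id[doc[key]].append(doc["file"])

    linked = [
        {**feature, "documents": docs_by_id.get(feature[key], [])}
        for feature in features
    ]
    known_ids = {feature[key] for feature in features}
    orphans = [doc["file"] for doc in documents if doc[key] not in known_ids]
    return linked, orphans


if __name__ == "__main__":
    linked, orphans = link_documents(FEATURES, DOCUMENTS)
    for feature in linked:
        print(feature["parcel_id"], feature["documents"])
    print("not linked:", orphans)
```

The list of orphans is the useful by-product: it shows which scanned documents still need an identifier that matches something on the map.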

Once you get all of the data converted and organized, the last thing you want to do is risk losing all of the work you put into a project of this magnitude. So it might be wise to look beyond the desktop and explore some long-term storage options. There is much to consider, including data and metadata, metrics, retention schedules, offsite storage, data servers and backups.

To understand what kind of servers you need and how to back them up, you'll need to familiarize yourself with a few new Information Technology (IT) terms.

UPS: Uninterruptible Power Supply; a battery-backed device that allows a server or workstation to make a "soft landing" in the event of a power outage.
SAN: Storage Area Network; a high-speed, special-purpose network interconnecting different storage devices.
LAN: Local Area Network; the general-purpose network that connects the computers, printers and servers within an office or building.
NAS: Network Attached Storage; a file storage device that plugs directly into a LAN, more economical and easier to implement than a SAN.
RAID: Redundant Array of Independent Disks; a set of disks combined so that data is spread or duplicated across them, which can improve speed and, depending on the RAID level, allows the array to survive a disk failure.
MTBF: Mean Time Between Failures; an indication of product reliability assigned by the manufacturer or a testing laboratory. (Pay close attention to this one.)
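
To give a feel for what an MTBF figure implies, the short Python sketch below converts it into a yearly failure probability. It assumes a constant failure rate (the common exponential model) and a drive that runs around the clock; the 500,000-hour figure is purely illustrative and is not taken from any particular product.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation


def annual_failure_probability(mtbf_hours: float) -> float:
    """Chance that one device fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(mtbf_hours: float, device_count: int) -> float:
    """Average number of failures per year across a group of identical devices."""
    return device_count * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    mtbf = 500_000  # illustrative value only
    print(f"one drive:  {annual_failure_probability(mtbf):.2%} chance of failure per year")
    print(f"ten drives: {expected_failures_per_year(mtbf, 10):.2f} expected failures per year")
```

Under these assumptions, a 500,000-hour MTBF does not mean a drive will last 57 years. It works out to roughly a 1.7 percent chance of failure in any given year, and the odds that something fails grow with every drive added.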

Now if all of this sounds a bit complicated and intimidating, not to worry. There is help out there. Enterprise Content Management (ECM) is the latest term for packaged solutions to control, manage and share information. There are several firms now offering these services that take the client all the way from the scanning of paper documents to the website and beyond, including everything in between. (And they're just a Google search away.) "Do it yourself" is also an option. There is a wealth of information and support available on the Internet. But a survey business of any size will almost certainly need expert help.

Once you get all of that data organized into folders and properly georeferenced, it will appear in ArcCatalog and look similar to Figure 1 on page 38. I have focused much of the content in my "Surveying GIS" columns on the ArcMap component of ArcGIS. ArcCatalog is the data management engine of ArcGIS, and we will be taking a more in-depth look at it in a future installment.