Storing GIS Applications in the Public Cloud
Companies deploying ArcGIS and other applications in the cloud need specific storage functionality that has so far been lacking there: robust block and file storage with large shared volumes, high availability and large cache pools. Public cloud services often don’t stack up, for many reasons:
- Cloud storage options to date are mainly object storage, yet compute-intensive applications require block and file storage, which is rarely available in a cloud-deployed model (file storage especially).
- Volumes larger than 1 TB are needed, since many map tile sets exceed this typical public cloud limit.
- Sharing a volume across multiple attached servers is often a requirement of highly available applications, yet the feature is missing from virtually all clouds, large and small.
- Large cache pools are essential to geodata applications but are typically not supported.
- Common protocols such as NFS and CIFS are critical but rarely supported in the cloud.
- Service level agreements (SLAs) are imperative for organizations that offer SaaS solutions to surveyors, yet they are hard to find in many public cloud services.
- Last but certainly not least, customers have to feel confident that they have complete control over their images and data, not just for security reasons but also because, at the end of the day, survey data is their intellectual property.
Certainly the major GIS vendors have “cloud-ified” their offerings, and professionals in many vertical industries, including public administration, architecture, engineering and construction (AEC), real estate and insurance, use these offerings to take advantage of the cloud’s superior access and mobility. Yet storage for customer data in the public cloud remains problematic and often demands more than today’s cloud storage offerings can deliver. First, the customer needs control over fine-grained storage choices and application settings, and that is not always possible. Second, without a way around the 1 TB volume size limit, cloud-enabled offerings can force complex splitting or duplication of files, which takes more time and money to configure and manage. Finally, without a way around the single-server attachment restriction, cloudified approaches also make it hard to achieve sufficient scale and availability for IOPS-intensive GIS applications.
As a result, many custom, internally developed applications have remained locked into on-premises storage, because cloud storage options have not supported architectural requirements essential to their on-demand performance. Object stores can certainly deliver 100 TB or more of storage. But if achieving high availability requires extreme architectural workarounds or costly storage redundancy, organizations think twice, and that is often where their exploration of the public cloud ends.
Technology advances in supporting many of the storage formats and requirements implicit in land surveying and GIS applications now let surveying applications take advantage of public cloud options, such as those from Amazon Web Services (AWS) and others, more readily.
Founded in 1838, Swisstopo (the Swiss Federal Office of Topography) is the federal mapping agency of the Swiss Confederation. It is responsible for geographical reference data and provides measurements of Switzerland that surveyors use heavily. The agency ascertains and documents changes in the landscape (geological, geodetic and topographical) and regularly updates and publishes maps of the country at various scales, maps that are renowned worldwide for their quality and accuracy.
The agency was among the first European organizations to enter the AWS cloud, debuting its Federal Spatial Data Infrastructure (FSDI) in 2009 with Amazon S3 holding a database of fixed map tiles and Elastic Block Store (EBS) storing application data to be rendered and delivered as route, hiking and boundary maps.
At the time, however, the only way Swisstopo could support high availability (HA) for FSDI was to replicate identical data across multiple volumes on clusters of servers, using nightly snapshots. This worked around the restrictions on volume size and on mounting a volume to only a single server (a problem simplified by the read-only nature of the data). But it meant that Swisstopo’s 10 TB of data quickly grew into over 55 TB of EBS storage spread across 100 EBS volumes. Meanwhile, scalability was a concern: performance was already degrading during the summer tourist season, when website visitors tripled, and scaling to more volumes would have further complicated management.
Swisstopo trialed an enterprise Storage as a Service (STaaS) solution (Zadara Storage) using NFS in May 2013 at the AWS US East (Northern Virginia) data center and went live with it at the AWS EU (Dublin) data centers in September 2013. As it completes its rollout, Swisstopo will be able to cut those 100 EBS volumes down to a single volume. Because the solution can share storage among multiple EC2 instances, total storage capacity has been reduced from 55 TB to 4.5 TB, and the need for replication has been eliminated.
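The capacity figures above can be checked with a few lines of arithmetic. The sketch below simply plugs in the article’s round numbers (10 TB of unique data, about 55 TB of EBS across 100 volumes, 4.5 TB after consolidation); the function names are illustrative only and are not part of any Swisstopo or Zadara tooling. It shows a provisioning overhead of about 5.5x under the replicated design, an average volume of roughly 0.55 TB (under the 1 TB cap), and a capacity reduction of roughly 92 percent, in line with the approximately 90 percent EBS reduction Swisstopo reports.

```python
"""Back-of-the-envelope storage arithmetic for the Swisstopo figures.

Inputs are the article's round numbers, not measured values.
"""


def overhead_factor(source_tb: float, provisioned_tb: float) -> float:
    """How many TB were provisioned for every TB of unique data."""
    return provisioned_tb / source_tb


def average_volume_tb(provisioned_tb: float, volume_count: int) -> float:
    """Mean volume size if capacity is spread evenly across the volumes."""
    return provisioned_tb / volume_count


def percent_reduction(before_tb: float, after_tb: float) -> float:
    """Capacity saved, as a percentage of the original footprint."""
    return 100.0 * (before_tb - after_tb) / before_tb


if __name__ == "__main__":
    print(f"overhead:   {overhead_factor(10, 55):.1f}x")
    print(f"avg volume: {average_volume_tb(55, 100):.2f} TB")
    print(f"reduction:  {percent_reduction(55, 4.5):.1f}%")
```

Running it prints an overhead of 5.5x, an average volume of 0.55 TB and a reduction of 91.8%.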
Swisstopo’s new approach allows all of its data to reside in one place, dramatically simplifying management and freeing its IT managers to spend their time on strategic development goals instead of managing storage volumes. Scaling is also far simpler: when web traffic triples in the summer, Swisstopo can simply rely on the inherent elasticity of the storage service.
The switchover set Swisstopo on the path to lowering its EBS storage use by 90 percent and its total storage costs by approximately 50 percent, according to Hanspeter Christ, Deputy Head of Process, Federal Spatial Data Infrastructure (FSDI) Web Infrastructure at Swisstopo. It also gave the agency a pathway to far easier management and scalability, he said. Users receive the predictable storage performance of a single-tenant array with the economics of a multi-tenant array. In addition, the agency has full control over its storage, down to very granular settings, and is not restricted to the limits and functionality of the native storage offerings of its cloud compute provider.
Smart Surveying Using ArcGIS in the Cloud
A leading spatial data firm in Wisconsin has a flourishing business in aerial “smart surveying,” in which it merges multiple imagery files into a rich visual display of information that makes it easy for rights owners in the energy, environmental, transportation and government fields to understand and monitor their assets digitally, and to gain insight into how to manage them better.
For a long time, however, the company had a “storage problem” (how to store and access masses of ArcGIS data) that hampered its ability to meet client requests to deliver their data over the Web instead of on hard drives. Because many clients work from remote offices without a sizable on-premises IT infrastructure, the company wanted to give them easier, more mobile access.
As a result, five years ago this geodata player began deploying customer-facing applications based on ArcGIS in the Amazon Web Services (AWS) cloud. The company uses ArcGIS Desktop to build maps and merge imagery and vector line work, then uses ArcGIS to publish them to ArcGIS Online as well as to AWS. However, the storage component of the application at AWS quickly proved problematic. Within a short time, its AWS application had grown to over 100 TB. Because the company ran the Windows version of ArcGIS, it needed file storage attached to a Windows server rather than Amazon’s object storage-based S3 offering. Amazon’s other storage product, Elastic Block Store (EBS), met the requirement of Windows compatibility, but its 1 TB volume limit and its restriction to a single EC2 instance meant the company had to run software RAID on the EC2 server to combine 1 TB volumes into capacity large enough for its data sets. This was time-consuming and did not provide the elasticity the company was seeking from the cloud.
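To see why stitching 1 TB volumes together quickly became unwieldy, the sketch below counts the minimum number of capped volumes needed to present a given usable capacity through software RAID. The article does not say which RAID level the company used, so both RAID 0 (striping only) and RAID 10 (striped mirrors) are shown as assumptions. The function name and the 1 TB default are illustrative, and the count ignores per-instance attachment limits and file system overhead that a real deployment would also face.

```python
import math

# Per-volume cap discussed in the article (EBS at the time); treated as a parameter.
DEFAULT_VOLUME_CAP_TB = 1.0


def volumes_needed(usable_tb: float, cap_tb: float = DEFAULT_VOLUME_CAP_TB,
                   raid_level: str = "0") -> int:
    """Minimum number of capped volumes to present usable_tb via software RAID.

    RAID 0 stripes data across volumes, so usable capacity is the sum of the
    members. RAID 10 mirrors every stripe member, doubling the count, and
    conventionally uses at least four members.
    """
    if usable_tb <= 0 or cap_tb <= 0:
        raise ValueError("capacities must be positive")
    stripe_members = math.ceil(usable_tb / cap_tb)
    if raid_level == "0":
        return stripe_members
    if raid_level == "10":
        return max(2 * stripe_members, 4)
    raise ValueError(f"unsupported RAID level: {raid_level!r}")


if __name__ == "__main__":
    for level in ("0", "10"):
        print(f"RAID {level}: {volumes_needed(100, raid_level=level)} volumes")
```

For the 100 TB the company reached, this gives 100 volumes under RAID 0 and 200 under RAID 10, all of which would have to be attached to the single EC2 instance running the RAID.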
To get around EBS restrictions, the geodata player briefly tried an open source-based storage approach, which only added management complexity and required additional capacity and AWS instances, and hence additional costs. While exploring options, the company also learned that using products such as Gluster on top of EBS storage did not remove the single-EC2-instance limitation either.
In September 2013, the geodata player began using the same Zadara Storage enterprise Storage as a Service (STaaS) offering as Swisstopo, in its case at the AWS US East data center. This let the company design an architecture that attaches one large storage device directly to multiple Windows machines and makes it behave just like the standard NAS device that ArcGIS expects.
The company’s architecture shares storage across three servers at AWS US East, using over 100 TB of RAID 10 storage on more than seventy 3 TB disk drives, with an 8-vCore controller providing a 32 GB cache. It also uses an additional 20 TB of EBS storage for the native ArcGIS application and some Oracle databases, and separately archives about 100 TB of infrequently updated files in Amazon S3 as a backup.
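The disk count and usable capacity quoted above are consistent with how RAID 10 works: every disk is mirrored, so usable capacity is half of raw capacity. The sketch below assumes identical disks and ignores hot spares, file system overhead and the difference between decimal TB and binary TiB; the function names are hypothetical helpers for illustration.

```python
import math


def raid10_usable_tb(disk_count: int, disk_tb: float) -> float:
    """Usable capacity of a RAID 10 set: half of raw, since every disk is mirrored."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disk_count * disk_tb / 2


def raid10_disks_for(target_tb: float, disk_tb: float) -> int:
    """Smallest valid RAID 10 disk count whose usable capacity reaches target_tb."""
    # Each mirrored pair contributes one disk's worth of usable space.
    pairs = math.ceil(target_tb / disk_tb)
    return max(2 * pairs, 4)


if __name__ == "__main__":
    print(raid10_usable_tb(70, 3.0))
    print(raid10_disks_for(100, 3.0))
```

Seventy 3 TB drives yield 105 TB usable, and reaching 100 TB usable takes at least 68 drives (34 mirrored pairs), which matches “over 100 TB” on “over 70” disks.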
With this approach, the company had the best of all worlds: highly reliable enterprise-grade storage with features not available from AWS, the full geographic data management power of ArcGIS, and a far more flexible and scalable storage architecture. Deploying the enterprise-grade STaaS solution took only a matter of weeks from start to finish. Because storage volumes can be shared and have no size limit, the company can grow its AWS application in minutes. The company’s ArcGIS administrator simply logs in to the firm’s own management portal (which, like the underlying resources, is not shared with other users) and adjusts the amount and type of underlying storage (disks of different sizes and types, and SSDs) or the controllers.
Because it uses an enterprise STaaS offering, the company can treat storage as a pay-as-you-grow resource: it pays by the hour only for the storage it uses, without long-term commitments, and books it as an operating expense (OpEx), avoiding the capital costs of storage hardware purchases.
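The pay-as-you-grow model amounts to billing for the capacity actually held, hour by hour, instead of for hardware sized to the peak. The sketch below illustrates that arithmetic only; the usage profile and the per-TB-hour rate are hypothetical placeholders, not Zadara or AWS pricing, and a real bill would have other components that this ignores.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePeriod:
    """A stretch of time during which a fixed amount of storage was provisioned."""
    hours: float
    provisioned_tb: float


def hourly_cost(periods: Iterable[UsagePeriod], rate_per_tb_hour: float) -> float:
    """Pay-as-you-grow: charge only for the TB-hours actually provisioned."""
    return rate_per_tb_hour * sum(p.hours * p.provisioned_tb for p in periods)


def peak_sized_cost(periods: Iterable[UsagePeriod], rate_per_tb_hour: float) -> float:
    """Comparison point: pay for the peak capacity over the whole time span."""
    periods = list(periods)
    total_hours = sum(p.hours for p in periods)
    peak_tb = max(p.provisioned_tb for p in periods)
    return rate_per_tb_hour * peak_tb * total_hours


if __name__ == "__main__":
    # Hypothetical 30-day month: 20 days at 80 TB, then 10 days at 100 TB.
    month = [UsagePeriod(20 * 24, 80), UsagePeriod(10 * 24, 100)]
    rate = 0.01  # hypothetical price per TB-hour, for illustration only
    print(f"pay-as-you-grow: {hourly_cost(month, rate):.2f}")
    print(f"peak-sized:      {peak_sized_cost(month, rate):.2f}")
```

With these made-up inputs, the hourly model charges for 62,400 TB-hours versus 72,000 TB-hours when capacity is sized to the peak, so the first figure printed is 624.00 and the second 720.00.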
This STaaS approach at AWS also enables GIS-related companies to build compute architectures that eliminate potential single points of failure, since volumes are shared and support the standard file access protocols used by Linux and Windows systems (NFS and CIFS, respectively). The same approach supports global replication for disaster recovery and multi-site collaboration use cases.
And because the deployment was painless, the geodata player could stay focused on its core mission of serving smart geospatial information to clients and spend less time on the minutiae of maintaining an enterprise cloud solution.
Most GIS applications have been kept from taking advantage of cloud economics and scale because they need robust enterprise storage features, such as NFS and large volume sizes, that were not available until now. Fortunately, this has changed. Surveying businesses and GIS users of all kinds should take a fresh look at whether their computing resources really need to remain on premises, or whether the cloud can provide an easier, lower-cost and more scalable approach.
Noam Shendar is VP of Business Development at Zadara Storage, a provider of enterprise Storage as a Service (STaaS) solutions. He has over 15 years of experience with enterprise technologies, including at LSI Corporation, MIPS Technologies, entertainment technology startup iBlast, and Intel. He holds a B.Sc. (with Honors) in Electrical Engineering from the University of Wisconsin-Madison and an Executive MBA from Santa Clara University.