The challenge for the U.S. Census Bureau is straightforward. It may not be simple, but it is straightforward. Whether the Census Bureau is collecting data on a household or a business, the basic collection unit is the individual address. For a business, the census refers to that individual address as an establishment.
“Collecting data in the right place is critical,” says Andrew W. Hait, survey statistician in the Economics Statistics Division of the U.S. Census Bureau. “You can’t publish state, county, ZIP code or city-level data unless you get that business in the right physical location.”
Businesses range from something as tiny as a food truck to a 1,000-acre refinery operation, Hait continues. “When we geolocate every one of those establishments, identifying the physical address is the first step and that sometimes is a challenge. Businesses often report their mailing address, but we want to count where that business is physically located, not where they get their mail.”
Large industrial and commercial operations pose particular challenges because a single site may contain multiple operations. Sites that straddle geographic boundaries complicate matters further. “That happens with alarming regularity,” says Hait. One way the Census Bureau handles this is to assign a single address to multiple “establishments.” A corporate headquarters at the same address as a manufacturing plant, for example, becomes two establishments. “You wouldn’t want to count those employees working at the corporate headquarters as if they were working at the manufacturing plant because your manufacturing productivity data would be affected,” Hait points out.
All of this complexity even affects the geographic level at which the Census Bureau can publish data. On top of that, there are privacy requirements laid out in Title 13 of the U.S. Code, the law that governs the bureau. These rules prohibit the bureau from publishing data that would identify individual companies. Hait offers an example: “If you and I own the only two gas stations in our town, Census could not publish data for gas stations in our town because you could easily subtract your employment and your payroll and your sales from the total and know exactly how much I paid my employees, and how many employees I have and what my sales are.”
That example involves small businesses. Physically large businesses create problems of their own: when a business straddles two congressional districts, for instance, it can limit the bureau’s ability to publish data at the congressional district level, Hait continues. This ties back to the earlier discussion of multiple establishments on the same large property. The corporate headquarters may be in one district and the manufacturing operations in an adjoining one. The same applies to local taxes. An employee may enter through a gate facing a street in one tax district and walk to a building that lies across the boundary in another.
The data the Census Bureau publishes comes from a variety of sources, Hait explains. The bureau conducts its own collections, sending forms to businesses (increasingly electronically). It also uses administrative records, which spare businesses from having to report information they have already provided to another agency. The two principal sources are the Internal Revenue Service and the Social Security Administration. These records, along with some special edits, help the bureau identify and locate home-based businesses, independent contractors and sole proprietorships. Such a business is counted where the work gets done, not where it gets its mail.
There are three ways the Census Bureau looks at employment data, says Hait. The first is demographic, or household, data, in which respondents are asked how many people live in the household and whether they are employed. Demographic data counts a worker only once, in their primary occupation (the one in which they earn the most money). Business data counts that worker in every industry in which they work. So someone with three jobs is counted once in the demographic data, for their highest-earning job, and, in the business data, once in each industry they work in. The third way employment is counted is through a program called Local Employment Dynamics, which links the two sources and connects where people live with where they work. This can yield useful data such as commuting flows and the number of employees who telework.
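The difference between the first two counting rules can be made concrete with a small sketch. This is a simplified illustration, not the Census Bureau’s method: the `Job` record, the function names and the worker’s earnings are all hypothetical, and the business-side rule follows the article’s description (once per industry worked in).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    industry: str         # hypothetical field: industry of the employer
    annual_earnings: int  # hypothetical field: earnings from this job


def household_count(workers):
    """Demographic (household) view: each employed worker is counted once,
    under the industry of the job that pays them the most."""
    counts = Counter()
    for jobs in workers.values():
        if jobs:
            primary = max(jobs, key=lambda job: job.annual_earnings)
            counts[primary.industry] += 1
    return counts


def business_count(workers):
    """Business view, as described in the article: each worker is counted
    once in every industry they work in."""
    counts = Counter()
    for jobs in workers.values():
        for industry in {job.industry for job in jobs}:
            counts[industry] += 1
    return counts


if __name__ == "__main__":
    # One hypothetical worker holding three jobs in three industries.
    workers = {
        "worker-1": [
            Job("retail trade", 18_000),
            Job("construction", 42_000),
            Job("food services", 12_000),
        ]
    }
    print(household_count(workers))               # Counter({'construction': 1})
    print(sum(business_count(workers).values()))  # 3
```

The same person contributes one to the household total but three to the business total, which is why the two series are not directly comparable without a linking program such as the one Hait describes.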
“Geocoding all of that is the fun part,” Hait says, with a nervous chuckle. “One thing we do in our geocoding operations is, we used to ask the business to report not only their street address, but also the name of the city and county they are physically located in. We stopped doing it because we found that people don’t often know what city or county they work in because of those straddle situations or for other reasons. The post office recognizes multiple names for geographies. I can send mail to an address in Georgetown, Washington, D.C., or to that same address as Washington, D.C., and the post office recognizes Georgetown as an alternate title for that particular neighborhood of Washington, D.C.”
The Census Bureau relies on its Master Address File, which contains every known address of homes and businesses in the United States along with the geocode for each, Hait continues. The file is part of MAF/TIGER, a massive database the bureau constantly updates through its own survey programs and through reviews of street addresses with local governments. The latter program is called LUCA, the Local Update of Census Addresses.
“For the upcoming decennial population census, we’re doing a lot of imaging work,” Hait explains. The bureau compares satellite imagery of an area from a few years ago with recent imagery to find places where new addresses may have been created and need to be geolocated. Previously, that was all done manually by sending Census Bureau employees into the field. Now more of that work is done at Census headquarters through image comparison, reducing the manual work.
“I work on the business side,” says Hait. “It’s a lot easier counting 8 million employer businesses than 360 million people.”
Hait discusses how seriously the Census Bureau treats privacy. He observes that the bureau takes privacy much more seriously than the public seems to expect, at least when it comes to business data. “I can go into various search engines and say, ‘Show me all of the restaurants within 20 miles of my house,’” he says. “It will show points on a map, and those points will have a name of a restaurant and the street address, and, increasingly, it will even include some data.” People have become accustomed to point-level data products that identify individual businesses by name, so when they come to the Census Bureau, they find it is more conservative with the summary-level data it provides. “We’ll tell you there are 27 restaurants in Crofton, Md., but we won’t show points on a map, and we won’t show you the names and addresses of those businesses because those businesses don’t want to have their privacy violated.”
Privacy also affects data quality. Hait notes that these privacy rules substantially improve the quality of the data the Census Bureau gets from businesses. “Businesses are much more open to give us their real numbers because they know that we’re going to protect their privacy when we tabulate the data.” He returns to the example of a town whose only two gas stations are owned by two different companies. The bureau would report that there are two gas stations in the town, but it wouldn’t show any other data. The same would be true if there were 30 gas stations, with 20 owned by one company and 10 by the other. Because either company could still subtract its own figures from the total to derive its competitor’s, the Census Bureau would not publish that sensitive data.
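The gas station example describes a kind of disclosure-avoidance rule: if only two companies contribute to a total, each can subtract its own figures to recover the other’s (with 10 employees in total and 6 of its own, Company A would know Company B has 4). The sketch below illustrates the idea under a stated assumption, a simple threshold (the hypothetical `MIN_COMPANIES = 3`) on the number of distinct companies in a cell. The Census Bureau’s actual disclosure rules are more involved and are not described here; the `Establishment` record and `tabulate` function are likewise hypothetical. Establishment counts are always published, matching Hait’s point that the bureau would still report two gas stations.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical minimum number of distinct companies a cell must contain
# before its totals are published. Chosen for illustration only.
MIN_COMPANIES = 3


@dataclass(frozen=True)
class Establishment:
    company: str    # owning company
    place: str      # town where the establishment is physically located
    industry: str   # e.g. "gas stations"
    employees: int
    payroll: int


def tabulate(establishments):
    """Aggregate establishments into (place, industry) cells.

    The establishment count is always published; employment and payroll
    are withheld (None) when fewer than MIN_COMPANIES distinct companies
    contribute, because a contributor could subtract its own figures
    from the total and recover its competitors'.
    """
    cells = defaultdict(
        lambda: {"count": 0, "employees": 0, "payroll": 0, "owners": set()}
    )
    for e in establishments:
        cell = cells[(e.place, e.industry)]
        cell["count"] += 1
        cell["employees"] += e.employees
        cell["payroll"] += e.payroll
        cell["owners"].add(e.company)

    published = {}
    for key, cell in cells.items():
        disclosable = len(cell["owners"]) >= MIN_COMPANIES
        published[key] = {
            "establishments": cell["count"],
            "employees": cell["employees"] if disclosable else None,
            "payroll": cell["payroll"] if disclosable else None,
        }
    return published


if __name__ == "__main__":
    # Two gas stations in one town, owned by two different companies.
    town = [
        Establishment("Company A", "Smalltown", "gas stations", 6, 180_000),
        Establishment("Company B", "Smalltown", "gas stations", 4, 120_000),
    ]
    print(tabulate(town))
    # {('Smalltown', 'gas stations'): {'establishments': 2, 'employees': None, 'payroll': None}}
```

Counting distinct owners rather than establishments is what makes the 30-station case (20 owned by one company, 10 by the other) suppressed as well: there are many establishments, but still only two companies.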
Providing Data Tools
Clearly, the job of collecting and reporting data for government purposes is a massive undertaking. It’s not a one-way street. “There are reference resources,” Hait says, “in the form of shape files, layers, geographic information that people would want to be able to overlay over our data. Census Business Builder lets them do that.” He offers the example of North Carolina, which has zoning data for every census tract in the state. North Carolina wants to overlay its data on the census demographic and business data, both for economic development and to reevaluate zoning. “Census Business Builder now lets them overlay their own reference maps, their own shape files, their own map services, and we’re doing that more and more because Census is not the only provider of statistics. There are 17 federal agencies alone, so if I want to pull in data from the Department of Interior or NOAA weather maps and overlay that on top of the census data, we have an obligation to provide that functionality.”
Hait also encourages businesses to use Census Business Builder in their own marketing and strategic planning. The tools can show a business how many similar companies operate in an area, giving a sense of how much competition it might face. They can also show how many businesses in an area fit a customer or target profile. For surveyors looking to develop new business opportunities, other useful data could include the number of building permits issued.
The Census Business Builder tools are available at: https://www.census.gov/data/data-tools/cbb.html