Silo-Busting Week, Day 3: Big Data? How About No Data?


It’s silo-busting week on the Frontier Group blog. On Tuesday, we set the stage by discussing what issue silos are and why they’re problematic. Yesterday, we looked at an example of how two seemingly unrelated issues share common contours. Today, we address the need for good data as the nation deals with rapid change on many fronts.

A strange and unprecedented thing has been happening lately at Frontier Group. On not one but two occasions recently, we’ve been forced to delay reports because a data source that had once been available to us suddenly disappeared.

A good example of disappearing data is the decision by FracFocus – the industry-run disclosure database for fracking chemical use – to severely limit the amount of data that can be downloaded from the database each day. FracFocus has never made it easy to access its data; it took some creative work by some smart and committed people at SkyTruth to aggregate the FracFocus data into a format that could be used by researchers to explore the cumulative impacts of fracking, a topic we investigated in our 2013 report, Fracking by the Numbers.

What’s particularly aggravating about FracFocus, though, is that many states now require companies using fracking to report their data to the service in lieu of direct disclosure to state regulators. In essence, this has left a private, industry-funded organization in charge of deciding who gets to see what data about fracking in what formats and under what circumstances.

In recent years, we’ve documented the rise of state-level government spending transparency websites that give the public more, better and more timely information than ever before about how government is spending our tax dollars. Yet, at the same time, and on many of the most important issues we now face, we know less about what is happening than we did a few years ago.

How can this be? Don’t we live in an age when the NSA knows the pet name you call your sweetheart and Apple knows where you stopped for coffee on the way to work?

Why is it that it required a San Francisco design firm to hire bicycle messengers to follow “Google buses” around the Bay Area in order to produce the first real data visualization of how commuter shuttle buses are affecting transportation there? Why is there no geographically detailed, nationally consistent source of data on solar energy installations on rooftops and businesses – or even any reliable, 100% public, state-specific data on distributed solar energy? How is it possible that some state oil and gas regulators – as we learned in our conversations with them last year – have no idea how many wells in their states have been fracked?

On so many issues, the public sector simply fails to collect the data needed to evaluate or understand emerging technologies and practices with huge public policy implications. Sometimes, this is a result of special interest lobbying – see, for example, the exemption of oil and gas production from reporting under the federal Toxics Release Inventory. Sometimes, it is a result of data collection processes that were once adequate now failing to do the job. In our work on transportation, for example, it has struck me as absurd that there will be six years between iterations of the National Household Travel Survey – a time span that has seen dramatic changes in the economy, demographics and technology.

And sometimes, something new emerges for which there is no precedent for data collection at all – see, for example, the rapid expansion of Uber and Lyft, carsharing and other new transportation services in our cities. Clearly, in at least some large cities, these new modes are already affecting travel choices. But we have little idea of what those impacts are, and without better data, it will be impossible to incorporate those new modes into our planning for the future.

There are some exceptions. In New York City and Washington, D.C., data on the use of bikeshare systems are provided in accessible formats in nearly real time. Capital Bikeshare supplements those data with regular surveys of its members. As a result, despite the relative novelty of both services, we have an emerging understanding of how bikesharing is used, by whom, for what purposes, and what impacts that use has on the transportation system as a whole.

For public interest advocates of all stripes, across all issue areas, the goal should be for timely, accurate data on pressing public issues to be accessible to the public in understandable formats and be downloadable for off-line analysis. Imagine, for example, the power of daily, automated reports on the amount of water pollution flowing from a sewage treatment plant, the number of cars on a busy highway, or the number of fracking wells drilled in a particular area.

There may be limits to the practical ability of public or private sector actors to collect, manage and communicate these data, but as technology continues its ever-onward march, those barriers are falling by the day. It is up to advocates and policy-makers to insist – even, in some cases, as a condition for doing business in a community or obtaining a permit for a regulated activity – that the public be provided with access to the data needed to assess the societal impacts of new technologies and practices.

It is time for public interest advocates to articulate and fight for a new conception of “right to know” – one that is more expansive and more timely, and that provides the public with as much information about what industries are doing on a daily basis as industries now have about what we are doing.