Don’t laugh, but what you see above is very exciting; it should send a tingle down your spine.
What the hell is it? It’s a dataset from a United States census of public library systems containing data elements that cover library service measures such as the number of uses of electronic resources, the number of Internet terminals available to the general public, reference transactions, interlibrary loans, circulation, library visits, children’s program attendance, and circulation of children’s materials. It also includes information on collection sizes, staffing, operating revenue and expenditures.
And libraries are exciting? No, the data is exciting.
It is an example of the US government releasing raw data to the public in machine-readable format. This allows civil society groups, or even single interested persons, to mine the data to discover significant information about public libraries, the way they are used, their cost-benefit aspects etc.
Releasing raw data to the public in machine-readable format – that’s what is exciting. It empowers citizens, it keeps government transparent, and it harnesses the intelligence of crowds to diagnose public problems and find solutions. This is the outcome of the Open Data Movement, a push to get governments to put as much raw data as they have onto the internet, subject to personal confidentiality and critical national security limitations. The data should be in machine-readable format. Citizens, either singly or in collaborative groups, write applications that mine the data to tease out patterns. The knowledge so gained will be of immeasurable public value.
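To make "machine-readable" concrete, here is a minimal sketch in Python of the kind of small application a single interested person might write. The column names, library names and figures are all invented for illustration; they are not taken from the actual library census, whose layout may differ.

```python
import csv
import io

# Hypothetical sample rows; the real dataset's columns and values will differ.
SAMPLE_CSV = """library,population_served,visits,operating_expenditure
Alpha Public Library,20000,90000,450000
Beta County Library,50000,150000,1200000
Gamma Town Library,8000,40000,160000
"""


def cost_per_visit(csv_text):
    """Return {library name: operating expenditure per visit} from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = {}
    for row in reader:
        visits = int(row["visits"])
        if visits == 0:
            continue  # skip libraries reporting no visits to avoid dividing by zero
        result[row["library"]] = int(row["operating_expenditure"]) / visits
    return result


if __name__ == "__main__":
    # List libraries from cheapest to most expensive per visit.
    ranked = sorted(cost_per_visit(SAMPLE_CSV).items(), key=lambda kv: kv[1])
    for name, cost in ranked:
        print(f"{name}: {cost:.2f} per visit")
```

The point is not the arithmetic, which is trivial, but that it takes a dozen lines once the data arrives as rows and columns rather than as a table locked inside a PDF report.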
For example, one might be able to see the effect of road widening (and increased traffic noise) on transactional values of flats within 100 metres — which will enable the public to better judge one of the consequences of vehicle population increase. Or we can track permanent residents’ contribution to Singapore in terms of how many really stay long-term, convert to citizenship, have their sons do national service, etc. That way we can have a more informed debate about immigration.
Readers are perhaps more familiar with the Freedom of Information (FOI) movement. It is a topic I have written about several times, and increasingly others are talking about it. Workers’ Party member of parliament Pritam Singh made a call for a Freedom of Information Act recently. But there is a huge difference between FOI and Open Data.
FOI is request-driven. It enshrines a right of citizens to obtain information from government, with disputes about whether such a request is practical adjudicated by an ombudsman. Agencies required to respond have to be truthful, and the answer may be quite voluminous.
Open Data is not request-driven, though public pressure can be instrumental in ensuring that more and more data is made available online. The default position is that all data collected by government (including local government) should be publicly available in machine-readable format, unless personal confidentiality or national security is at risk.
FOI and Open Data do not conflict; they can reinforce each other. Data that is numerical or database-like in nature suits Open Data, but it may take an FOI request to obtain, for example, the technical specifications and decision trail for a government tender. An anti-corruption group, say, may wonder why so many vendors had been disqualified in a tender (a pattern they might have seen via Open Data). How fair or limiting were the tender conditions? How did judging proceed? That kind of information is more suitable for an FOI route.
* * * * *
Open Data is a relatively recent movement, with the US and UK governments currently leading implementation (see data.gov and data.gov.uk). While I myself had heard about it 18 to 24 months ago, it wasn’t until two students at the Singapore Management University presented a paper in mid-2011 that I took any real interest. I caught up with them again to find out more, beginning with the question: Why did you choose this topic for your paper?
Randy Lai was interested in data as a problem-solving tool. As he describes it, humans solve problems using both intuition and data. If we’re using intuition alone, this may be biased. Certain fundamental causes or effects may not have occurred to us, but data may reveal the connections. Data is useful in a limitless number of fields, e.g. in public health and medicine, and can bring about forecasts and action.
Priscilla Soh said her interest came from a different direction. “I’m more passionate about the People’s Action Party story,” she said. “Currently, there’s a one-way flow of information. There needs to be a level playing field.”
I asked them for some examples of how Open Data might produce interesting insights in Singapore. Priscilla suggested home ownership. “What percentage are held by foreigners? What’s the trend over the years?” she offered by way of example. “We need to test whether grievances are justified by ground facts.”
I see her point, for dissatisfaction has a political cost, and if it’s unfounded, why do we keep paying this cost merely for lack of data?
Randy had a very different example to offer. He would use data to see what happens to standardised test scores when school curricula are changed. If possible, “I would want to capture a trend between this and future income,” he added.
We also discussed the association between test performance and ethnicity that government statistics implicitly make. Is that really the case? What other social factors are determinants? Which groups really need help?
Here’s a map that illustrates what can be done with data.
Someone took the released data on locations of accidents involving cyclists in London and laid them over a map to show fellow cyclists where the danger spots are. First, it leads cyclists to be more careful when on those roads; it is also a good place to start when civil society and the authorities want to improve road conditions for cyclists’ safety.
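I don’t know how the maker of that map actually built it, but the underlying idea can be sketched in a few lines of Python: group accident locations into small grid cells and flag the cells where accidents cluster. The coordinates, the cell size and the threshold below are all made up for illustration; they are not real accident records.

```python
import math
from collections import Counter

# Hypothetical accident coordinates (latitude, longitude); not real data.
ACCIDENTS = [
    (51.5074, -0.1278),
    (51.5076, -0.1275),
    (51.5079, -0.1271),
    (51.5155, -0.0922),
    (51.5301, -0.1232),
]


def hotspots(points, cell_size=0.001, min_count=2):
    """Bucket points into square grid cells and return the busy ones.

    Each cell is identified by the integer pair
    (floor(lat / cell_size), floor(lon / cell_size)).
    Returns a list of (cell, count) with count >= min_count,
    ordered from most to fewest accidents.
    """
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in points
    )
    busy = [(cell, n) for cell, n in counts.items() if n >= min_count]
    return sorted(busy, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for cell, n in hotspots(ACCIDENTS):
        print(f"grid cell {cell}: {n} accidents")
```

A real application would plot these cells on a street map instead of printing them, but the counting step is the part that needs raw, location-level data instead of a published summary table.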
* * * * *
Randy and Priscilla pointed me to a study led by Becky Hogge for the Soros Open Society Foundation, published May 2010. The study’s aim was
. . . to identify the strategies used in the US and UK contexts with a view to building a set of criteria to guide the selection of pilot countries, which in turn suggests a template strategy to open government data.
The report finds that in both the US and UK, a three-tiered drive was at play. The three groups of actors who were crucial to the projects’ success were: civil society, and in particular a small and motivated group of “civic hackers”; an engaged and well-resourced “middle layer” of skilled government bureaucrats; and a top-level mandate, motivated by either an outside force (in the case of the UK) or a refreshed political administration hungry for change (in the US).
To promote the development of digital applications that mine data for public benefit, civic organisations have a huge role to play. In the United States, there is the Sunlight Foundation, for example, which declares on its homepage that it is:
. . . a non-profit, nonpartisan organization that uses the power of the Internet to catalyze greater government openness and transparency, and provides new tools and resources for media and citizens, alike. We are committed to improving access to government information by making it available online, indeed redefining “public” information as meaning “online,” and by creating new tools and websites to enable individuals and communities to better access that information and put it to use.
Besides raw data, the data.gov website from the US now contains, for many datasets, its own applications, which you can run by clicking on available icons. These will filter the data in the way you want or visualise them in different ways. While you may still need to be clear-headed about what you’re looking for, you don’t have to write computer code to retrieve what you want.
* * * * *
Believe it or not, Singapore has a site: data.gov.sg. Alas, it is a pathetic attempt to mimic what has been done elsewhere. The chief problem is that its links lead back to ministries’ websites and to already-processed data (not raw data). Data is presented in one format and no other, and it is not machine-readable. Presenting data in just one way to suit the government’s agenda is exactly what Open Data is NOT about.
Moreover, on its homepage, Singapore’s site features strongly the statement “Create value by catalysing application development”, which also strikes me as rather off the mark. Of course, “value” can mean many things, but there’s something about that statement that suggests commercial value. Commercial value may indeed be an incidental spin-off of data mining, but it isn’t the primary point of Open Data, an offshoot of Open Society Initiatives. The primary aim is transparency in government and the empowering of civil society for public (not private) benefit.
It does not surprise me that what we have is mimicry without any understanding at all of the substance; wanting (once again) to look like a modern, developed, liberal democratic country without letting go of authoritarian, control-freak habits.
Randy Lai and Priscilla Soh’s paper on Open Data will be a chapter in a forthcoming book to be launched in early November by Singapore Management University and the Wee Kim Wee Centre of SMU. The book’s title is Progress and its (Dis-)Contents. Look out for it.