This is the first of two pieces. Peppered with personal experience and broader reflections, it sets out the context for, and current state of, open data. In the companion piece that follows, I will set out some fresh principles for open data, as well as supporting action areas. I hope both offer doses of common sense for now and foresight for tomorrow. The aim is to keep open data relevant and impactful in an ever more complex world. I’d welcome views on both posts.


A little over a decade ago I started work at the Greater London Authority. My arrival coincided with the launch of the London Datastore. Its model was simple and effective: make data open; don’t let imperfections stop you; use it to encourage transparency and innovation. In 2010 we were reeling from the financial crisis, but the world felt a considerably more certain place than it does today. And open data was there to help.
For me, the first ten years of open data were a quiet revolution. Open transport data aside, this period was most memorable for the broader shift in data and digital culture. In 2010, data was for backroom analysis only. Today it is the prime mover in innovation, at the core of all digital transformation and ecosystem development (see the growth of open banking and the fintech movement). Open data did much to get us here.
More recently, I led a group of Chief Data Officers from across the globe. For the G20 Smart Cities Alliance, we wrote a model open data policy for use by cities around the world. Produced in these pandemic times, this work prompted me to think about what the now-changed trajectory of the coming years means for open data. There is also the wider techno-social context: how should government data leaders navigate the messiness of the ‘big data’ era, the proper arrival of AI and the ethical considerations that accompany both?

Technology shifts, decentralised disruption, the pandemic

Back in 2010, Artificial Intelligence (AI) and the Internet of Things (IoT) were simply not talked about in city government. Only now are they starting to make modest dents in decision making and delivery across government. The automation of jobs may have featured in a few policy papers, but the accompanying predictions reassured the reader that any significant hollowing out of the labour market was some way off.

Today, technology, dataflows, and ready access to computational power combine to outpace the democratic process, governments and markets. The Reddit-driven GameStop ‘short squeeze’ shows how activist investors, armed, yes, with open access to data once the preserve of ‘the markets’, can bring hedge fund titans to their knees. Regulators warn investors of the potential risks of buying unregulated cryptocurrencies. Last year, AI moved out of the abstract and into the political spotlight in the U.K. when millions of pupils received qualification grades by algorithm. People took to the streets in protest. Experts picked apart the code in the open. Politicians were left red-faced.

And then, of course, there is the pandemic. Covid-19 dramatically accelerated the digitisation of government services (most obviously in health and social care). We (well, the ‘lucky’ ones not among the digitally excluded) have been pitched into digitised, stay-at-home working and shopping habits, whose structural economic effect may be as great as that of the switch from a manufacturing to a service economy. Data collection and sharing increased dramatically in this period, raising privacy and public trust issues that now need to be addressed.

Open data in pandemic response strategies

Johns Hopkins University has emerged as the global poster child of curated, data-led intelligence. Its dashboard has had 1 billion page views, and both its detail at US state level and the accessibility of features like the daily 60-second ‘Data in Motion’ video are hugely impressive. Italy stands out for its openness: it has been using GitHub to publish detailed Covid data (e.g. hospital admissions) since March 2020. UK authorities like Hackney and Camden councils moved quickly to work in the open, integrating non-profit sector data to create sophisticated community risk profiles.

The pandemic has awakened smart political leaders to the power of data. That is good, because leaders from local to global scale have a series of urgent, strategic questions to answer.

We need to be asking how open data can be deployed to:

  • manage ongoing population-wide vaccination.
  • respond to often hidden pandemic impacts (e.g. social isolation and mental health).
  • re-orient education and training quickly around rising unemployment and disappearing industry segments.
  • model major redistributive socio-economic experiments like universal basic income.
  • measure the effect of the pandemic on progress towards the Sustainable Development Goals.

In the city itself, efforts to encourage safe use of urban spaces (a ‘Waze on foot’, applicable to offices, shops, community facilities and schools) can be driven by geographically granular, time-sensitive data sources. The same data can support the longer-term scenario planning required to breathe life back into commercial and retail zones. See London’s High Street Data Partnership for how open data is being used alongside data from Google, Mastercard, and O2.

In a huge shift from a decade ago, tech giants are opening up swathes of (de-identified) data and tooling. Facebook’s Data for Good features pandemic management use cases based on movement and social connection data at scale, and such examples have local applications. All of the above prove the high value of combining open data with sensitive, authenticated personal data and with novel crowdsourced or private sector data, and show why this must be a focus in future.

Future open data needs to take account of digital identity, integration and emerging data exchange markets

Digital identity offers consumers control over their data. In doing so, it can build the trust needed for digital markets to function. The pandemic has strengthened the case for it, as can be seen in the UK Government’s reprieve and scaling up of Verify, as well as in a range of sensitive, global-scale pandemic management use cases (e.g. vaccine passports). If we pay attention, robust identity management should enable much-needed integration and organisation of (citizen-level) data across departments, and with it opportunities for analytical and service-level value creation. Initiatives to put people back in control of their data, such as Sir Tim Berners-Lee’s Solid, will also draw attention to citizens’ control over how their data is shared.

A network of data markets, based on transparency and trust, is starting to emerge, taking data exchange outside industry value chains. While these markets have yet to reach scale, examples like Amdex, which describes itself as an open data market, show how organisations can make open data accessible alongside permissions-based and paid-for exchange models.

Tackling data sovereignty or legal ownership issues as they apply to downstream uses of data will increase the volume of data available, of which open data will be a subset. Further, cracking ‘data valorisation’ (in plain English, putting a monetary value on the activities and assets used along the data production line) will unlock new use cases and business models. We touched on this in the G20 model open data policy: as urban data grows bigger and less structured through our old friends AI and IoT, the costs of pre-processing, building datasets and providing the infrastructure to enable sharing all increase. Charging to cover these costs, especially where they are attached to high-value use cases, will be contentious to some, but it may become necessary if government is to open up high-value data at all.

Advances in technology (platforms and exchange protocols), in governance and in the general scaling of data markets call for government to reassess the role that ‘centralised’ data stores play in decentralised data architectures. The transparency that open data brings must be preserved, of course, but the costs and realisable benefits of building urban platforms at the heart of the smart city will only stack up in the largest cities. Leaner open data platforms, publishing government open data on other platforms, or deeper efforts to catalogue data to aid its discovery (e.g. using the W3C DCAT 2 standard) are solutions better suited to our times.

To conclude… for now…

At the start of the second open data decade, context is everything: our world has undeniably changed and grown more complex. Where (open) data has been well organised, it has proved a powerful tool in responding to the pandemic’s wide-ranging effects. Arguably, though, a more intertwined, sophisticated and diverse data sharing ecosystem makes it harder for government open data to stay relevant.

Even so, I believe open data remains the bedrock of a healthy data ecosystem. But it is far more than a matter of culture, and it needs to move quickly with the times. This means:

  • Being more demand-led and focused on value-driven publishing, and working ever more in the open to achieve this.
  • Addressing workforce capabilities in some unexpected and unglamorous areas (think data engineers rather than data scientists to run the hard yards on much-needed data integration).
  • Most importantly, perhaps, underpinning open data management processes with a more sophisticated understanding of the relationships between emerging technologies, data and changing societal norms. We need to consider how open data can support open algorithms, and how that data is labelled (as volunteered by individuals, observed or collected through processes, synthetic, or derived through computational procedures) so that AI systems know how to use it properly and we can judge effectively whether they do.

In my next post, I will go further into the principles and action areas that will make open data work well in modern data markets and societies.