It is often said that the digital economy runs on data. Digital technologies have transformed the economy and society – and data is at the centre of this transformation. The COVID-19 crisis reminds us of this by confronting us with a steady stream of dreadful data: confirmed cases, deaths, patients hospitalised, in ICU and fully recovered. Despite the difficulty of compiling reliable numbers, these daily statistics have helped governments and public health experts take action to “flatten the curve” and, equally important, justify these actions to their citizens. While the human and economic devastation from this pandemic will undoubtedly be vast, a rational approach based on data, analysis and expertise has proven effective in slowing the viral spread.
Underneath the case statistics, there is another vast and growing trove of data which has been collected, aggregated and shared by the scientific research community since the early days of the outbreak: data about the virus itself (SARS-CoV-2) and the severe respiratory disease it causes (COVID-19). While public life and non-essential sectors of the economy were shut down, biomedical and clinical research laboratories around the world focused on a single topic: the coronavirus. As the New York Times wrote: “Nearly all other research has ground to a halt.”1 What we are witnessing may very well be the single largest research effort ever undertaken in the history of science. True to the established tradition of science, it is done with complete transparency and open data. The results are impressive. In only six months, researchers have identified the likely origin of the virus, mapped many of its structural and functional properties, produced reliable diagnostic tests, improved treatment options and built a pipeline of 150 candidate vaccines.
Why data is not yet shared widely in the corporate world
Those remarkable achievements in the scientific world should give us pause for thought. Why has biomedical research on COVID-19 made such rapid progress in only a few months, while businesses struggle to adapt and respond to the new reality? Could data sharing have something to do with it? We are seeing some indications of this in recent announcements by big technology firms. Microsoft launched an “Open Data” campaign.2 Noting that “fewer than 100 companies now collect more than 50% of the data generated by online interactions”, Microsoft is making aggregated data of confirmed, fatal and recovered COVID-19 cases from all regions available to academia and research. The CEO of Facebook wrote about “a new superpower: the ability to gather and share data for good”.3 Apple and Google partnered to launch application programming interfaces and a Bluetooth-based platform for contact tracing.4 Given the track record of big technology and platform companies in collecting, hoarding and monopolising data for their own benefit, these pronouncements and plans have attracted criticism from privacy advocates, scrutiny from regulators and scepticism from many other observers.
Apart from thin slivers of digital platform ecosystems, data is not shared widely in the private sector. This has been noted for quite some time by academics and policymakers. Two recent reports by the OECD and the European Commission shed some light on the reasons why data is not shared widely. The OECD report “Enhancing Access to and Sharing of Data”5 emphasises that “access to data is crucial for competition and innovation in the digital economy”. It then proceeds to identify three significant barriers to data sharing: (1) privacy and IP rights and other legitimate commercial and non-commercial interests; (2) lack of trust among organisations, sectors and countries; (3) misaligned incentives, and limitations of current business models and markets. The “European Strategy for Data”6, published by the European Commission in February 2020, similarly states that “the value of data lies not just in its use but also in its reuse through sharing. Currently there is not enough data available for innovative reuse, including for the development of artificial intelligence.” The report provides several reasons why data sharing between companies has not taken off at sufficient scale: lack of economic incentives, lack of trust, imbalances in negotiating power, fear of misappropriation of the data by third parties and lack of legal clarity on who can do what with the data.
The findings are clear and unambiguous: data is not shared widely between companies – or even within companies. Closely guarded departmental data silos, poor data quality, abandoned data lake projects and weak data governance are unfortunately all too common in businesses. Which brings us back to our original question. What can we as executives, managers and advisors learn from science during this crisis? Why is data flowing freely in coronavirus-related scientific research but not in the digital economy? To answer these questions, let us look at two prominent examples of data and information sharing in COVID-19 science: genomic sequences and journal articles.
Examples of how data sharing works in science
Genomic sequences. GenBank is the largest and most widely used genetic database, containing 216 million sequences with more than 400 billion base pairs (building blocks of the DNA double helix).7 On 12 January 2020, researchers submitted the first genetic sequence of what they tentatively called “Wuhan seafood market pneumonia” to GenBank.8 Many other nucleotide sequences followed. The rapid sequencing and global sharing of the virus genome in the first weeks of January allowed the development of diagnostic tests and the commencement of research into vaccines and therapeutics. On 16 March, less than 10 weeks after the first genetic sequences were released, biotech company Moderna injected a dose of an experimental RNA vaccine into a human volunteer participating in an early-stage (phase I) clinical trial.9 As of 29 June, the WHO reported that 17 candidate vaccines were in clinical trials and 132 were in preclinical evaluation.10
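Sequences like the one submitted to GenBank are shared as plain-text FASTA records, which makes them trivially easy to exchange and analyse. As a minimal sketch of what this looks like in practice, the snippet below parses a FASTA record and computes its GC content (the fraction of G and C bases, a basic descriptive statistic). The sequence shown is an invented toy fragment, not the actual SARS-CoV-2 genome, whose GenBank record runs to roughly 29,900 bases.

```python
# Minimal sketch: parsing a FASTA-style record and computing GC content.
# The record below is a toy fragment for illustration, not a real genome.

def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records, header, seq = [], None, []
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:].strip(), []
        else:
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))
    return records

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

fasta = """>toy_record example fragment
ATGGCGTACG
CGTAGCTAGC
"""
for header, seq in parse_fasta(fasta):
    print(header, len(seq), round(gc_content(seq), 2))  # → toy_record example fragment 20 0.6
```

The point of the plain-text format is precisely the one the article makes: any laboratory in the world can download, parse and analyse a newly shared genome within minutes of its release.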
The University of Oxford is currently conducting a clinical trial with 1,100 participants11, while at the same time partnering with AstraZeneca to further develop, manufacture and distribute the vaccine.12 The Serum Institute of India, the world’s largest vaccine producer by number of doses, announced that it would start producing the Oxford experimental vaccine without waiting for the clinical trials to complete.13 Even at pandemic speed, however, most experts agree that getting a licensed vaccine to the public will take at least 12-18 months. Side effects, dosing issues and manufacturing problems can all cause delays.14
Journal articles. On 13 March, chief science advisors from a dozen countries called on publishers to voluntarily agree to make their coronavirus-related publications, and the available data supporting them, immediately accessible. In response to this call, close to 50 publishers (including commercial publishing houses like Elsevier, Springer and SAGE) have made their coronavirus-related articles discoverable and freely accessible. One widely used open-access platform now lists more than 26,000 full-text journal articles15 relating to COVID-19. In addition, dozens of preliminary reports are posted every day on biomedical preprint servers.
So much is published, in fact, that researchers and clinicians find it increasingly difficult to keep up, extract relevant information from the literature and organise it into knowledge. Some are turning to artificial intelligence (AI), particularly information retrieval and natural language processing, to generate new insights. A well-known example is the COVID-19 Open Research Dataset (CORD-19), a machine-readable and freely available resource of currently over 59,000 scholarly articles on COVID-19 and related historical research.16 With the help of CORD-19, AI experts worldwide have developed text and data mining tools to help answer important scientific questions on transmission, incubation, risk factors, vaccines and therapeutics. Through Kaggle, a machine learning and data science community owned by Google Cloud, these tools will be openly available for researchers around the world.
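The text mining tools built on CORD-19 are sophisticated, but the underlying idea of information retrieval can be illustrated very simply. The sketch below builds a toy inverted index over a handful of invented article titles and answers a boolean keyword query; the document ids, titles and query are illustrative assumptions, not drawn from CORD-19 itself.

```python
# Minimal sketch of information retrieval of the kind applied to corpora
# like CORD-19: an inverted index over a toy set of article titles.
# All titles and ids below are invented for illustration.
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:")].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word (boolean AND)."""
    sets = [index.get(w.lower(), set()) for w in query.split()]
    return set.intersection(*sets) if sets else set()

docs = {
    "doc1": "Estimating the incubation period of the novel coronavirus.",
    "doc2": "Risk factors for severe disease in hospitalised patients.",
    "doc3": "Candidate vaccines and the incubation period question.",
}
index = build_index(docs)
print(sorted(search(index, "incubation period")))  # → ['doc1', 'doc3']
```

Production systems replace the exact-match index with ranked retrieval and language models, but the principle is the same: machine-readable, openly licensed text is what makes such tooling possible at all.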
How data sharing could work in the corporate world
Based on these and other examples, there should be no question that collective sharing of data and information through digital networks has greatly advanced the search for a treatment and cure for COVID-19. While global supply chain networks were disrupted, scientific data and information continued to flow freely. How then can data sharing be made to work in a corporate environment? Managers and executives need to recognise two simple principles: (1) data can and should be valued; and (2) data sharing adds value. Let us look at each in more detail.
1. Data can and should be valued. Getting a rough sense of the value of your data assets and those offered by others is the first step if you want to participate in the emerging data economy. Data can indeed be valued, but doing so is not straightforward. Markets for data usually do not have enough buyers and sellers to converge on a price that reflects the true economic value of data. From an accounting perspective, data is treated as an intangible asset and is usually not disclosed on the balance sheet. Its value is only recognised if the firm holding the data is acquired. This happened when Microsoft bought LinkedIn for US$26bn, thereby valuing each active user (and the data generated by the user) at $260. Another way to estimate the value of data is through the income derived from data. For example, Facebook reported net income of $18.5bn for the full year 2019, or about $11 per user.17
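The back-of-envelope arithmetic behind these per-user figures can be made explicit. The sketch below works backwards from the numbers cited above; the implied user counts are derived from those figures for illustration, not taken from company filings.

```python
# Two back-of-envelope ways to put a number on data, using the figures
# cited in the text. User counts are implied by those figures, not sourced
# from company filings.

# Acquisition-based: Microsoft paid US$26bn for LinkedIn; at $260 per
# active user this implies roughly 100 million active users.
linkedin_users = 26e9 / 260
assert round(linkedin_users / 1e6) == 100  # ~100 million users

# Income-based: Facebook's 2019 net income of $18.5bn at about $11 per
# user implies a base of roughly 1.7 billion users.
facebook_users = 18.5e9 / 11

print(f"Implied LinkedIn users: {linkedin_users / 1e6:.0f} million")
print(f"Implied Facebook users: {facebook_users / 1e9:.2f} billion")
```

Neither method is precise, which is exactly the article's point: absent liquid markets for data, such triangulations are often the best available estimate.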
2. Data sharing adds value. Interactions and transactions in the digital economy produce an endless stream of big data. Individuals generate data about themselves through various apps and wearable devices, businesses digitalise their value chains and operations, governments build smart cities and digital services. Big data also happens to be an essential input for training AI systems. Deep learning algorithms, which exhibit astounding accuracy in narrowly defined tasks such as image recognition, machine translation and text analytics, must be trained with large amounts of labelled data. Today, only a small number of big technology firms have access to the kind of high-volume and high-velocity data that is required for state-of-the-art machine learning development. As an aside, this unequal distribution of data is the main reason why in recent years some of the most exciting innovations in reinforcement learning, generative adversarial networks and language models have come not from academia but from private labs funded by Google, Baidu, Nvidia and others.
Regulators and policymakers are aware of these imbalances and the high degree of market power resulting from the “data advantage”. To restore competition and support smaller firms that currently have no or limited access to big-data pipelines, the European Commission intends to fund the establishment of common, interoperable data spaces in healthcare, mobility, agriculture, energy, industry and other sectors. The data spaces will offer access to secure data storage, sharing, modelling, computing and other services, and will be governed by a regulatory framework and common standards. A similar initiative, called the “Data Collaboratives Programme”, was launched last year by the Infocomm Media Development Authority (IMDA) in Singapore. It comes with a governance framework and a regulatory sandbox to explore innovative use and reuse of data.19 The ultimate objective of these initiatives by the EU, Singapore and others is to “open the data silos” and encourage innovation through the free flow of data. Managers and executives can use these regulatory initiatives as a starting point from which to lay their own data pipelines and unlock value from data in a way that is trusted by all stakeholders.
The scientific method has been enormously successful in helping to gradually form and refine our understanding of the natural world. Yet it has been slow to enter the world of business. “Scientific management”, a school of thought founded by Frederick Winslow Taylor in the late 19th century, advocated division of labour and time study in manufacturing. It continued to have considerable influence on industrial engineering, quality management and lean manufacturing throughout the 20th century. In the digital economy of the 21st century, the rigidity of Taylorism seems out of date. The scientific method, on the other hand, is more relevant than ever to the pursuit of knowledge. Like scientists who test hypotheses against experimental data, managers transform raw data through analytics, visualisation and simulation into knowledge on which to base rational decisions. The COVID-19 pandemic allows us to compare, in almost real time, the effectiveness and societal impact of different decision-making models. It has become impossible to ignore the fact that governments whose policies and measures are based on data, science and evidence fare better at slowing the viral spread. Business executives and managers should take note. Recognising the value of data, enriching it through sharing, and generating knowledge and actionable insights from it are bound to become essential parts of almost every organisation’s strategy in the digital economy.
About the Author
Andreas Deppeler is an Adjunct Associate Professor at NUS Business School, National University of Singapore. He is also a Research Affiliate at the Centre on AI Technology for Humankind at NUS Business School. He teaches courses on emerging technologies, data value and business strategy. His academic research focuses on ethics and governance of artificial intelligence.