Dark Data

May 2013

IBM Mainframe Computer

We live in an era in which seemingly every piece of software we use issues us an injunction to share. Share your photos. Share your nutritional intake. Share your exercise metadata. Share your whereabouts. Share your high scores. Share your purchases. Share your playlists.

This injunction to share reminds me of Žižek’s observation on capitalism’s injunction to enjoy, namely that despite appearing otherwise, it’s designed to keep us from experiencing excessive jouissance, in order that we may achieve a homeostatic balance. If the demand is too effective on its own terms, it threatens to burst into destructive hedonism, and that’s no good for productivity. This typically late capitalist request functions effectively by luring the subject into a form of self-regulation.

Just as capitalism demands that we have a good time, digital networks demand that we broadcast our data selves on a never-ending treadmill of affective labour. Unlike capitalism and enjoyment, this ‘social turn’ in software is not designed to rebalance our sense of the private through disclosure — a digital show-and-tell demarcating and reinforcing the public. It is designed rather to elicit the performative impulses at the heart of social life, and from these generate attention as an exploitable resource.

Attention as a resource is the basis for much of the so-called digital economy. It can be generated as long as cognitive surplus exists in the world. Once subjected to this decree to share, we find it difficult to self-regulate, precisely because our brains have such a surprising amount of surplus cognition available to them at any given time. As humans, we are primed for performative social activity, so the continuous cues to perform online are obediently heeded. Both the private and public domains are casualties of this decree, collateral damage for an entire industry focused on intensifying attention on performative data.

The industry seems quite a way from reaching peak attention. I’m yet to see a projection of such a saturation of cognition based on neuroscience. The resource continues to expand, both geographically and in volume. Wearable technology, for example, is primed to produce new levels of productive eyeballs, while the expansion of digital BRIC economies presents another leap in resource.

That this affective labour is performed entirely within an attention economy is not news. What is new is the proliferation of behavioural data it produces, collected through mechanisms that become indistinguishable from advanced forms of surveillance. This is part of a wider trend — an underworld of what I would call dark data is emerging as an increasingly influential actor in technology.

Dark data is data we know exists but is never made visible to us as citizens and consumers. Like dark matter, we see only its effects. Instantaneous stock market crashes, precision drone strikes, personalised advertising, denials of credit: all can be seen as mediators of dark data. Occasionally it leaks directly into public discourse through privacy transgressions, cast as an abused party. This rendering is illusory: dark data is not just ‘out there’; like all data, it is manufactured, be it by physical sensors or software. It requires the deployment of an extensive infrastructure controlled by organisations with the means to create it.

Examples of dark data: browsing histories collected by cross-site marketing firms and search companies. Our credit histories. Retail purchase histories. High-frequency trading logs. State-sponsored intelligence data. The raw material for Facebook’s EdgeRank algorithm. CCTV. Our IP addresses and access logs. Our phone communications, from calls to SMS. Our GPS tracks. Photographs containing our faces. Biometric fingerprints. Eventually our dietary intake and DNA. Not to mention our medical records.

The discourse on dark data exposes what Bruno Latour has identified as the paradox that is the modern constitution. Further evidence, if any were needed, that we are still not modern. On the one hand, we have outgrown the naivety implied by the belief that data might belong to a transcendent Nature, a thing-in-itself which we happen to stumble upon. CEOs aside (they have dubbed dark data that which is not analysed or acted upon by enterprise, but in a typical feat of cognitive dissonance haven’t owned up to why it’s there in the first place), we are fully aware of its social dimension, its material status. It suffices for sociologists like Manuel Castells to remind us of this fact. On the other hand, many social actors, from Stewart Brand to the Pirate Party or the Open Data movement, ascribe data its own agency. Employing a rhetoric of liberation, they assert its desire to be freed, as if following a pre-determined destiny of its own.

We can’t have it both ways, adhering to a doctrine both of transcendence (‘information wants to be free’), and immanent social constructivism (‘data is manufactured and thus political’). This discursive stalemate bears all the hallmarks of the modern.

The discourse on data needs to shift into the realm of hybrid networks, to uncover and clarify the effects of these dark actors at the heart of modern life. Yes, data can ally with other non-human actors — such as TCP, the internet’s transmission protocol — to exert its own forces on so-called social systems, just as data is intricately entangled with human, technocratic power structures that ally to generate it. Precisely because all this is true, neither pointing at the ‘social’ nor making techno-determinist claims about data’s desire will further the debate.

We must accept the entanglement of human and non-human entities, the mutuality of their relations, and discard the primacy of the subject-object relation. Only once we recognise dark data’s position in this vast network of interrelated objects can we begin to understand it and its affinities. An empirical analysis of its mediators (from EdgeRank to HFT algorithms), its allied infrastructure (from data centres to fibre optics), its spokesmen and women (from Wikileaks to Mark Zuckerberg), the organisations involved in its creation (from Google to Transport for London), and the technologies required for its analysis (from Hadoop to Cassandra) will allow us to trace dark data and its influence.

In many ways, dark data and its surrounding infrastructure are the sewerage system of the 21st century. Out of sight and out of mind, they increasingly provide vital services to the modern world. Dark data is undoubtedly the modern repository of unconscious desire. It contains all the unspeakable acts we cannot articulate as a society. Like any intrusion of the Real, its leakage into the everyday is toxic and hazardous.

As with sewage treatment, dark data is the offspring of a technocratic world. It continues to proliferate exponentially, the dark goo oozing from the cracks in contemporary life. Once we begin scrutinising it, we may be surprised at just how far its networks of allies extend.