
Data Philosophy: A Day in the Life of a Tech Newsroom

“What makes a Muppet, a Muppet?” wondered our designer. He’d made a prototype illustration, but wasn’t happy with it — he didn’t feel it was immediately recognizable as a Muppet.

He’d come to the tech area for input, and we were happy to oblige. We analyzed the hell out of the visual ontology of Muppetdom. We searched for web images of Muppet schematics. We agreed that the mouth should be open, and maybe there needed to be wrinkles around it. The head seemed more Muppet-like tilted to the side. I thought there needed to be more obvious fuzziness, while my co-worker thought only the eyebrows needed to be fuzzy. We debated the need to soften harsh angles, and whether the nose should be indicated by a hard crease or a soft planar shape.

The next day, a passing editor joked that they’d heard about our philosophical discussions and wanted in. Easily accomplished, as one software engineer had just said out loud: “What makes a person, a person?” Soon, the office was filled with talk of whether a politician was defined by their office, and what said office meant – was it a building that existed around the politician? Did it exist with no politician in it? Was it a place, a personality feature, or a vocation?

The irony is, I spent ten good years avoiding similar discussions – endless philosophical debates on semantics and meaning and OMG do we have to deconstruct everything to nothingness?! One hint of such discussion and I’d find the people at the other end of the bar, probably shipping Buffy. But there’s a crucial difference in the tech room discussions: While everyone involved enjoys the speculation, it’s usually in service of building something real.

(As real as a virtual thing can be. It’s stored in hard circuitry, anyway.)

In the first case, the designer needed to produce a recognizably Muppet-like illustration. In the second, we had a more difficult task at hand: a database issue where we were trying to decide how many categories we needed, and whether we could use the same one for our site’s users and for the politicians in our directory – all real people, you see, but with different sorts of information needing to be stored. Hence the question of what makes a person a person; yes, the bane of philosophy for millennia, but also the bane of a machine trying to define a person as a series of categories, and ergo the bane of a programmer trying to define what the machine knows.
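For the curious, here is roughly what one answer to that question can look like in code. This is a minimal sketch in Python, not the schema we actually built: every class and field name below is hypothetical, and it assumes the "one shared person, plus separate records for whatever else they are" approach, which is only one of the options we argued about.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """What every person has, whether site user or politician (assumed: just a name)."""
    name: str


@dataclass
class UserAccount:
    """Site-user details attached to a Person. The email field is a made-up example."""
    person: Person
    email: str


@dataclass
class OfficeTerm:
    """One office held by a Person. A person may have several terms, or none."""
    person: Person
    office: str
    start_year: int
    end_year: Optional[int] = None  # None means the term has not ended


def is_politician(person: Person, terms: List[OfficeTerm]) -> bool:
    """In this sketch, a politician is simply a Person with at least one OfficeTerm."""
    return any(term.person is person for term in terms)
```

Under these assumptions the office exists as its own record rather than as a feature of the person, so the same Person can be a user, a politician, both, or neither. Whether that matches what a person "is" remains, happily, a matter for debate.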

Anyway, I understand now why Google has been, at times, hot for people to run out and get philosophy PhDs. To them, grad school must seem like a magical wonderland of people who like to do this stuff! All day! Ad infinitum! Alas, Google’s got it wrong. By all means engage in philosophical sport if you like, but do it while getting paid to design databases, or at least make your money building software first. Also, good luck telling your grad advisor about the need for pragmatic end goals — they tend to look down on that, or at least did when I was still there.

A few weeks later, I saw again how practical some of those “useless” humanities skills could be. We were dealing with a bunch of data, inconveniently and messily produced by actual humans. Because the humans refused to use the exact same word for every description, we had to come up with a way to make the data more computer friendly, viz. by categorizing it. There were maybe 225 entries to deal with. Not even close to being Big Data.

Dealing with a small bunch of messy human data is the same as doing history, as far as I’m concerned. And it’s an obvious (and not even monumental) task for a competent grad student in many fields – history, anthropology, library science, sociology, etc. I said as much to the room; the room laughed. The programmers were already talking about using Mechanical Turk to crowdsource the categories, because in their view anonymous data input is the only reliable way to let humans categorize stuff. Both the reigning editor and I deemed this a no-go. And even the software engineer had to admit that writing a useful parsing script (which of course they’d started to do) was probably as time-consuming as having people do it.

When I mentioned the grad student solution, I suppose it seemed like a joke about how desperate PhDs are, or maybe people thought the task was too complicated to leave to the “soft sciences.” It was frustrating to try to communicate that, in actuality, there are non-science people trained in exactly this. People who could help, and who would enjoy the task. And that any parser we’d write would be just as unscientific because of the small data set and the fact that a fallible human was writing it.

The point being that basic skills such as analysis, categorization, and yes, even being skeptical about categories, are not science skills. They’re not even tech skills, particularly. They’re skills you learn from sifting through data, and thinking about categories and the people who make them. Like a lot of soft skills, they’re not an easy sell (“Hey, I classify stuff real good!”), but if you can spin it to the specific situation, you might get somewhere. And do not, under any circumstances, tell people that what you’re really doing is humanities.