Data Philosophy: A Day in the Life of a Tech Newsroom

“What makes a Muppet, a Muppet?” wondered our designer. He’d made a prototype illustration, but wasn’t happy with it — he didn’t feel it was immediately recognizable as a Muppet.

He’d come to the tech area for input, and we were happy to oblige. We analyzed the hell out of the visual ontology of Muppetdom. We searched for web images of Muppet schematics. We agreed that the mouth should be open, and maybe there needed to be wrinkles around it. The head seemed more Muppet-like tilted to the side. I thought there needed to be more obvious fuzziness, while my co-worker thought only the eyebrows needed to be fuzzy. We debated the need to soften harsh angles, and whether the nose should be indicated by a hard crease or a soft planar shape.

The next day, a passing editor joked that they’d heard about our philosophical discussions and wanted in. Easily accomplished, as one software engineer had just said out loud: “What makes a person, a person?” Soon, the office was filled with talk of whether a politician was defined by their office, and what said office meant – was it a building that existed around the politician? Did it exist with no politician in it? Was it a place, a personality feature, or a vocation?

The irony is, I spent ten good years avoiding similar discussions – endless philosophical debates on semantics and meaning and OMG do we have to deconstruct everything to nothingness?! One hint of such discussion and I’d find the people at the other end of the bar, probably shipping Buffy. But there’s a crucial difference in the tech room discussions: While everyone involved enjoys the speculation, it’s usually in service of building something real.

(As real as a virtual thing can be. It’s stored in hard circuitry, anyway.)

In the first case, the designer needed to produce a recognizably Muppet-like illustration. In the second, we had a more difficult task at hand, a database issue where we were trying to decide how many categories we needed, and whether we could use the same one for our site’s users and for the politicians in our directory – all real people, you see, but with different sorts of information needing to be stored. Hence the question of what makes a person a person; yes, the bane of philosophy for millennia, but also the bane of a machine trying to define a person as a series of categories ergo the bane of a programmer trying to define what the machine knows.

Anyway, I understand now why Google has been, at times, hot for people to run out and get philosophy PhDs. To them, grad school must seem like a magical wonderland of people who like to do this stuff! All day! Ad infinitum! Alas, Google’s got it wrong. By all means engage in philosophical sport if you like, but do it while getting paid to design databases, or at least make your money building software first. Also, good luck telling your grad advisor about the need for pragmatic end goals — they tend to look down on that, or at least did when I was still there.

A few weeks later, I saw again how practical some of those “useless” humanities skills could be. We were dealing with a bunch of data, inconveniently and messily produced by actual humans. Because the humans refused to use the exact same word for every description, we had to come up with a way to make the data more computer friendly, viz. by categorizing it. There were maybe 225 entries to deal with. Not even close to being Big Data.

Dealing with a small bunch of messy human data is the same as doing history, as far as I’m concerned. And it’s an obvious (and not even monumental) task for a competent grad student in many fields – history, anthropology, library science, sociology, etc. I said as much to the room;  the room laughed. The programmers were already talking about using Mechanical Turk to crowdsource the categories, because in their view anonymous data input is the only reliable way to let humans categorize stuff. Both I and the reigning editor deemed this a no-go. And even the software engineer had to admit that writing a useful parsing script (which of course they’d started to do) was probably as time-consuming as having people do it.

When I mentioned the grad student solution, I suppose it seemed like a joke about how desperate PhDs are, or maybe people thought the task was too complicated to leave to the “soft sciences.” It was frustrating to try to communicate that, in actuality, there are non-science people trained in exactly this. People who could help, and who would enjoy the task. And that any parser we’d write would be just as unscientific because of the small data set and the fact that a fallible human was writing it.

The point being that basic skills such as analysis, categorization, and yes, even being skeptical about categories, are not science skills. They’re not even tech skills, particularly. They’re skills you learn from sifting through data, and thinking about categories and the people who make them. Like a lot of soft skills, they’re not an easy sell – “Hey, I classify stuff real good!” – but if you can spin it to the specific situation, you might get somewhere. And do not, under any circumstances, tell people that what you’re really doing is humanities.

Training People Digitally, For Real

This weekend I’ll be at NICAR, the conference that last year inspired me to ponder what humanists could learn from journalism. That means I don’t have a lot of time to write. But I wanted to share a screenshot from the NICAR schedule that I think is worth a thousand words:

[Screenshot of the NICAR schedule, taken 2014-02-23]

This is what a genuine commitment to helping people learn tech stuff looks like. I myself will be attending several of the hands-on classes, because there are data skills I have yet to learn. That’s the thing, there’s always more to learn:


To my mind, this is what the digital liberal arts should be doing. Not worrying about the theory, but getting more people involved in the practice. Not navel-gazing about where we’re going, but making sure that humanists are equipped to enter the larger tech conversation. If you haven’t bothered to learn some of the above, you’re not going to be taken seriously in a tech conversation. You can debate the fairness of that, but the fact is, if you don’t get in the game, other people will be making the decisions about what digital archiving, publication, and communication look like.

Just my two cents. And seriously, y’all, if a kid can do it, you can too.


Practical Tips for Getting Out of an Academic Job and Into the Real World

Dr. Karen Kelsky (aka The Professor Is In) asked if I had any step-by-step posts on how I got my current job. That’s its own story, which I will tell at some point, but this is a quick overview of how I got where I am, with links to all the practical posts I could find. For practical tips I’d also highly recommend Post Academic, which is archival but very, very good.

How to Get a Non-Academic Job

1) Start acquiring new skills you think you’ll need, preferably before you leave your current job. Actually, you already have a lot of soft skills and managerial/admin skills, and those are very valuable. To quote an acquaintance who left grad school, “They pay me in actual money and food, can you believe it? Turns out that the skills taken for granted in the academic world are actually highly valued in the workforce. Who knew?”

I’d still vote for having basic digital publishing skills (like these) when trying to get any decent/interesting job these days, and more advanced stuff if you want to be in the tech industry.

2) Network, network, network. If you’re an introvert, it’s no fun, but there are non-schmoozy ways to go about it – hell, Twitter is a great place to “meet” people – so just set some goals and do it. It’s also necessary to have a lot of contact with new people so you fully understand the differences in communication and work culture (see 3, below), i.e. what academic tics come off as a little weird.

I wouldn’t have gotten the job I have now if I hadn’t actively curated a network and gotten out there. After seeing the job ad, I showed up at an Online News Association meetup where the Tribune tech team was talking about their work. I made sure to introduce myself to the people in the tech department. When the interview process got to the point of needing references, it was a real moment of victory to realize I had three non-academic references (two former freelance clients, one startup manager) at the ready – and that I had successfully, deliberately built an entirely non-academic network. Freedom achieved!

3) Look around at how non-academic fields operate, because you’ll need to drop your old jargon and learn some new lingo. Attend non-academic conferences in various fields to check out the lay of the land, or read up if you can’t go in person. I like the HBR blogs, for example, especially when they say helpful things like “hire from the humanities“. And whether or not you agree with Mr. Kristof’s latest, you’ll want to drop your academic writing habits and get feedback from a focus group of “normals”. You may be shocked to hear yourself using phrases like content creators and make the ask but I guarantee it’s no worse than casually using words like discourse.

4) Throw out your CV and deliberately construct a strong non-academic résumé (or portfolio, if applicable). Often, this means working for free at first, honing your new skills on passion projects, or working for trade (I’ve done both). It might mean volunteering (which allows you to test the waters of a new field and network). It might mean taking any crazy, last-minute job in your new field that gets thrown your way (or it might not – if you’re freelancing you’ll have to learn how to fire clients). It might mean having a blog, or guest blogging, or possibly doing a blogathon or making other media of your own. Giving talks for local organizations was one way I expanded my network, and my current employers were impressed that I was doing cultural stuff out in the community. Again, you never know what could lead to jobs.

Or it may be as simple as learning how to reframe your accomplishments so they sound impressive to non-academics. Whatever you do, don’t constrain yourself to an overly traditional format — making things pretty is important in the outside world, as is making them accessible. An attractive WordPress site can get you far, and even simpler than that are services like Vizify  (now defunct) and, which give you nice visual portfolios without you building anything. And I think LinkedIn is better than a resumé, these days, because you can link to it in a “cover email”, which is fast becoming the format of choice.

5) Realize it’s going to be hard. Depending on how financially stable you were before, it can be very hard. If you’re feeling bummed, read interviews like this one or this one or this one or this one to know that it’s totally possible. Possible, but hard, and there will be a lot of rejections, but really, I don’t think that’s much different from the academic job market. Or grad school. I’m a thorough pessimist, but I’d still say that to a certain extent you can make your own luck. With a few exceptions, I think it’s not worth throwing your resume into those auto-fill nightmares — that’s what the networking spares you from. And if there’s one thing you should be able to do, as a trained thinker, it’s to act strategically. And that’s really what it’s about.

Finding Code Inspiration, Before You Learn To Code

Last week, I talked about the difficulty of finding a cool code project to care about if you didn’t already know how to code. Because of that, I’m trying to think of things that inspired me even before I knew how to write the code for them.

Maybe I’m not an adequate code evangelist. I cannot in good conscience tell you every site I visit makes me want to code. I often see a cool UI feature I like, for example, but that’s not quite the epiphany I’m looking for here, because appreciating it presumes a fair amount of coding knowledge already.

On occasion, though, I’ve run across truly inspirational code projects, such as Tom Scott’s Star Wars Weather:


How does this site work? By means of a pretty important code concept: the API, the thing that offers a site’s most important data in a format easily consumed by another website. A lot of weather sites have APIs, for example, and so you can find all sorts of tutorials on how to scrape weather data and display it on your site. Here’s the rub: I don’t care about the weather, let alone its API, and I have no interest in displaying it on my site. So these tutorials always left me cold. (ba-dum-bump!)
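If "API" still sounds abstract, here is a rough Python sketch of what consuming one looks like. This is not any real weather service's interface: the URL, the field names (city, temp_c, conditions) and the function names are all invented for illustration, and real APIs usually want a sign-up key as well.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint; real weather APIs have their own URLs and keys.
EXAMPLE_URL = "https://api.example.com/weather?city=London"

def parse_weather(raw_json):
    """Turn an API's JSON text into the three values we care about."""
    data = json.loads(raw_json)
    return data["city"], data["temp_c"], data["conditions"]

def fetch_weather(url=EXAMPLE_URL):
    """Ask the remote site for its data (needs a real, working URL)."""
    with urlopen(url) as response:
        return parse_weather(response.read().decode("utf-8"))

# A stand-in for what such an API might send back.
sample = '{"city": "London", "temp_c": 9.5, "conditions": "rain"}'
city, temp_c, conditions = parse_weather(sample)
print(f"{city}: {temp_c} C, {conditions}")
```

The point is that the other site hands you tidy, labeled data, so your code only has to pick out the pieces it wants.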

But then I saw Star Wars Weather, and I realized that it was doing a hilarious thing with the data I had previously spurned. The site’s default is London, but if you enter a city name, the code fetches the weather data for that city. Then it uses another program to match weather data with a Star Wars planet. You can also use Javascript to toggle between Celsius and Fahrenheit, and turn on geolocation if you want. Code, doing stuff! And if you like that, you might also like
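I don't know how the real site decides which planet you get, but the matching step could be as simple as the sketch below. The rules, the thresholds and the function names are my own guesses, written in Python rather than the site's Javascript.

```python
def pick_planet(temp_c, conditions):
    """Map a weather reading to a Star Wars planet (made-up rules)."""
    if conditions == "snow" or temp_c < 0:
        return "Hoth"        # the ice planet
    if conditions == "rain":
        return "Kamino"      # the ocean planet with endless storms
    if temp_c > 30:
        return "Tatooine"    # the desert planet
    return "Naboo"           # pleasant default for everything else

def to_fahrenheit(temp_c):
    """Convert Celsius to Fahrenheit for the unit toggle."""
    return temp_c * 9 / 5 + 32

print(pick_planet(9.5, "rain"))
print(to_fahrenheit(10))
```

A handful of if-statements is all it takes to turn boring data into a joke.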

The second inspiration I remembered was more recent: Darius Kazemi’s “You Must Be” generator.

It’s all the rage to use Markov chains to auto-generate text, especially on Twitter. Markov-generated speech also leaves me cold, probably because it’s only funny on a statistically random basis. But, as it turns out, you can do far cooler things if you structure the language with some sense of grammar. Enter “You Must Be”, which uses the Wordnik dictionary API to grab a word and its definition, then insert them into a classic pickup line, Mad-Libs style.
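To show how little machinery the Mad-Libs approach needs, here is a small Python sketch. It is not the generator's actual code: the template wording and function names are invented, and the word and definition are hard-coded where the real thing would ask the Wordnik API for them.

```python
# Hypothetical joke template; the real generator's wording may differ.
TEMPLATE = "You must be {article} {word}, because you are {definition}."

def article_for(word):
    """Pick 'a' or 'an' by the first letter (a crude rule, good enough here)."""
    return "an" if word[0].lower() in "aeiou" else "a"

def pickup_line(word, definition):
    """Drop a word and its definition into the fixed joke structure."""
    return TEMPLATE.format(
        article=article_for(word), word=word, definition=definition
    )

# Stand-ins for what a dictionary API would return.
print(pickup_line("umbrella", "a device for protection against rain"))
```

The grammar lives in the template, so every output is at least a well-formed sentence, which is more than a Markov chain can promise.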

That’s quite a nice way to create a consistently performing joke structure, which explains why I’m so into the idea. Let’s face it, humor continues to be the thing I care about most in the world, and I now aspire to build my own joke bot someday, probably using some Natural Language Processing, which I also didn’t care about before seeing this example.

In both these cases, I had the definite feeling of “Damn, I wish I’d built that!” followed rapidly by, “Damn, I bet I could!” That’s why I’ve chosen them as the experiences I would wish for anyone who doesn’t have an idea yet, and wants a coding goal.

Some lessons I’ve learned from these moments of inspiration:

1) You don’t have to care about code. Let me repeat that: you DO NOT have to lovvvvvve Javascript or Python or APIs or data.  Don’t ever let anyone make you feel bad if you don’t. It’s fine to love code for what it can do, not what it is.

2) Don’t let anyone else tell you what’s inspirational. Should I be inspired by sites that help underprivileged children? Probably. Am I? No. And I’m not going to lie to myself about it. I’m a smartass. I like jokes. I’m interested in things that help me make more and better jokes.

3) Like all creative acts, code isn’t about doing things from scratch. It’s about combining heretofore uncombined things. The people who built these cool sites didn’t come up with an entirely new code pattern; they used a pretty standard technique (reading data from APIs) and found a way to make the results interesting. But that’s really what you need to learn to code: a sense of the results you’d like to get for yourself, and a sense of the tools available.

Putting It On (The) Line

Last week I had an interesting conversation about tech literacy, and specifically about reading versus doing.

Here’s the basic problem, as I see it: until you code, you don’t know the awesome things code can do. Until you know the awesome things code can do, you don’t want to learn to code.

Hence a growing number of people with a vague sense that they should “learn to code,” but with no real sense of what they might actually build. Add to this the fact that programmers tend to think and write in small parts, resulting in Google search results that are frustrating as hell for the kind of learners (like me) who need a structural and conceptual overview before proceeding. So, even after you commit to trying, it feels frustrating and senseless and not particularly rewarding.

Still: Do or do not. There is no “read about.”

I think the best you can get from reading is the big-picture overview, or some inspiration. That’s a good start, but it’s only a start. I always think in terms of doing, when I think about how I would teach the non-tech world what it needs to know.

As I’ve learned recently, other people do, too.

Putting Shit Online

This is the only required class I’d teach. In it, you’d make a web page with pictures and put it online. That’s going to become the equivalent of knowing how to use a word processor. It kind of is, already.

And heyo, look, just as I was thinking what a massive pain in the ass the class design would be, I noticed that my buddy Dave, another discontent Ph.D., had put a nice syllabus online (scroll to bottom). The beauty of the syllabus is 1) it has a goal, 2) it recognizes the multiple skill sets needed to put a thing online, and 3) it breaks things down into manageable tasks.

There are a lot of classes that call themselves “Intro to HTML/CSS” or “Intro to Photoshop” or “Intro to WordPress.” This is a huge branding mistake. Nobody in their right mind says, “Yo, I feel like learning HTML today!” They more likely say, “Wow, I’d really like the ability to control my web presence!” or “Wow, I’d really like to build a thing that lives online!”

So, here’s another goal-oriented thing: a Skillshare class on sale for January. It has you make an HTML portfolio. That’s a nice, manageable, concrete thing.

The key is to have a goal you want to accomplish, and execute it. Otherwise you won’t connect what you read with the real world. This is what code class after code class fails to recognize, and in doing so, fails in its mission of educating people who don’t already care about code.

Computer Literacy

Understanding the deep web is very different than putting shit online. To my mind, it’s more like physics: it explains a lot of facts about the world we live in, and everyone should theoretically understand it, but the average Joe/Jane is less likely to use it day-to-day.

Anyway, if you want to know about The Entire Internet In 30 Minutes, there’s a slideshow for that, with bonus kittens.

More practically, I would say that, as much as using the command line feels alien to the majority of users, it really does bring you closer to how your computer thinks, and gives you a sense of power over your own individual Machine monster that you, like all of us, now depend on.

And here’s where I differ from many computer teachers: I don’t think trying to make the command line “fun” works. I think for a humanist-leaning learner, recognizing the unnaturalness of the situation, as compared to ordinary human communication, is part of the learning curve. Grokking the huge gap between your brain and a machine brain is actually a form of metacognition, and to my mind it’s a necessary precursor to genuinely understanding programming.

Data/Actual Programming

This, for me, is the sticking point. Basic web publishing involves what I’d call coding but not programming. Programming, to me, is writing executable scripts – the abstract value of which is not obvious to humanists and other skeptics until there’s a clear, internally defined thing they want to accomplish. (Speaking from experience, extrinsic motivation will not work in this situation.)

To that end, I think scrapers (programs that grab stuff from the web) are a good intro to programming, inasmuch as they show you what the hell that for-loop is good for. So do Javascript interactive thingees.  The benefit of scrapers is that they get you into the “how the web works” thing above, as well as the Big Data thing.
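For a sense of scale, a toy scraper fits in about twenty lines of Python. The sketch below is my own illustration, with an invented class name and a made-up scrap of HTML standing in for a real page. It uses only Python's built-in HTML parser, so nothing needs installing.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Remember the address of every link seen on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag just opened.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Made-up page; a real scraper would download this text from the web.
PAGE = """
<html><body>
<a href="/jokes/1">First joke</a>
<p>No link here.</p>
<a href="/jokes/2">Second joke</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(PAGE)

# The for-loop earns its keep: do one thing to every item found.
for link in collector.links:
    print("Found:", link)
```

Swap the made-up page for one you actually care about and the loop suddenly has a reason to exist.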

When I say I think the only real way to understand Big Data is to write a scraper – or even more simply, just grab some stuff from a webpage – I’m not just whistling Dixie. The moment I actually fetched some remote data from a web page, I understood way more about the Big Data picture than I had by reading. The minute I made my first app do the thing I wanted (thanks, Rails Girls!) I levelled up exponentially in my understanding of MVC and databases and querysets, things that had seemed theoretical and (frankly) useless when I’d learned them in isolation.

For a bigger-picture view of how these individual skillsets work together, here’s a nice (if complex) interactive syllabus for learning paths you might take, with links to tutorials.

Put It On (The) Line. Really.

Time for an analogy. I can – and will – tell you all about Roman culture. Or you can use Wikipedia. Whatever. But until you’ve struggled with the evidence yourself, and made a whole from multiple, confused, conflicting bits, you don’t actually understand Roman history beyond knowing some trivial bits.

Maybe that’s why I’m so resistant to “reading about” code. Can you read about nuclear fission and understand it? Conceptually, sure, but I bet it would help to do some equations and stuff.  You can see why “reading about” isn’t the ideal, particularly for people who’ve already demonstrated that they’re smart and determined and trainable.

How do you know what you don’t know? By trying. I hate it when Socrates is right.

I wonder if that’s the real problem. Some people define their entire self-worth by single-subject expertise. Having learned all there is to learn about Medieval hedgehogs or Javascript primitives or what have you, they know full well that taking on another subject would reduce them to not knowing everything again. Perhaps resistance to coding lies in knowing that once you dip your toe in the water, it will open up an ocean of self-doubt.

Personally I think humility is a feature, not a bug, of lifelong learning. No matter how smart you are about that one single thing, there’s an infinite amount more you don’t know. It’s not a bad thing to be reminded of that. And education is not about levelling up without pain. You bang your head a lot. You fail. You finally feel like you’ve learned something, until you get to the next level of knowing what you don’t know. When you get there, a good teacher can help you bang your head less, but until the Matrix becomes reality, they can’t undergo the experience for you.

A professor friend of mine always says that teaching and giving talks are the equivalent of putting your ass on the line, every day. I think it’s the same with putting yourself online, or even trying to. Failure is inevitable at some point. People might say mean things about you. But you do it anyway.  Or at least I do.