Classics, data, IA

If You Want Something Done, Part 1: Getting Ovid into JSON


A couple of weeks ago, I was complaining about the lack of APIs for Classics authors that weren’t Vergil. And I knew, in my heart of hearts, what that meant. I’ve known for quite a while that when anyone says, “Someone should really do/make/code X thing…” the only answer is, “Yes, you should do that, because nobody else will.”

So. I already knew that this was going to apply to my situation. And I’d already been doing a few experiments here and there, and in the past couple of weekends I finished the first step of my plan, which was liberating the Ovid I wanted from its XML — which is here — and putting it into consistently structured JSON.

The initial XML structure is descriptively correct but, shall we say, challenging if we’re thinking about publishing. It always helps to keep the job to be done in sight, right?

Because I don’t care about dictionaries or searches yet, and because there’s probably a tool (the CLTK) that can take care of individual words, I made a single line the smallest unit of my schema, and worked out from there. And I did this on paper first; I don’t know about you, but that’s the only way I can keep my brackets straight.

A Latin poetry JSON schema, by me.

 

I then made a nested relational structure: line belongs to poem, poem belongs to book, book belongs to work. Many lines to one poem. Many poems to one book. Many books to one work — in theory!
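For the code-minded, here is a minimal sketch of that nesting in Python. The class and field names (`Line`, `Poem`, `Book`, `Work`, `index`, `title`, `text`) are my own guesses for illustration, not the field names in the actual schema, and the line text is a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Line:
    index: int  # position of the line within its poem
    text: str   # the Latin text of the line


@dataclass
class Poem:
    index: int
    title: str
    lines: List[Line] = field(default_factory=list)


@dataclass
class Book:
    index: int
    title: str
    poems: List[Poem] = field(default_factory=list)


@dataclass
class Work:
    title: str
    books: List[Book] = field(default_factory=list)


# One work, one book, one poem, one line: the smallest possible tree.
work = Work(
    title="Amores",
    books=[
        Book(index=1, title="Book 1", poems=[
            Poem(index=1, title="Poem 1", lines=[
                Line(index=1, text="(placeholder for a line of Latin)"),
            ]),
        ]),
    ],
)

# asdict() walks the nested dataclasses, so the JSON mirrors the hierarchy.
work_json = json.dumps(asdict(work), ensure_ascii=False, indent=2)
print(work_json)
```

Each level only knows about its children, which is the "many lines to one poem" idea read from the top down.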

In reality, I had to munge the structure. From the test runs where I imported some poetry into a database, I already knew that the real bugbear, as I expected, was the fact that the book/poem structure is inconsistent. So, in this collection we have:

  • Amores: 3 books, and multiple untitled poems in each book
  • Ars Amatoria: 3 books, with just one long, untitled poem per book
  • Heroides: no books, just a bunch of poems (with titles, by the way)
  • Remedia Amoris and Medicamina: each work is just one long poem. No book or poem numbers, traditionally, and no titles apart from the work
  • Oh, and did I mention that Amores Book One has an epigram, i.e. a poem that doesn’t count towards the numbering of ALL THE OTHER poems??? Yeah…

It sounds like the world’s lamest logic problem, right? Anyway, to make things consistent, I assigned every work a book-poem structure. So, if it’s just one long poem, that’s Book 1, Poem 1. If it’s multiple books consisting of just one long poem each, they’re going to be: Book One, Poem One; Book Two, Poem One; etc. And if it’s a single collection of poems, which doesn’t traditionally have books attached, it’s going to be Book One anyway. That’s why my note, in the lower left-hand corner, says “*Add invisible book to Med. etc.”
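The rules above can be sketched as a small normalizing function. This is an illustration under assumptions, not the script I actually ran: the `shape` labels, the input formats, and the made-up titles like "Book 1" and "Poem 1" are all hypothetical, and the Amores epigram is deliberately left out.

```python
def wrap_poem(lines, index=1, title=None):
    # Every poem gets an index and a title, invented if necessary.
    return {"index": index, "title": title or f"Poem {index}", "lines": lines}


def wrap_book(poems, index=1):
    # Every book gets an index and a title, even the "invisible" ones.
    return {"index": index, "title": f"Book {index}", "poems": poems}


def normalize(title, shape, content):
    """Force any of the four layouts into work -> books -> poems -> lines."""
    if shape == "single_poem":
        # One long poem (e.g. Remedia Amoris): content is a list of lines.
        books = [wrap_book([wrap_poem(content)])]
    elif shape == "poems_only":
        # Titled poems, no books (e.g. Heroides): list of (title, lines).
        poems = [wrap_poem(lines, i, t)
                 for i, (t, lines) in enumerate(content, start=1)]
        books = [wrap_book(poems)]
    elif shape == "books_of_one_poem":
        # One long poem per book (e.g. Ars Amatoria): list of line lists.
        books = [wrap_book([wrap_poem(lines)], i)
                 for i, lines in enumerate(content, start=1)]
    elif shape == "books_of_poems":
        # Books of untitled poems (e.g. Amores): list of lists of line lists.
        books = [wrap_book([wrap_poem(lines, j)
                            for j, lines in enumerate(poems, start=1)], i)
                 for i, poems in enumerate(content, start=1)]
    else:
        raise ValueError(f"unknown shape: {shape}")
    return {"title": title, "books": books}


# Placeholder strings stand in for real Latin lines and real poem titles.
remedia = normalize("Remedia Amoris", "single_poem", ["line a", "line b"])
ars = normalize("Ars Amatoria", "books_of_one_poem", [["a"], ["b"], ["c"]])
heroides = normalize("Heroides", "poems_only",
                     [("(title 1)", ["x"]), ("(title 2)", ["y"])])
```

Whatever goes in, the same four levels come out, which is the whole point of the invisible book.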

Also, everything gets a title, whether it’s real or not, and everything gets an index to indicate its order. As Ovid API czar, I declare this to be so. And SO IT SHALL BE:

Behold, the Ovid JSON in browser.

Now, in traditional nomenclature, you wouldn’t actually use book/poem for every work, and reasonably so, I might add! But we need to walk the line between describing what’s actually there and prescriptively making sure things are consistent enough to go into a database. Also, once I get the API going, I’m going to see if there’s a way to make the URLs skip over “Book One” if there’s only one book, and “Poem One” if there’s only one poem.
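I haven't built any of the URL part yet, but the skipping idea might look something like the sketch below. The path format (`/slug/book-N/poem-M`), the `poem_paths` function, and the dictionary keys are all hypothetical, and a real API framework would likely handle routing differently.

```python
def poem_paths(slug, work):
    """List one path per poem, dropping segments that carry no information."""
    books = work["books"]
    # Skip the book segment when the work has only one (possibly invisible) book.
    skip_book = len(books) == 1
    # Skip the poem segment when every book holds exactly one poem.
    skip_poem = all(len(book["poems"]) == 1 for book in books)

    paths = []
    for book in books:
        for poem in book["poems"]:
            parts = [slug]
            if not skip_book:
                parts.append(f"book-{book['index']}")
            if not skip_poem:
                parts.append(f"poem-{poem['index']}")
            paths.append("/" + "/".join(parts))
    return paths


# Tiny stand-in works with just the indexes filled in.
one_long_poem = {"books": [{"index": 1, "poems": [{"index": 1}]}]}
three_books = {"books": [{"index": i, "poems": [{"index": 1}]}
                         for i in (1, 2, 3)]}
poems_only = {"books": [{"index": 1,
                         "poems": [{"index": 1}, {"index": 2}]}]}

print(poem_paths("remedia-amoris", one_long_poem))
print(poem_paths("ars-amatoria", three_books))
print(poem_paths("heroides", poems_only))
```

The stored data keeps its full book-poem structure; only the public-facing path hides the levels that were invented for consistency.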

Other notes:

  • I asked my API genius friend where in the metadata I should put that these are Latin, and he didn’t know, so I left it out for now.
  • Fun times switching between “1” (string) and 1 (integer) when writing out index data.
  • I may have an idea what I ultimately want to do with this, once it’s up, but I’m not sure yet.
  • And the next step will be getting this puppy into a relational database. So that’s going to be super fun.
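On the string-versus-integer note above: one way to stop worrying about it is to clean up the indexes in a single pass after the JSON is built. A minimal sketch, assuming the key is called `index` (my placeholder name, not necessarily the real one):

```python
import json


def coerce_indexes(node):
    """Return a copy of the tree with every digit-string "index" as an int."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key == "index" and isinstance(value, str) and value.strip().isdigit():
                cleaned[key] = int(value)  # "1" becomes 1
            else:
                cleaned[key] = coerce_indexes(value)
        return cleaned
    if isinstance(node, list):
        return [coerce_indexes(item) for item in node]
    return node  # strings, numbers, None: leave alone


# A fragment with the mix-up baked in: one string index, one integer index.
raw = json.loads('{"index": "1", "title": "Book 1", '
                 '"poems": [{"index": 2, "title": "3"}, {"index": "3"}]}')
fixed = coerce_indexes(raw)
print(json.dumps(fixed))
```

Only values under the `index` key are touched, so a title that happens to look like a number stays a string.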