In the first part of this three-parter, I started making suggestions about the internal changes needed to a magazine editorial process to make it ready for ebooks. Here I’ll go further. But first, let me make some starting assumptions.

1. The ebook (or emagazine, or whatever you want to call it) will not simply consist of a monthly edition of a collection of pages, each made of words and pictures – it will more likely be a rolling collection of pages and services. The traditional monthly magazine cycle is more related to distribution rhythms than to anything else. Indeed, why do we keep to a regular monthly cycle in print anyway? Why not, say, every three weeks in the winter, every five in the summer? I digress, though.

2. But for the sake of simplicity we’ll call each logical block of meaning a “story”, whether it is a traditional 4,000-word prose piece, a slideshow, a video, a graphic, an interactive something or other, a subject-specific chatbot, something machine-written, or a combination of all of the above.

3. Every medium – magazine, newsprint, iPhone, desktop web, tablet, projector, TV, parchment – is uniquely gifted in a particular way. So while the text and the meaning of the “story” might well be the same, the graphical language at least will be different. This means that at some point the art department and the production department for each medium must become separate from the editorial department, and from each other. It’s right here that the tensions occur within an editorial operation: one of the media will get ignored or sidelined for the sake of another. (How many TV shows have websites, compared to how many websites have TV shows, compared to how many properties are genuinely both a website and a TV show? Compare WIRED in print to WIRED online. And so on.)

4. If you’re going to produce content for more than one medium, therefore, you need to commission content in a strange and abstract way. The story creator’s work for print will be flat text; for an e-book, hypertext; for a web service, hypertext and perhaps an interactive graphic. And more, and so on. This means that the author has to hand in copy that is much, much more than a flat text file of around 800 words. It needs to be annotated. It needs to be hyperlinked. It needs to have underlying data. It needs all of this and more to allow the art and production departments of each medium to produce the very best representation they can of the story within their own medium, otherwise their medium will come across as half-arsed. Half-arsed is worse than not doing it at all.

5. Existing content management systems can’t do this. Existing magazine workflows can’t do this either.
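
To make assumption 4 a little more concrete, here is a minimal Python sketch of what a filed “story” might look like once it is more than flat text. Every name in it (`Story`, `Link`, `Annotation`, `for_print`, `for_ebook`) is hypothetical, invented purely for illustration, and the link rendering is deliberately naive.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A hyperlink from a phrase in the text to an external resource."""
    anchor_text: str
    url: str


@dataclass
class Annotation:
    """A note attached to a phrase, for the art and production teams."""
    phrase: str
    note: str


@dataclass
class Story:
    """One logical block of meaning, filed once and rendered per medium."""
    headline: str
    body: str
    links: list[Link] = field(default_factory=list)
    annotations: list[Annotation] = field(default_factory=list)
    data: dict[str, list[float]] = field(default_factory=dict)  # underlying figures

    def for_print(self) -> str:
        """Print gets the flat text only."""
        return self.body

    def for_ebook(self) -> str:
        """An e-book gets hypertext: the first use of each anchor phrase is linked."""
        text = self.body
        for link in self.links:
            # Naive substitution (assumption): fine for a sketch, not for production.
            linked = f'<a href="{link.url}">{link.anchor_text}</a>'
            text = text.replace(link.anchor_text, linked, 1)
        return text


story = Story(
    headline="Metadata matters",
    body="Vogue has a deep archive.",
    links=[Link("Vogue", "https://example.com/vogue")],
    annotations=[Annotation("deep archive", "Possible timeline graphic.")],
)
print(story.for_print())
print(story.for_ebook())
```

The point of the sketch is only that the same filed object can feed each medium’s own production team, which then decides how to present it.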

The perfect case in point is that of metadata. There’s a word you hear a lot on the web. It means the data about the data – the author’s name, the date it was written, and so on. In its purest and perfect sense, any story has an infinite amount of metadata: this piece you’re reading now was written by me, on my MacBook Air, on 29th December 2009, in London, in part in Whitehall, in part in the Milk Bar, Soho, and in part in my office in Notting Hill, this paragraph being written with an ambient temperature of about 20 degrees C, while the local weather was cold and rainy, etc, etc, etc. It concerns electronic publishing, and refers to…and is classified as…and is linked to…and is part of a collection called…and is built on thinking done in…

You can go on and on.

The problem is that metadata is incredibly fragile. If you don’t capture it when you can, it is lost forever. The date you wrote that piece? The websites you looked at when you were researching it? The music playing during that photoshoot? You didn’t write it down? Ah, then it’s gone.

Even more upsettingly, the way that we file our copy today all but guarantees that we lose as much metadata as we possibly can. Emailed Word documents, or plain text files, contain virtually none of the metadata we could use: stuff that we get for free simply by the passage of the creative process itself is instantly thrown away by the workflow. The only time many reporters have to hand in their metadata is when they’re being sued – but even if they were willing, the content management systems don’t have a place for it. This we’ll come back to.
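
As a rough illustration of capturing that “free” metadata at the moment of filing rather than throwing it away, here is a small Python sketch that saves the copy alongside a JSON sidecar. The function name `file_copy`, the `.meta.json` sidecar convention and the choice of fields are all my assumptions for the example, not a description of any existing system.

```python
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_copy(text: str, path: Path, author: str, sources: list[str]) -> Path:
    """Save the copy plus a JSON sidecar of metadata known at filing time."""
    path.write_text(text, encoding="utf-8")
    metadata = {
        "author": author,
        "filed_at": datetime.now(timezone.utc).isoformat(),
        "machine": platform.node(),       # which computer it was filed from
        "word_count": len(text.split()),
        "sources": sources,               # e.g. websites consulted in research
    }
    sidecar = path.with_suffix(".meta.json")  # story.txt -> story.meta.json
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Usage: file a short piece into a temporary directory and read the sidecar back.
with tempfile.TemporaryDirectory() as tmp:
    sidecar = file_copy(
        "Three short words",
        Path(tmp) / "story.txt",
        author="A. Writer",
        sources=["https://example.com/research"],
    )
    saved = json.loads(sidecar.read_text(encoding="utf-8"))

print(saved["author"], saved["word_count"], saved["filed_at"])
```

None of these fields costs the writer any effort, which is rather the point: they exist anyway, and the only question is whether the workflow keeps them.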

The necessity above all else of keeping your metadata might seem like a geeky affectation – something that is really only of interest to librarians (itself not a bad reason) or trainspotterish data-completists – but it is in fact the simplest and cheapest route for a publisher to future-proof their business. Your revenue depends on it. Remember, we’re talking about a business sector whose incumbents are trying to transition from having an advantage regarding printing and distribution, to having an advantage regarding content. In other words, Vogue’s printing presses and relationship with COMAG are both lovely, but in a digital world ultimately worthless: it’s the combination of the creative and ad-sales teams upstairs and the rights-owned archive in the basement that gives it value now and in the future. Only one of these things is replicable by others.

So why do everything you can to keep metadata intact? Because it’s from this information that new products can be automatically created, at a scale and rapidity that would be impossible otherwise. With every piece of metadata that you don’t throw away, you multiply the potential ways of slicing through your content and delivering each slice as a separate product, simply as the result of a database lookup. In the case of Vogue today, say, commissioning an editorial product that simply shows every dress designed by Christian Dior that appears in the archive would involve weeks of intern-work, instantly making it unprofitable or too late. A metadata-complete archive in the future would give you that with a single line of code. As an example, here is a sentence that will be spoken in a newsroom sometime this coming decade:

“Right then, website people. Fidel is finally dead, so I need a special page with everything we’ve ever written about Castro, plus any travel writing we’ve had from Cuba, and all the pictures we have from the region mapped by place and time, and everything we wrote about JFK and the Bay of Pigs, and I need it online in an hour. Go.”

The reaction to this request is solely a function of the content management system the newsroom uses. You simply can’t do this in a sane or profitable way without an archive with all the metadata preserved. You could do it slowly, sure, with brute force and interns, but who will have those? More to the point, who will have those and still be able to compete in terms of both cost and speed with those that don’t bother with interns and excessive staff in fancy buildings but instead have a workflow designed with the future in mind?
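
To show how small that job becomes when the metadata is there, here is a hedged sketch using Python’s built-in `sqlite3` module and an in-memory database. The schema, the tag vocabulary, the sample rows and the `find` helper are all hypothetical, made up for this example; a real archive would be far larger and would have been populated at filing time.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, headline TEXT, kind TEXT, published TEXT);
    CREATE TABLE tags (story_id INTEGER, key TEXT, value TEXT);
""")

# Invented archive rows, for illustration only.
db.executemany("INSERT INTO stories VALUES (?, ?, ?, ?)", [
    (1, "Castro addresses Havana", "news", "1999-01-02"),
    (2, "A week in Trinidad, Cuba", "travel", "2004-06-10"),
    (3, "Kennedy and the Bay of Pigs", "feature", "1991-04-17"),
    (4, "Spring couture in Paris", "fashion", "2008-03-01"),
])
db.executemany("INSERT INTO tags VALUES (?, ?, ?)", [
    (1, "person", "Fidel Castro"),
    (1, "place", "Cuba"),
    (2, "place", "Cuba"),
    (3, "person", "John F. Kennedy"),
    (3, "event", "Bay of Pigs"),
    (4, "designer", "Christian Dior"),
])


def find(*wanted: tuple[str, str]) -> list[str]:
    """Return headlines of stories carrying ANY of the (key, value) tags, oldest first."""
    if not wanted:
        return []
    clause = " OR ".join("(t.key = ? AND t.value = ?)" for _ in wanted)
    params = [part for pair in wanted for part in pair]
    rows = db.execute(
        "SELECT DISTINCT s.headline, s.published FROM stories s "
        f"JOIN tags t ON t.story_id = s.id WHERE {clause} ORDER BY s.published",
        params,
    )
    return [headline for headline, _ in rows]


# The Dior product really is one line once the tags exist.
dior = find(("designer", "Christian Dior"))

# The editor's Castro request becomes one slightly longer lookup.
castro_page = find(
    ("person", "Fidel Castro"),
    ("place", "Cuba"),
    ("person", "John F. Kennedy"),
    ("event", "Bay of Pigs"),
)
print(dior)
print(castro_page)
```

The lookup itself is trivial; all of the value sits in the tags, which either were captured when the story was filed or were not.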

There is immense potential for new editorial products created by slicing through your existing content in new and interesting ways. Personalisation and location-based services are dependent on this; collaborative filtering works better when you have it; APIs get exponentially richer the more data you have to expose. But without an underlying library of your content complete with metadata, you can’t do any of it very well. If you’re going to rock a multi-outlet, multimedia world, you need to have stories whose parts are way more than their sum. This is new, and in part three, I’ll talk about what this means for journalists, for publishers, and for people designing the systems they use.

Go back to part 1 of this here, or go ahead to part 2.5.