Category Archives: Website transition

Form ever follows function, eventually

Our online editor, Will Davis, has been explaining how we flowed text from Google Docs to WordPress to print and created a low-cost, portable front-end system for our newsroom at Bangor Daily News. I wanted to tell you a little about why we did it.

If you read what Jeff Jarvis, Chuck Peters and Clay Shirky have been writing about decaying newsrooms and the need for new models, it’s hard to believe they weren’t taking notes in some darkened corner of ours, perhaps in Sports behind the stacks of old game results and Mountain Dew cans.

Like many newsrooms, until very recently we were production heavy because we had to be. Moving stories to the web was a copy-and-paste affair, but that’s not where the trouble started. If you begin with a print-directed front-end system, as we did, how does that system accommodate a story being updated from the field? Or how would the full possibility of story assets land online, to be chosen among for print? Even simpler: When do reporters add links? The answers, as countless journalists know, are: It can’t; they won’t; they don’t. From there, it’s all production, not creation.

As we lost staff to cutbacks over the years, assembling our content into finished products was taking a larger and larger percentage of our time. Simply processing press releases seemed to suck up significant portions of editors’ days. No one wanted to be in this situation, but our infrastructure for moving content demanded it. We were trapped.

We needed reporters to get out of the tools they had been using for more than a decade to drive toward single shift-end deadlines. We needed to simplify the connections between what reporters wrote and what the public saw. We needed to link our bureaus so that they were much more a part of the daily news flow. We needed mobility, so that any staff member with a cell phone could file from anywhere, and web archiving that allowed us to expand on stories and retrieve content below the level of a story. In brief, we needed to match the way our audience now acquires information. Also, we didn’t have any money for this project.

Then along came JRC’s Ben Franklin Project, pointing the way. We had begun using Google Docs in our new media department in 2007, when the department was created and we suddenly had to keep and share records on web development, ad sales and commissions, good ideas and meeting minutes. The idea of Docs as a front-end newsroom system became apparent as Google improved its product and we needed somewhere reporters could store and organize notes, interviews, story ideas and all the rest. The WordPress CMS, as good as it is, didn’t seem like the place to do that.

As the newsroom has grown comfortable with Docs, it is becoming more efficient (links and headlines, for instance, travel from Docs to WordPress) and we are shifting staff members from production to content creation. We knew we had a winner in Docs when we had a major election story with two reporters in the field and an editor in the newsroom, all working simultaneously on the same breaking story, adding content, seeing in real time what each was adding, talking to each other through the chat function and responding with updated information. Fast, simple, low cost.

We’re a long way from done. We’re still working on ways to present data and extract pieces of story content to create a coherent, useful whole; and we are just beginning the process of providing our audience a range of tools to contribute their own content. But in the newsroom, the guiding ideas we have put into practice are to match the tool to the job we need done (rather than the reverse), reduce the number of steps required and anticipate how our audience will want the information next. And the cost should be next to nothing.

Marrying Google Docs and WordPress (or really any CMS)

At the BDN, we love WordPress, but we didn’t feel it was the right place to have our reporters working. Even with distraction-free writing, there’s too much cruft that would confuse reporters or that we don’t want reporters touching. We wanted something lightweight, fast, simple to grasp and that would make editing easier.

Enter Google Docs. It’s easy to use, lightweight and a large number of people are already familiar with it. And, it’s got beautiful tools that enable real-time collaboration with reporters and editors.

That’s where we decided to start. Reporters write stories in Google Docs, and the docs are sorted in a series of collections, which are shared out to everybody. We have collections for each desk, such as metro, state, sports, etc.; collections for workflow, such as Needs Copy Editing, On Hold, etc.; and collections for actions, such as Send to Publish and Published.

The action collections are important — they’re how docs actually get from Google to WordPress. When we first started using Docs with our sports department in August or September of last year, Docs was much different, more neanderthal. But, that was nice in some ways. The docs came through with inline markup, and Docs supported XML-RPC, which made it easy to connect the two systems.

Then came an upgrade, which was on the whole a good one. It drew on what Google had learned building Wave to bring real-time collaboration to a new level. But it also eliminated XML-RPC support, and docs aren’t marked up as nicely as they used to be.

So, we delved into the API, and I think what we have now is an even nicer system than XML-RPC could provide. You can find a version of what we built in the WordPress Plugin Repository.

On the whole, the process is fairly simple. When we’re all done copy editing an article, the doc goes into a collection called Send to Publish. We have a script running every two minutes that will grab docs from that collection, process them, and move them to WordPress. Then the script takes the doc out of Send to Publish and puts the doc in a collection called Published.
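The loop described above can be sketched in a few lines. This is an illustration in Python rather than the plugin's own PHP, and `Doc`, `Collections` and `run_once` are hypothetical names standing in for the Google Docs collections and the script; the real plugin talks to the Docs API instead of in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a Google Doc."""
    doc_id: str
    title: str
    body: str


@dataclass
class Collections:
    """Hypothetical in-memory stand-in for the two action collections."""
    send_to_publish: list = field(default_factory=list)
    published: list = field(default_factory=list)


def run_once(collections, publish):
    """Process every doc waiting in Send to Publish, then move it to Published.

    `publish` is any callable that pushes one doc to the CMS. A doc is moved
    only after `publish` returns, so if it raises, that doc stays queued for
    the next run.
    """
    moved = []
    for doc in list(collections.send_to_publish):  # iterate over a copy
        publish(doc)
        collections.send_to_publish.remove(doc)
        collections.published.append(doc)
        moved.append(doc.doc_id)
    return moved
```

A scheduler such as cron would then call something like `run_once` every two minutes.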

Here are a few features the plugin provides:

  • If the doc is a new post, it will go into WordPress as a draft. If the doc has already been put into WordPress, it will update the previous post.
  • Usernames in Docs must correspond with usernames in WordPress.
  • If a doc is in a collection and there is a corresponding category in WordPress, it will automatically put the post in that category. Otherwise, it will put it in the default category.
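
The draft-or-update rule and the category matching in the list above might look roughly like this. It is a Python sketch, not the plugin's PHP. The `posts` dict, the `sync_post` name and the "Uncategorized" default are assumptions for illustration, as is the choice to use only the first matching collection.

```python
def sync_post(posts, doc_id, title, body, collections, categories,
              default="Uncategorized"):
    """Create a draft for a new doc, or update the post already tied to it.

    `posts` maps a doc ID to a post dict, `collections` lists the names of
    the collections the doc sits in, and `categories` is the set of category
    names that exist in the CMS. All three are hypothetical stand-ins.
    """
    # Use the first collection that has a same-named category, else the default.
    category = next((c for c in collections if c in categories), default)

    post = posts.get(doc_id)
    if post is None:
        # New doc: it enters the CMS as a draft.
        post = {"status": "draft"}
        posts[doc_id] = post
    # Either way, write the latest content; an existing post keeps its status.
    post.update(title=title, body=body, category=category)
    return post
```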

There are a few filters and actions in the plugin that allow you to extend it, and the plugin currently ships with one extender, which strips comments out into a custom field and normalizes the content. It’s a slightly stripped-down version of the extender we use.

We actually go one step further by fielding data using delimiters. We name the doc with a slug instead of a headline, and then the first line of the doc becomes a headline, followed by a pipe (|). After that comes the body copy, and at the very end we can add another pipe. Anything after that last pipe acts as a comment. We do this because we also use the API to put things, such as AP stories, into Google Docs, and we couldn’t figure out a way to add comments via the API.
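
Here is a minimal sketch of that delimiter scheme, written in Python for illustration rather than taken from our extender. The function name `split_fields` is hypothetical, and the sketch assumes the body copy itself never contains a pipe unless a trailing comment follows it.

```python
def split_fields(doc_text):
    """Split 'headline | body | comment' into its three fields.

    The first pipe ends the headline. If another pipe follows, the last one
    starts the comment. A body that contains a pipe but has no trailing
    comment would be misread; this sketch does not handle that case.
    """
    headline, sep, rest = doc_text.partition("|")
    if not sep:
        # No delimiter at all: treat the whole text as body copy.
        return {"headline": "", "body": doc_text.strip(), "comment": ""}
    body, sep, comment = rest.rpartition("|")
    if not sep:
        # Only one pipe: everything after the headline is body copy.
        body, comment = rest, ""
    return {
        "headline": headline.strip(),
        "body": body.strip(),
        "comment": comment.strip(),
    }
```

Calling `split_fields("Council approves budget|The council voted 5-2.|Hold for photo")` gives the headline, the body and the comment as separate fields, with the slug still coming from the doc's name.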

To install the plugin, you’ll need to upload it to your /wp-content/plugins/ folder. Right now there isn’t an extender for wp-cron (there will be soon), so you’ll have to put the action on a page and point a cron job to it. I do this like so:

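<?php
// Load WordPress (and with it the activated plugin).
include( './wp-load.php' );
// Placeholders: substitute the IDs of your own origin and destination collections.
$origin_folder_id      = 'ID of origin folder';
$destination_folder_id = 'ID of destination folder';
$docs_to_wp = new Docs_To_WP();
$gdClient   = $docs_to_wp->docs_to_wp_init( 'me@example.com', 'mypassword' );
// We're just going to call one function:
$docs_to_wp->retrieve_docs_for_web( $gdClient, $origin_folder_id, $destination_folder_id );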

As always, the code is on GitHub.

Importing archives into WordPress

I’m starting with the import process not because it is an exceptionally good place to start when preparing to move a site to WordPress but because it’s one of the few things I got right from the get-go when I transferred my first news site to WordPress.

The best way to import content into WordPress, in my experience, is using WordPress’s XML import.

WordPress’s XML files follow an easy-to-grasp but powerful structure that, in general, goes something like this:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:excerpt="http://wordpress.org/export/1.0/excerpt/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.0/"
>

<channel>
	<title>My example site</title>
	<link>http://example.com/</link>
	<description></description>
	<pubDate>Thu, 28 May 2009 16:06:40 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<language>en</language>
	<wp:wxr_version>1.0</wp:wxr_version>
	<wp:base_site_url>http://example.com/</wp:base_site_url>
	<wp:base_blog_url>http://example.com/</wp:base_blog_url>

	<item>
		<category domain="category" nicename="my-category"><![CDATA[My Category]]></category>
		<category domain="tag" nicename="my-tag"><![CDATA[My Tag]]></category>
		<title><![CDATA[My Post Title]]></title>
		<dc:creator><![CDATA[My Name]]></dc:creator>
		<link>http://example.com/2010/07/06/my-post-title/</link>
		<pubDate>Tue, 06 Jul 2010 10:51:32 +0000</pubDate>
		<guid isPermaLink="false">http://example.com/?p=12345</guid>
		<description></description>
		<content:encoded><![CDATA[My post content.]]></content:encoded>
		<excerpt:encoded><![CDATA[My post excerpt.]]></excerpt:encoded>
		<wp:post_id>12345</wp:post_id>
		<wp:post_date>2010-07-06 10:51:32</wp:post_date>
		<wp:post_date_gmt>2010-07-06 10:51:32</wp:post_date_gmt>
		<wp:comment_status>open</wp:comment_status>
		<wp:ping_status>closed</wp:ping_status>
		<wp:post_name>my-post-title</wp:post_name>
		<wp:status>publish</wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type>post</wp:post_type>
		<wp:post_password></wp:post_password>

		<wp:postmeta>
			<wp:meta_key>my_post_meta_key</wp:meta_key>
			<wp:meta_value>My Post Meta Value</wp:meta_value>
		</wp:postmeta>
	</item>

</channel>
</rss>

That’s a fairly simple usage of WordPress XML, and when we imported our content from the Bangor Daily News we did a lot more.

For example, we imported all our posts with a hidden post meta (_old_id) value of the article’s ID in our old CMS. Then, we used the CP Redirect plugin as a template for a new plugin to redirect people clicking on old links to the new URL.
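
The redirect idea boils down to pulling the old ID out of an incoming URL and looking up the post that carries it. The Python sketch below is an illustration only, not our plugin and not CP Redirect's code. The `/detail/<id>.html` URL shape and the `old_id_to_url` dict (standing in for a query on the `_old_id` meta value) are assumptions.

```python
import re

# Hypothetical old-CMS URL shape, e.g. /detail/123456.html; adjust to your own.
OLD_URL = re.compile(r"^/detail/(\d+)\.html$")


def redirect_target(path, old_id_to_url):
    """Return the new URL for an old-CMS path, or None if there is no match.

    `old_id_to_url` stands in for a lookup of posts by their _old_id value.
    """
    match = OLD_URL.match(path)
    if match is None:
        return None
    return old_id_to_url.get(match.group(1))
```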

We also found that using the <dc:creator> tag quickly overwhelmed us. As with any newspaper, there are thousands of people who have written just one or a few articles for us, and we didn’t want to create accounts for all of them. Instead, we created a whitelist of authors we wanted to come in as users — basically just BDN staff and frequent contributors and freelancers — and the rest of the posts came in with a default username and with the author’s name in a post meta field named _byline.
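
In code, the whitelist rule is a single lookup with a fallback. This Python sketch is illustrative rather than lifted from our import script; `assign_author` and the default account name `staff` are hypothetical.

```python
def assign_author(byline, whitelist, default_user="staff"):
    """Return (username, post_meta) for an imported article.

    `whitelist` maps a byline to an existing username. Anyone else is filed
    under `default_user` (a hypothetical account name), with the original
    byline preserved in the _byline meta field.
    """
    username = whitelist.get(byline)
    if username is not None:
        return username, {}
    return default_user, {"_byline": byline}
```

The returned username would go in <dc:creator>, and any meta would be written out as a <wp:postmeta> block.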

We don’t embed images in posts. Rather, we query for all images attached to the post and display them at the top of the post and in the sidebar (more about this in a later post). So we natively imported the images so they would become attachments. WordPress automatically copies all attachments onto the server, so we didn’t have to worry about getting all the images off our old server. Importing the images is just as easy. <wp:post_parent> is set to the ID of the post, <wp:status> is set to inherit and <wp:post_type> is set to attachment. The image path goes in <wp:attachment_url>, and the caption goes in <excerpt:encoded>.
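
As a rough illustration, an attachment item with those fields could be generated like this. The sketch uses Python's standard `xml.etree.ElementTree` rather than our actual script, and `attachment_item` is a hypothetical helper. ElementTree escapes text instead of wrapping it in CDATA, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared at the top of the export file.
WP = "http://wordpress.org/export/1.0/"
EXCERPT = "http://wordpress.org/export/1.0/excerpt/"
ET.register_namespace("wp", WP)
ET.register_namespace("excerpt", EXCERPT)


def attachment_item(parent_id, image_url, caption, title):
    """Build an <item> describing one image attached to post `parent_id`."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, f"{{{EXCERPT}}}encoded").text = caption
    ET.SubElement(item, f"{{{WP}}}post_parent").text = str(parent_id)
    ET.SubElement(item, f"{{{WP}}}status").text = "inherit"
    ET.SubElement(item, f"{{{WP}}}post_type").text = "attachment"
    ET.SubElement(item, f"{{{WP}}}attachment_url").text = image_url
    return item
```

`ET.tostring(item, encoding="unicode")` serializes the element so it can be placed inside <channel> alongside the post items.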

We broke the XML files up by 1,000 posts at a time. All in all, we had more than 100 XML files. We also imported everything onto a local machine and then pushed the database back up to our webserver. All told, importing everything took several solid days of work.
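
Splitting an export into 1,000-post files is simple slicing. A minimal Python sketch follows; the helper names and the `import-001.xml` filename pattern are hypothetical.

```python
def batches(items, size=1000):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_filenames(total_posts, size=1000, pattern="import-{:03d}.xml"):
    """Name one XML file per batch; the filename pattern is hypothetical."""
    count = (total_posts + size - 1) // size  # ceiling division
    return [pattern.format(n) for n in range(1, count + 1)]
```

Each batch would then be wrapped in its own <channel> and written to the matching file.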

If you’re working on a site much larger than ours, you might consider importing posts directly into WordPress using the API, but to be honest I’m not sure how much overhead that would save.

The script, which you’ll have to modify a bit but hopefully not too much, is on GitHub.

A quick overview of our editorial workflow

Lauren Rabaino asked on a previous post for a video of our entire workflow. The whole process is actually pretty simple, so it wasn’t hard to record it.

Everything starts in Google Docs — that’s where the reporters write their stories, the AEs read them and the copy editors edit them. We interface with Google Docs via its API and WordPress via XML-RPC to move stories out of a folder and into the CMS. It requires a bit of cleanup, but for the most part everything goes smoothly.

All the stories are then saved on a local server as InDesign Tagged Text files and prepared for print. Styles are applied, the byline and headline are added to the top of the story and a few other changes are made. We try to keep as much formatting from the web to print as possible, including bolding, italics, and a whole host of styles, particularly for sports.

We custom-built a plugin for InDesign that allows us to easily search WordPress and import the files from the server.

Video:

Credits

I wanted to credit a few people who were immensely helpful in getting the site up and running.

I’m William P. Davis, online editor at the Bangor Daily News. Todd Benoit is Director of News and New Media, and Martha Ward is Product Manager.

Mo Jangda, who now works for Automattic, which runs WordPress.com, wrote the Zoninator.

Juan Carlos Sanchez wrote the C plugin that integrates WordPress with InDesign.

Mark Jaquith, a lead dev for WordPress, and Ryan Duff provided much-needed freelance support.

We manage our servers in conjunction with Firehost. WPEngine is also a great host that we used for a time until we decided to go self-hosted.

Andrew Nacin, Daniel Bachhuber and Scott Bressler fielded a multitude of questions and developed excellent plugins.

Everyone at CoPress, including Daniel, who truly broke ground in this area.

And, especially, everyone who has ever contributed to WordPress core.

Open Source is superior because of the community around it.

Bangor Daily News completes final switch to WordPress

Wednesday, we pointed the last of our traffic to our WordPress servers.

We started planning the transition soon after I started at the BDN in July, and started beta testing the system in late August with our Sports section.

Stories are penned first in Google Docs, then brought over to WordPress via XML-RPC and pushed to InDesign via tagged text. It’s a unique system we built, for the most part, from the ground up, and we believe we’re the largest newspaper running entirely on WordPress.

Over the next few months we’ll be extensively sharing how we did it and open-sourcing much of the project. Our goal is to help other newspapers set up an easy-to-use, low-cost content management system. The setup is actually quite simple and easy to implement.

For the time being, feel free to leave comments with questions or e-mail me at wdavis@bangordailynews.com.

To get everyone started, I would recommend a few plugins that I think are must-haves for any news org on WordPress. The first, which the BDN commissioned from Mo Jangda, is The Zoninator, which allows you to order content by hand instead of chronologically.

Another is Edit Flow, which is an important tool for managing workflow through WordPress.

Scott Bressler’s excellent Media Credit allows you to natively set the credit for images, instead of including the information in cutlines.

Co-Authors Plus, also by Mo, allows you to set multiple authors per post.

And CP Redirect is a good example plugin for how you might remap links from your old site. We used it as a template to avoid dropping links.

You might also wish to check out the Ben Franklin Project, from the Journal Register Company; CoPress, which, although no longer operating, contains a trove of useful tips for converting; and a post I wrote in 2009 after converting my college newspaper to WordPress.