WordPress to InDesign: The final countdown

We’ve had a lot of requests to open source the WP Browser, the final piece of getting a post from Google Docs to WordPress and into InDesign. We hadn’t up till this point because it was a plugin that only worked on Windows and, well, was pretty bad.

Over the last few weeks, I’ve been working on rewriting WP Browser using InDesign’s JavaScript API. It runs as a script, not as a plugin, but it works on both Windows and Mac, it’s lightweight and it doesn’t need anything extra (like Adobe AIR) to run. And now it’s open source.

If you want to skip to the end, you can download the InDesign plugin as well as an associated WordPress plugin at https://github.com/bangordailynews/WordPress-to-InDesign.

Also, shameless plug: If this looks cool and you want to build more things like it, we’re hiring.

We’ve been using the WordPress plugin for quite a while now, and testing the InDesign plugin for a little while. They work pretty well.

The WordPress plugin allows you to map various HTML tags to InDesign paragraph and character styles. The easiest way to start is to create a text box in InDesign containing all the styles you want to map, then export the text in that box as InDesign Tagged Text (File -> Export, then select InDesign Tagged Text. Abbreviated is fine.)

Tagged text is just a super easy way to tell InDesign what paragraph style each block of text should appear as. Generally it appears as <pstyle:Paragraph Style Name>. Things like bolding and italics generally appear like <ct:Bold> and <ct:Italic>, and an empty <ct:> switches back to the regular character style. A really simple tagged text export looks like this (I deleted the paragraph style definitions at the top for clarity):

[php]<ASCII-MAC>
<pstyle:Body Text>This is some test text
<pstyle:Article Subheading>This is an article heading
<pstyle:Article Subheading>Still an article heading
<pstyle:Body Text>Some test text with <ct:Bold>some bolding<ct:> and <ct:Italic>italics<ct:>.[/php]

Pretty simple, right?

By default, the WordPress plugin saves each post as tagged text in /wp-content/uploads/indesign/. The files are named post_id.txt, and you can use rsync and a cron job to sync them locally if you wish. The tagged text is also available through a simple JSON API that is used by the InDesign plugin.

The thing I keep calling an InDesign plugin is, as I mentioned, actually an InDesign script. To install it, open the Scripts panel in InDesign. (If it’s hidden, you can show it by going to Window -> Utilities -> Scripts.) Right click on the Application folder, then click Reveal in Finder (Reveal in Explorer on Windows) and navigate into the Scripts Panel folder in the window that opens up. Then drag and drop WP Browser.jsx into the folder. When you go back to InDesign, WP Browser will show up under the Application folder in the Scripts panel. Double click to open the browser.

The WP Browser lets you perform a full-text search on your WordPress install. You can modify the script to either read the post_id.txt files we talked about above from a local location or dynamically create a local file with the tagged text each time you hit import.

[Screenshot: WordPress_Browser]

By default, the filter list is populated by categories. I’ll tell you how to modify it below (we’ve modified it to populate from a list of the paper’s sections, and the stories are categorized to go into each section).

Before you click import, you must have a text box selected. If you have the entire text box selected, the story will replace everything in that box. If you have a range of text selected, it will replace that selection. Otherwise, it will insert the text where your cursor is.
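
The selection rules above can be sketched as a small decision function. This is only an illustration, not the code in WP Browser.jsx: the function name importMode and the labels it returns are made up here, and it assumes the script looks at the class name of the selected object (in InDesign's scripting DOM, something like app.selection[0].constructor.name).

```javascript
// Hypothetical helper: decide how imported text should be placed, based on
// the class name of the current selection. Names and labels are illustrative.
function importMode(selectionType) {
  // Class names InDesign reports for a selected range of text.
  var textRanges = ["Text", "Character", "Word", "Line", "Paragraph",
                    "TextColumn", "TextStyleRange"];

  if (selectionType === "TextFrame") {
    return "replace-frame";      // whole text box selected: replace everything
  }
  if (textRanges.indexOf(selectionType) !== -1) {
    return "replace-selection";  // a range of text selected: replace just that
  }
  if (selectionType === "InsertionPoint") {
    return "insert-at-cursor";   // blinking cursor only: insert there
  }
  return "nothing-to-do";        // no usable selection: ask the user to pick a text box
}
```

Inside InDesign, the result would then drive whichever call actually places the tagged text; that part depends on the script and is left out here.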

Both the WordPress plugin and the InDesign script require a bit of modification:

  • WordPress plugin
    • Set an API key at the top of the file. The same key is used in the InDesign script, and it prevents unauthorized access to your unpublished posts.
    • in function do_tagged_text:
      • You can set “formats” for the story to decide which paragraph styles stories are generated with. By default, the post meta key for that format is _format. You can change that if you want.
      • We also use a post meta field to override the author attached to the post (for example, for one-time contributors). By default, that field is _byline.
      • The plugin integrates with Co-Authors Plus. It also allows you to identify a meta entry for the user that will display as their “title” (ours is BDN Staff, by default).
      • We strip out all HTML that’s not a heading, a paragraph tag, a list, or text styling. You can be more or less strict.
      • Where we start replacing p, b, em, etc. tags, the plugin by default maps bold and italic text to the character styles <ct:Bold> and <ct:Italic>. If you use custom fonts, you might have to change these.
      • We also strip the state out of datelines if it’s local. (See line ~180)
      • Around line 200, we start converting headings to paragraph styles. After we’re done with that style, you need to change back to the default style.
      • Then, around line 275, you’ll want to set the paragraph styles for byline and the default paragraph style. There’s also an example of how to change the styles based on format. (This could be coded better.)
    • in function wp_browser_search
      • By default, you can filter posts in the WP Browser based on category. To change this to a different taxonomy, you’ll need to change the get_categories() call (around line 310) as well as the category_name arg in get_posts (around line 335).
      • By default, we only query for posts with status publish, draft and pending. If you want to expand or limit that, you can do so around line 330.
  • WP Browser
    • On line 3, set the API key you set in the WordPress plugin
    • On line 4, set the domain of your website, no leading http:// (just example.com). The plugin doesn’t currently query over https.
    • On line 8, decide whether you want to import from post_id.txt files saved locally or from a file created dynamically.
    • On lines 13 and 15, set the path to the files above on Mac and Windows, respectively.
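
To make the tag mapping concrete, here is a rough sketch of the idea in JavaScript. It is not the plugin's code (the plugin is PHP and does a lot more, like bylines and datelines). The function name htmlToTaggedText and the STYLES table are made up for illustration, and the style names are borrowed from the sample export above. It also skips things a real converter has to handle, such as HTML entities and escaping literal < and > characters.

```javascript
// Hypothetical mapping from HTML block tags to paragraph style names.
// Swap in the styles from your own tagged text export.
var STYLES = { p: "Body Text", h2: "Article Subheading" };

function htmlToTaggedText(html) {
  var lines = ["<ASCII-MAC>"];
  // Find each paragraph or heading element and capture its inner HTML.
  var block = /<(p|h[1-6])\b[^>]*>([\s\S]*?)<\/\1>/gi;
  var match;
  while ((match = block.exec(html)) !== null) {
    // Fall back to the body style for block tags with no mapping.
    var style = STYLES[match[1].toLowerCase()] || STYLES.p;
    var text = match[2]
      // Strip every inline tag except bold and italic markup.
      .replace(/<\/?(?!(?:strong|b|em|i)\b)[a-z][^>]*>/gi, "")
      // Map the remaining tags to character styles.
      .replace(/<(?:strong|b)\b[^>]*>/gi, "<ct:Bold>")
      .replace(/<(?:em|i)\b[^>]*>/gi, "<ct:Italic>")
      // A closing tag switches back to the regular character style.
      .replace(/<\/(?:strong|b|em|i)>/gi, "<ct:>");
    lines.push("<pstyle:" + style + ">" + text);
  }
  return lines.join("\n");
}
```

Feeding it a heading followed by a paragraph with some bold and italic text produces the same shape as the export shown earlier: one <pstyle:...> line per block, with <ct:Bold> and <ct:Italic> around the styled runs.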

One last thing:

When InDesign makes a call to the server, as far as I can tell it does so by opening a raw socket connection and then requesting the path directly.

In short, your server will see a request come in for localhost/wp-admin/admin-ajax.php?etc instead of for your actual domain.

So, especially on multisite and possibly on regular WordPress, you’ll need to set the host for it to work.

I did this by adding the following line to wp-config.php. There’s probably a better way to do it:

[php]if ( ( empty( $_SERVER['HTTP_HOST'] ) || $_SERVER['HTTP_HOST'] == 'localhost' )
	&& ! empty( $_GET['action'] )
	&& ( $_GET['action'] == 'wp-browser-search' || $_GET['action'] == 'wp-browser-notify' ) ) {
	$_SERVER['HTTP_HOST'] = 'mysite.com';
}[/php]
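
One possible alternative, which I'm only sketching and which has not been tested against the WP Browser: if the script builds its request by hand, it could send an explicit Host header so the server sees the real domain. The function name buildRequest is made up for this example. In ExtendScript the resulting string would be written to a Socket object (opened with something like conn.open("example.com:80")) using conn.write().

```javascript
// Hypothetical helper: build a raw HTTP/1.0 GET request that names the
// site in a Host header, so the server does not fall back to localhost.
function buildRequest(host, path) {
  return [
    "GET " + path + " HTTP/1.0",
    "Host: " + host,
    "Connection: close",
    "", // blank line ends the headers
    ""
  ].join("\r\n");
}

// Example: the search action used by the WordPress plugin.
var request = buildRequest("example.com",
  "/wp-admin/admin-ajax.php?action=wp-browser-search");
```

If that works on your setup, the wp-config.php override above shouldn't be needed, but treat that as a guess until you've tried it.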

I just ripped a lot of this out of the BDN site and took a lot of our customization out. It will definitely require customization. It might break. Leave a comment below if you have a question. Email me at wdavis@bangordailynews.com if I did something really stupid.

We’re hiring! Data geeks and news hackers

We’re looking for data analysts and coders to join the BDN’s new Research & Innovation Department.

The BDN is dedicated to rethinking how “legacy” media operate. If you enjoy working on multiple projects with different focuses in quick succession, you’ll enjoy working here. We want people who understand how to build tools people will use and who are interested in changing user habits for the better.

Coding projects include tracking digital users into the physical space, reinventing how the company thinks about its systems and running the highest-traffic news site in Maine, plus the ideas you bring to the company. Data projects will run the gamut from audience data collection to reach new customers, to business analyses that identify areas for growth, to leading the newsroom in data-based reporting.

R&I is a new department that operates as a startup inside the BDN. It is responsible for product development, technology and leading the company in making data- and research-based decisions.

We like to move aggressively and quickly at the BDN. We avoid bureaucracy and encourage transparency. We open source things when we can (read: when we’re not too embarrassed). The BDN is family-owned (no corporate overlords). It’s big enough to have resources and impact but not so huge you can’t ever get anything done.

To apply, email jholmes@bangordailynews.com. If you’d like more details, feel free to email me directly at wdavis@bangordailynews.com.

Google Drive to WordPress (to InDesign), refined

It’s been a while since I’ve posted here. We’ve been busy refining our existing systems in the newsroom and tackling inefficiencies in other departments.

At the BDN, we often try to drill down to what is really necessary and important, as opposed to what is traditional but not valuable, by boiling the task at hand down until it’s as basic as possible. We try the basic method and in the process learn what is truly necessary and what was extraneous. Along the way we also collect nice-to-haves, and if the opportunity presents itself we will include those in future development.

We boiled our editorial publishing workflow down to a very basic system that has worked very well for us (and for other papers) for more than two years. With our writing and publishing process in good shape, the next big problem was our budgeting system.

Outside of clunky CMSes, maybe the worst inefficiency in many newsrooms is the budgeting system. Many papers just use a doc for this. For a while at the BDN we used Zoho Creator. But every budgeting system I’ve seen lacks the ability to properly track the status of each story, and organization is often a mess. At many larger companies, each desk has its own budgeting system, further complicating matters.

In addition to the issues with our budgeting system, we saw room for improvement with Drive. We wanted to refine and standardize our story workflow. We’d had some confusion caused by Drive’s UI — docs would get moved inadvertently or dropped out of folders altogether.

So we decided to build a new budgeting tool, and used the Drive API to integrate the writing process with the budgeting process.

Now, creating a budget line and creating a Google Doc are one and the same. As soon as a reporter budgets a story, the budget tool creates a fresh Google Doc using the Drive API and attaches that doc to the story. When the reporter writes the story, they do so right in the budget tool, which is just a slightly customized Google Docs interface. The intuitive Google Docs interface remains, as do features like collaborative editing and offline access.

The budget information and the links to the docs are stored and organized inside WordPress. We also back up the latest version of each Doc inside WordPress so that if, for whatever reason, Drive cuts out we won’t lose our work.

We’d talked about building a system like this for a while. Once we finally decided what we wanted, building the system took minimal time (I honestly don’t remember how long, but it went by pretty quickly).

So, here’s how it looks and works:

The code isn’t quite ready to be open-sourced yet, but if you’re interested in it, shoot me an email (wdavis@bangordailynews.com) or leave me a comment and we’ll work something out.

Oh, and did I mention my Docs to WordPress plugin is still a great solution for papers looking to get out from under legacy CMSes?

Knight News Challenge

Just a quick shoutout: The BDN is very honored that our submission to the Knight News Challenge made it to Round 2. Our proposal is to take what we’ve done at the BDN, rewrite the code now that we have a better idea of what we’re doing and package it for release so that other news orgs can really easily implement what we’ve done here. If you have a second please visit the KNC website and leave your comments on the project.

Appeal for help, because I’m terrible at regex

One of the most annoying things about using Google Docs is that none of the styles are inline. It used to be that bold text was wrapped in a <strong> tag and italic text was wrapped in an <em> tag. No longer. Now each style of text is wrapped in a span with a number of different classes applied to it. Those styles don’t carry through when we bring the text into WordPress, and the names of the classes vary from article to article. This can be very annoying for columnists who bold names of subjects, for example.

So, what I’m looking for is a regular expression to turn <span class="c0 c3">My text</span> into <span class="c0 c3"><strong>My text</strong></span> where class c3 is the bold class, for example.
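
One possible approach, sketched in JavaScript and not tested against real Docs exports: since the class names change from article to article, first work out which classes are bold, then wrap the matching spans. This assumes the exported HTML carries a style block with rules along the lines of .c3{font-weight:bold}. The function names findBoldClasses and wrapBoldSpans are made up for the example, and nested spans are not handled.

```javascript
// Step 1: collect the class names whose CSS rule sets a bold font weight.
function findBoldClasses(css) {
  var bold = [];
  var rule = /\.([\w-]+)\s*\{([^}]*)\}/g; // .className{declarations}
  var m;
  while ((m = rule.exec(css)) !== null) {
    if (/font-weight\s*:\s*(?:bold|[6-9]00)/i.test(m[2])) {
      bold.push(m[1]);
    }
  }
  return bold;
}

// Step 2: wrap the contents of any span that carries one of those classes.
function wrapBoldSpans(html, boldClasses) {
  var span = /<span class="([^"]*)">([\s\S]*?)<\/span>/g;
  return html.replace(span, function (whole, classes, inner) {
    var isBold = classes.split(/\s+/).some(function (name) {
      return boldClasses.indexOf(name) !== -1;
    });
    return isBold
      ? '<span class="' + classes + '"><strong>' + inner + '</strong></span>'
      : whole; // leave non-bold spans untouched
  });
}

var css = ".c0{color:#000}.c3{font-weight:bold}.c5{font-style:italic}";
var html = '<span class="c0 c3">My text</span> <span class="c0">plain</span>';
var result = wrapBoldSpans(html, findBoldClasses(css));
```

The same two-step idea should carry over to italics by checking for font-style:italic and wrapping in <em> instead.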