Saving Staff Time Through Better Data Import Routines

Have you ever loaded data into a publisher website by hand? Did you have the best time?

That was a rhetorical question, of course, but it points to an underlying time and resource issue many publishers face: loading data by hand into a new site can take weeks. After that, it takes a surprising amount of ongoing work to keep it up to date.

We have a better way. It takes less time, and it makes the process of keeping title info up to date much easier. It also makes it easier and faster to deploy a new site. Here’s how we do it.

SET IT AND FORGET IT

ReaderBound’s sophisticated import routines can quickly load and process large amounts of data. Just recently, we imported a catalogue of roughly 800 books, and the process described here (a complete re-import of all data on the site) can be done at any time if the need arises.*

It took almost exactly 5 minutes to download the records for those 800 ISBNs from BiblioShare, and then 22 minutes to unpack them and process the data into the CMS. During those 22 minutes, all of the content in the ONIX was leveraged across the site, including all of the associations between titles and authors, related titles, series, and subjects.
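To put those timings in per-title terms, here is the arithmetic as a short Python snippet. It uses only the figures above. Because the catalogue was roughly 800 books, treat the per-title averages as approximate.

```python
# Figures from the full re-import described above. The catalogue was
# "roughly 800" books, so the per-title averages are approximate.
TITLES = 800
DOWNLOAD_MINUTES = 5
PROCESS_MINUTES = 22

def per_title_seconds(minutes: float, titles: int) -> float:
    """Average wall-clock seconds spent per title in one stage."""
    return minutes * 60 / titles

total_minutes = DOWNLOAD_MINUTES + PROCESS_MINUTES
download_rate = per_title_seconds(DOWNLOAD_MINUTES, TITLES)
process_rate = per_title_seconds(PROCESS_MINUTES, TITLES)

print(f"total: {total_minutes} min")             # total: 27 min
print(f"download: {download_rate:.3f} s/title")  # download: 0.375 s/title
print(f"process: {process_rate:.2f} s/title")    # process: 1.65 s/title
```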

The import routines cycle new data in every day. So as you update the underlying ONIX with new information (e.g., publication status, publication date, or subjects) or new content (e.g., reviews or tables of contents), the changes flow through to the site automatically within a rolling 24-hour cycle. All of that new data is then unpacked and leveraged across the site again: series and subject listings update automatically, as do title listings, contributor profiles, and every other place the underlying title data appears.
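Why do listings stay in sync without anyone editing them by hand? One way to picture it: if series, subject, and contributor listings are derived from the title records rather than maintained separately, rebuilding them after an import is enough to carry a change everywhere. The Python sketch below illustrates that idea only. The TitleRecord fields and the build_listings function are hypothetical simplifications, not ReaderBound's actual data model or code, and the ISBNs are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class TitleRecord:
    """Hypothetical, heavily simplified view of one title's ONIX data."""
    isbn: str
    title: str
    contributors: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    series: Optional[str] = None

Listing = Dict[str, List[str]]  # listing name -> ISBNs in that listing

def build_listings(records: List[TitleRecord]) -> Tuple[Listing, Listing, Listing]:
    """Derive subject, series and contributor listings from the records."""
    by_subject: Dict[str, List[str]] = defaultdict(list)
    by_series: Dict[str, List[str]] = defaultdict(list)
    by_contributor: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        for subject in rec.subjects:
            by_subject[subject].append(rec.isbn)
        if rec.series:
            by_series[rec.series].append(rec.isbn)
        for name in rec.contributors:
            by_contributor[name].append(rec.isbn)
    return dict(by_subject), dict(by_series), dict(by_contributor)

# Usage with placeholder data.
records = [
    TitleRecord("isbn-1", "Example A", ["A. Author"], ["Fiction"], "Series X"),
    TitleRecord("isbn-2", "Example B", ["A. Author"], ["Fiction", "History"]),
]
subjects, series, contributors = build_listings(records)
# subjects == {"Fiction": ["isbn-1", "isbn-2"], "History": ["isbn-2"]}

# The ONIX for isbn-2 drops a subject; rebuilding updates every listing.
records[1].subjects = ["History"]
subjects, series, contributors = build_listings(records)
# subjects == {"Fiction": ["isbn-1"], "History": ["isbn-2"]}
```

The design point is that no listing is ever edited directly: correct the title record, and everything derived from it follows on the next rebuild.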

MANUAL CORRECTIONS AT ANY TIME

If you need to make an urgent correction (for example, an incorrect price or publication status), make it in the CMS and it will be reflected on the site immediately. Otherwise, any changes you make in your ONIX will flow through to the site automatically, without any manual intervention.

Records can be added or modified manually in the CMS at any point and then published immediately to the site.

UPDATED DAILY

ReaderBound uses a rolling 24-hour process to check for updated data, import it to the site, and reflect those updates everywhere the data appears. This is easier to picture if you think of ReaderBound as an integrated system with a few components:

A data source (e.g., BNC BiblioShare);
An import queue (an instruction to the system re: what to import);
A set of import processes (by which new or updated data is pulled into the system, unpacked, and processed);
The website itself (the content system where the data actually lives on the server).
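As an illustration of how those four components hand data to one another, here is a minimal Python sketch. DataSource, Website, enqueue and run_import are hypothetical names invented for this example, records are plain dictionaries rather than ONIX, and the "unpack and process" step is reduced to a simple copy.

```python
from collections import deque
from typing import Dict, Iterable, Optional

# Hypothetical stand-ins for the four components; none of these names
# come from ReaderBound itself.

class DataSource:
    """Stand-in for an ONIX repository such as BNC BiblioShare."""
    def __init__(self, records: Dict[str, dict]):
        self._records = dict(records)

    def fetch(self, isbn: str) -> Optional[dict]:
        """Return the current record for an ISBN, or None if unknown."""
        return self._records.get(isbn)

class Website:
    """Stand-in for the content system where the title data lives."""
    def __init__(self):
        self.titles: Dict[str, dict] = {}

def enqueue(queue: deque, isbns: Iterable[str]) -> None:
    """Tell the system what to import, skipping ISBNs already waiting."""
    for isbn in isbns:
        if isbn not in queue:
            queue.append(isbn)

def run_import(queue: deque, source: DataSource, site: Website) -> int:
    """Drain the queue: pull each record, process it, store it on the site."""
    imported = 0
    while queue:
        isbn = queue.popleft()
        record = source.fetch(isbn)
        if record is None:
            continue  # nothing to pull for this ISBN
        # A real import would unpack the ONIX here; this sketch just copies it.
        site.titles[isbn] = dict(record)
        imported += 1
    return imported

# Usage with placeholder data.
source = DataSource({"isbn-1": {"title": "Example A"}, "isbn-2": {"title": "Example B"}})
site = Website()
queue = deque()
enqueue(queue, ["isbn-1", "isbn-2", "isbn-1", "isbn-missing"])  # duplicate is skipped
count = run_import(queue, source, site)  # imports isbn-1 and isbn-2
```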

Within the 24-hour ReaderBound cycle are three automated cycles that refresh data on the system:

Every hour on the hour: The manual import list is checked, and any items on it are added to the import queue. In other words, anything added to the importer manually is picked up within the hour.
Twice daily, at 5:00 am ET and 5:00 pm ET: The automatic update runs and covers all titles already on the site. In other words, every 12 hours we ping whichever ONIX repository supplies the title info for updated records for all ISBNs currently on the site. Any updated records we find are added to the import queue.
Every four hours: The import processes all queued items. This includes any updated records from the twice-daily BiblioShare check plus any items added manually since the last import cycle.
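The three cycles can be sketched as a simple daily schedule, which also gives a rough estimate of how long an ONIX change might take to reach the site. The 5:00 am and 5:00 pm checks come from the schedule above. The import times (midnight, 4:00, 8:00 and so on) are an assumption, since only "every four hours" is specified, and jobs_due, next_after and hours_until_live are hypothetical helpers for this example. Under that assumption, the worst case for an ONIX change is about 15 hours.

```python
# Daily schedule sketch. Hours are 0-23, Eastern Time.
UPDATE_CHECK_HOURS = {5, 17}         # from the text: 5:00 am and 5:00 pm ET
IMPORT_HOURS = set(range(0, 24, 4))  # ASSUMPTION: text says only "every four hours"

def jobs_due(hour: int) -> list:
    """Jobs that start at the top of the given hour."""
    jobs = ["check manual import list"]  # runs every hour on the hour
    if hour in UPDATE_CHECK_HOURS:
        jobs.append("queue updated ONIX records")
    if hour in IMPORT_HOURS:
        jobs.append("process import queue")
    return jobs

def next_after(t: float, hours: set) -> float:
    """Earliest firing of a daily job strictly after time t (in hours)."""
    day = int(t // 24)
    for d in (day, day + 1):
        for h in sorted(hours):
            if d * 24 + h > t:
                return d * 24 + h
    raise ValueError("schedule has no hours")

def hours_until_live(change_time: float) -> float:
    """Hours from an ONIX change until the import that publishes it.

    Assumes a change made exactly at a check time just misses that check.
    """
    check = next_after(change_time, UPDATE_CHECK_HOURS)
    imported = next_after(check, IMPORT_HOURS)
    return imported - change_time

print(jobs_due(5))            # ['check manual import list', 'queue updated ONIX records']
print(hours_until_live(5.5))  # 14.5 (caught at 5 pm, imported at 8 pm)
worst = max(hours_until_live(m / 60) for m in range(24 * 60))
print(worst)                  # 15.0
```

With different import times the exact figure would shift, but any schedule that checks twice daily and imports every four hours keeps the delay well under a day.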

* The data path described here is based on BNC BiblioShare as the data source, but the same process would apply with another ONIX repository. What matters is that there is an authoritative ONIX repository (BNC BiblioShare in this case) that serves as the source of the title data used on the site.

Better living through metadata, or at least a good deal less time loading it by hand.