Many open data catalog sites have large numbers of datasets which link to resource files hosted on other sites. These datasets may be created by many different staff members (members of different government departments, for example). The datasets may even be automatically imported (harvested) from other sites.
With so many links, maintaining dataset quality is difficult. A service is needed that automatically finds broken links, lets dataset maintainers see at a glance which links need to be fixed, and lets site admins see which maintainers are responsible for datasets with broken links.
The Dead or Alive extension runs a background link checker service that checks and re-checks all of a site’s datasets for broken links every 24 hours.
A resource is marked as broken if at least three consecutive link checks over a period of at least three days (both thresholds are configurable) have found its URL broken. A public report detailing the results is added to the resource’s page.
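The rule for marking a resource as broken can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the extension's actual code: the function name `is_broken`, its parameters, and the shape of the check history (a list of `(timestamp, ok)` pairs, oldest first) are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def is_broken(checks, min_fails=3, min_period=timedelta(days=3)):
    """Return True if the link should be reported as broken.

    `checks` is a hypothetical history of link checks for one resource:
    a list of (datetime, ok) tuples ordered oldest first.
    """
    # Walk backwards from the most recent check and collect the
    # trailing run of consecutive failures.
    failures = []
    for checked_at, ok in reversed(checks):
        if ok:
            break
        failures.append(checked_at)

    # Not enough consecutive failures yet.
    if len(failures) < min_fails:
        return False

    # failures[0] is the newest failure, failures[-1] the oldest in the run.
    return failures[0] - failures[-1] >= min_period
```

Requiring both a minimum number of failures and a minimum time span means a server that is down for a single afternoon does not get its links reported as broken, even if it was checked several times during the outage.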
There’s also a public report of all broken links on the site. Broken links are grouped by data publishing organization, with the organizations with the most broken links listed first. We hope that highlighting which publishing organizations have the most broken links will encourage them to fix the problems.
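The grouping and ordering used by the public report can be illustrated with a short sketch. The input format (pairs of organization name and broken URL) and the function name `group_by_organization` are assumptions for this example and are not taken from the extension's source.

```python
from collections import defaultdict


def group_by_organization(broken_links):
    """Group broken links by organization, most broken links first.

    `broken_links` is a hypothetical iterable of (organization, url) pairs.
    Returns a list of (organization, [urls]) tuples.
    """
    groups = defaultdict(list)
    for organization, url in broken_links:
        groups[organization].append(url)

    # Sort by descending number of broken links; break ties by name
    # so the report order is stable between runs.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
```

The private report for site admins could be built the same way, keyed on the maintainer's email address instead of the organization.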
Finally, there’s a private broken links report viewable by site admins only. This report groups the broken links by the email address of the maintainer or author responsible for the dataset. The aim is to make it as easy as possible for site admins to email dataset maintainers about their broken links.
The Email maintainer buttons open a new email in your email client with the address, subject and body already filled out for you (on the production site the email would be in Swedish, of course):
From: Sean Hammond <email@example.com>
To: mäintainër_2@authors.com
Subject: You have 3 datasets with broken links on Oppnadata.se

These datasets have broken links:

Org 0 dätaßët 2 http://oppnadata.se/dataset/org_0_dataset_2
Org 0 dätaßët 1 http://oppnadata.se/dataset/org_0_dataset_1
Org 1 dätaßët 3 http://oppnadata.se/dataset/org_1_dataset_3

--
Sean Hammond
Site Administrator, http://oppnadata.se
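A button that opens a pre-filled email is typically just a `mailto:` link. The sketch below shows one way such a link could be built with Python's standard library. The function name `mailto_link` is hypothetical, and the extension itself may construct its links differently.

```python
from urllib.parse import quote


def mailto_link(address, subject, body):
    """Build a mailto: URL with the subject and body pre-filled.

    A minimal sketch: non-ASCII characters are percent-encoded as UTF-8,
    and line breaks in the body are sent as CRLF, which mail clients expect.
    """
    # Normalize any kind of line ending to CRLF before encoding.
    body = "\r\n".join(body.splitlines())
    return "mailto:{}?subject={}&body={}".format(
        quote(address, safe="@"),  # keep the @ readable in the address
        quote(subject, safe=""),
        quote(body, safe=""),
    )
```

For example, `mailto_link("mäintainër_2@authors.com", subject, body)` yields a link that can be used as the `href` of the button, so no mail is sent by the server itself and the admin can edit the message before sending.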
We aimed to make this plugin as simple and easy as possible to install and maintain. See the README for instructions.
The link checker service isn’t just for CKAN sites: it works with any site that implements its simple API.
Features we’d like to add in the future include:
Check new and updated resource links as soon as they’re added to the site (not just at the next scheduled check).
Buttons on the site for admins to request an immediate re-check of a resource’s, dataset’s, or the entire site’s links.
A manual override, allowing sysadmins to mark a link as working or as broken even if the link checker thinks otherwise.
To contribute, get the source from github.com/ckan/ckanext-deadoralive and send a pull request.
For details about how Dead or Alive works and why it was designed the way it was, see Background Tasks as Simple Web Services.