2003 IA Summit - Trip Reports
Stacy Surla shares her thoughts on the IA Summit. Comment to ssurla@aspensys.com.
Slicing Up England: Content Analysis and Modeling for BBCi
Margaret Hanley, BBCi
BBC Interactive (BBCi), the web portion of the British Broadcasting Corporation, is decentralized and extensive. It has no meta-content management system. Rather, regional and subject-specific sites are managed by an array of manual procedures and independent CMSs. There are probably 100 separate sites on BBCi, and they do not share information with one another in a formal fashion. Consistency is achieved through templates. Content is published statically, uploaded by each team to the appropriate upper-level directory on the production server. Directory structures below the top level are developed by each team. (For instance, Oxford is in bbc.co.uk/oxford/, Kent is in bbc.co.uk/kent/, and the images folder is wherever each team likes to keep it.)
BBCi is a very interesting model of a large, distributed, semi-automated system of related content. An inside look at the steps undertaken to improve the management of this site will therefore demonstrate some useful lessons for other large sites, particularly those that are not managed through a central CMS or which are migrating towards a CMS.
In her presentation, Margaret Hanley summarized the current status of a content analysis and modeling project on which she is taking the lead. Its purpose is to improve content quality, make content sharing work better, and make it possible to syndicate content externally. This case study presents a splendid group of concepts, processes, and deliverable products that can be adapted for use in other complex web environments. Her PowerPoint presentation contains examples of the deliverables described below.
The first stage of the project focused on analyzing and modeling the current content in a subset of BBCi sites called "English Regions." English Regions comprises 42 local sites, each maintained by its own small team who publish highly reactive local content (e.g. News, Sport, Weather). The first stage generated several initial deliverable products, including a content inventory, preliminary content object library, and preliminary controlled vocabularies. In the second "English Regions CMS" phase of the project, the work is being taken to the next level, with a detailed development of the content object library and controlled vocabularies. The plan is for these to be used by the English Regions teams to describe their content. Using the controlled vocabularies to describe content will improve search and contextual navigation across the BBCi content management systems, and will also enable the XML schema to be used to share information between systems.
Main Concepts
The analysis and modeling task was underpinned by an object-oriented view of web content. Everything on the site was therefore described in terms of the following objects and properties:
- Content Objects: Smallest piece of content that gets used (e.g. article text, image, link, audio file). Content objects have metadata (title, body, expiration, subject, etc.), but they are fairly large-grained objects.
- Content Collections: A set of content objects assembled in a recognizable "type" (e.g. CD review, which is made up of article text, audio files, images, links).
- Metadata: Including Intrinsic Metadata (data extractable from a content object, e.g. file size), Administrative Metadata (data to help manage content objects, e.g. expiration date), and Descriptive Metadata (data on the "aboutness" of a content object, e.g. subject).
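The three concepts above can be sketched as a small data structure. This is only an illustration of the object-oriented view, not the BBCi model itself: the class names, field names, and sample values are assumptions chosen to mirror the examples in the list (file size as intrinsic metadata, expiration date as administrative metadata, subject as descriptive metadata).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ContentObject:
    """Smallest reusable piece of content, e.g. article text or an image."""
    kind: str                       # e.g. "article_text", "image", "audio"
    title: str
    body: str = ""
    # Intrinsic metadata: extractable from the object itself.
    file_size_bytes: int = 0
    # Administrative metadata: helps manage the object.
    expires: Optional[date] = None
    # Descriptive metadata: the "aboutness" of the object.
    subjects: List[str] = field(default_factory=list)

    def is_expired(self, today: date) -> bool:
        return self.expires is not None and today > self.expires


@dataclass
class ContentCollection:
    """Content objects assembled into a recognizable type, e.g. a CD review."""
    collection_type: str
    objects: List[ContentObject] = field(default_factory=list)

    def live_objects(self, today: date) -> List[ContentObject]:
        return [o for o in self.objects if not o.is_expired(today)]


# Hypothetical sample data for a CD review collection.
review = ContentCollection(
    "cd_review",
    [
        ContentObject("article_text", "Album review", body="Review text.",
                      subjects=["music"]),
        ContentObject("audio", "Sample clip", file_size_bytes=512_000,
                      expires=date(2003, 6, 30)),
    ],
)
```

Keeping expiration on the object rather than on the collection means a collection can outlive some of its parts, which is one plausible reading of "content re-use" in this model.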
Process and Deliverables
The process by which the project is being carried out, and related deliverable products, are as follows:
- Site Audit: Purpose was to help identify where to focus the content audit
DELIVERABLE: discussion paper about the sites and the context of the content
- Content Inventory: Included disk scrapes to collect info on all content on each system
DELIVERABLE: inventory spreadsheet and descriptive document
- Content Audit: Identified content objects
Formulated how to think about structure and metadata
DELIVERABLE: report on metadata and opportunities for content re-use, syndication
DELIVERABLE: content audit description sheet
- Content Object Library: Description of every Content Object and its metadata
DELIVERABLE: hard copy of library
- Content Modeling: Developed with her systems person (her "data modeller") at her side. The data modeller was given the Content Object Library to normalize. He created the object model, a logical representation of the content objects. This became a data model (XML schema) for the content management system.
DELIVERABLE: data model; also site schematic and navigation labels
- Classification Scheme Analysis: Understanding the Content Objects. Proposition for how directories and naming conventions could be used consistently.
DELIVERABLE: controlled vocabularies
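As a rough sketch of how the last two deliverables might work together, the snippet below checks a content object's subjects against a controlled vocabulary and then serializes it to XML for sharing between systems. The vocabulary terms, the element and attribute names, and the sample title are all hypothetical; the report does not reproduce the actual BBCi schema or vocabularies.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary; the real BBCi terms are not given here.
SUBJECT_VOCABULARY = {"news", "sport", "weather", "music"}


def to_xml(kind, title, subjects):
    """Serialize one content object, rejecting subjects outside the vocabulary."""
    unknown = [s for s in subjects if s not in SUBJECT_VOCABULARY]
    if unknown:
        raise ValueError(f"subjects not in controlled vocabulary: {unknown}")
    # Element names below are illustrative, not taken from the BBCi schema.
    root = ET.Element("contentObject", {"kind": kind})
    ET.SubElement(root, "title").text = title
    for subject in subjects:
        ET.SubElement(root, "subject").text = subject
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml("article_text", "Local headline", ["news", "weather"])
```

Rejecting unknown terms at serialization time is one possible design; a real system might instead map free-text tags onto preferred terms.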
How Content Analysis/Modeling Is Being Used in the System
In a tiered model of the CMS as it will be implemented, the presentation tier, content creation/re-use tier, and administration tier are defined separately. Each tier has its own objects, identified users, and location within the system. The example in the figure below models this for a CD review.
The example describes where existing content is drawn upon or added to (in the administration tier) to create a CD review (in the content creation/reuse tier), which is then surfaced via templates (in the presentation tier). The connections between the content objects, which generate the new CD review content collection, are also identified in the model.
Tiered Model Example - CD Review
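The flow through the three tiers can be approximated in a few lines. This is a minimal sketch under stated assumptions: the repository contents, object identifiers, function names, and the HTML template are invented for illustration, and the real CMS will certainly differ.

```python
from string import Template

# Administration tier: existing content objects, keyed by a hypothetical id.
REPOSITORY = {
    "text-1": {"kind": "article_text", "body": "Review text."},
    "img-1": {"kind": "image", "src": "cover.jpg"},
    "aud-1": {"kind": "audio", "src": "clip.mp3"},
}


def create_collection(collection_type, object_ids):
    """Content creation/re-use tier: assemble a collection from stored objects."""
    return {"type": collection_type,
            "objects": [REPOSITORY[i] for i in object_ids]}


# Presentation tier: a hypothetical page template.
PAGE = Template("<h1>$type</h1><p>$body</p><p>$count attached objects</p>")


def render(collection):
    """Presentation tier: surface the collection through the template."""
    body = next(o["body"] for o in collection["objects"]
                if o["kind"] == "article_text")
    attached = len(collection["objects"]) - 1  # everything except the text
    return PAGE.substitute(type=collection["type"], body=body, count=attached)


cd_review = create_collection("cd_review", ["text-1", "img-1", "aud-1"])
html = render(cd_review)
```

The collection holds references to stored objects rather than copies, so the same image or audio clip could be drawn into another collection without duplication.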