Introduction to NewsML
The IPTC started to work on "an XML-based standard to represent and manage news
throughout its lifecycle, including production, interchange, and
consumer use" in 1999. After only one year of defining requirements,
working out specifications, and development, NewsML 1.0 was approved
by the IPTC in October 2000.
NewsML has proved to be stable in production environments: since its introduction
it has been updated only twice, and the current version is 1.2, of October 2003.
Covered requirements
- Support the representation
of electronic news entities such as news-items, parts of news-items,
collections of news-items, relationships between news-items
and metadata associated with news-items.
News may be delivered as single items, or in packages
of several related items, and has to have the metadata to allow
efficient production, delivery, and use (including sorting and
searching).
- Be usable throughout
the news lifecycle.
While the main use will probably be for news interchange, the standard may
also be applied to the creation, management and publication of news in networked
systems, and for archive applications.
- Allow news-items
to consist of arbitrary mixtures of media types, languages
and encodings.
News packages can consist of different types of content - text, images, video,
audio - all of which are treated equally. The same news item may also exist
in a number of different forms, such as translations of text into different
languages or the presentation of images in alternative formats.
- Be usable either
as a replacement for, or as a means of transporting, all existing
news formats and encodings.
The hope is that NewsML will gradually come to replace
older news exchange formats, such as the Information Interchange
Model (IIM).
However, where other formats perform different functions, like
the News Industry Text Format (NITF) with
its formatting capabilities, it must be possible to include them
as self-contained items within NewsML.
- Support a number
of different physical constructions of the same data. Depending
on user demands, and the delivery systems in use, there may
be a need to supply the same news content in different ways.
Some users may want all of a provider's output delivered
directly, while others may prefer to receive notification of
availability (with an indication of content) and then retrieve
the item if they want to use it.
- Support the management
and development of news-items over time.
News stories often develop gradually so there is a need to update, add to,
or replace earlier versions. Items in different media may not be available
at the same time, so they may have to be brought together later.
- Be simply extensible
and flexible.
Requirements are liable to change as the markets develop - a fixed structure
could rapidly become out-of-date. In addition, individual users may wish to
add their own features and extensions.
- Allow for authentication
and signature of metadata and news-item content.
The value of news content, and its associated metadata, depends on its reliability.
- Not be unduly verbose.
Transmission systems vary in capacity throughout the news industry and the
demands on them keep growing, so there are advantages in keeping the transmission
overhead as small as possible (provided the other requirements are met).
NewsML also needs to be suitable for use with both push and pull delivery
systems.
- Use XML and other
appropriate standards and recommendations.
Adopting XML makes it possible to build on a proven - and fast growing - technology
and will help to ensure acceptance by the wider information industry. Since
XML is now well established, software tools and development expertise should
be generally available.
Structure of NewsML
The aim of NewsML is the representation and management of news throughout
its lifecycle, and the standard has been
designed to give considerable flexibility and to allow for straightforward
extension to suit individual user needs. Inevitably this has resulted
in a rather complex and layered structure that can appear difficult
to understand. However, the underlying logic is straightforward and
there is no need to use all the features, so it would be possible
to have a relatively simple implementation for, say, text handling.
NewsML takes the form of an XML document, which
has a series of components, or elements, that are used to structure
and process the actual news content. These elements may have attributes
to specify their properties and can carry content in the form of
other elements (sub-elements) and/or character data or external references.
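
As a rough illustration of this element, attribute and content model, the following Python sketch parses a small NewsML-like fragment with the standard library. The element names follow NewsML 1.x naming, but the fragment is heavily abbreviated and not schema-valid; the Duid values, the Href URL and the story text are invented for the example, and describe_content_items is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, NOT schema-valid fragment. All identifiers, the URL and the
# text are illustrative assumptions, not taken from a real feed.
SAMPLE = """
<NewsML>
  <NewsItem>
    <NewsComponent Duid="nc1">
      <ContentItem Duid="ci1">
        <DataContent>Example story text.</DataContent>
      </ContentItem>
      <ContentItem Duid="ci2" Href="http://example.com/photo.jpg"/>
    </NewsComponent>
  </NewsItem>
</NewsML>
"""


def describe_content_items(xml_text):
    """Return (duid, kind, value) for each ContentItem in the document.

    kind is "reference" when the item points at external content via an
    Href attribute, "inline" when it carries character data in a
    DataContent sub-element, and "empty" otherwise.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("ContentItem"):
        duid = item.get("Duid")          # attribute on the element
        href = item.get("Href")          # external reference, if any
        data = item.find("DataContent")  # sub-element with character data
        if href is not None:
            rows.append((duid, "reference", href))
        elif data is not None:
            rows.append((duid, "inline", (data.text or "").strip()))
        else:
            rows.append((duid, "empty", None))
    return rows


for row in describe_content_items(SAMPLE):
    print(row)
```

The sketch only distinguishes the three ways content can be carried; a production parser would also need to handle namespaces, encodings and validation.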
News Metadata
Efficient use of metadata is a key feature for
NewsML and considerable effort has been put into the development of
a core set of metadata. This work was able to draw on the substantial
intellectual capital represented by the earlier IIM (Information Interchange
Model) and NITF (News Industry Text Format) standards, but has been
substantially extended, making use of some advanced XML features.
In general, the design of NewsML tries to keep
the metadata as close as possible to the item it describes, while
much of the metadata is optional.
At the lowest level that could contain news data
- the "ContentItem" - attributes can be added to describe
the physical character of the news representation.
At the next higher level - the "NewsComponent" -
several types of metadata can be added:
- AdministrativeMetadata deals
with information about the origin of the NewsItem and includes
the file name. The Provider and Creator of the news object can
be identified, along with the source of the information, while
specific provision has been made for identification of syndicated
items. A Property element allows for the addition of any other
administrative metadata that may be required for specific applications.
- RightsMetadata deals
with the copyright of the NewsComponent, including details of
any usage rights that have been granted to other parties by the
copyright holder. Where supplied, this information is in text
form along with (optional) links to machine processable data.
- DescriptiveMetadata is
used to describe the content of a NewsItem with specific provision
made for Language, Genre (the nature of the NewsItem, such as:
Current, Analysis, Forecast, Interview, Retrospective); OfInterestTo
(target audience), and TopicOccurrence. Again, there is a Property
element - to allow inclusion of any other descriptive metadata
needed for a specific application.
- NewsLines can
be thought of as a human-readable (text) representation
of some of the metadata. In general they are
both machine-readable and human-readable, apply across different
media types, have specific relevance to news, and are publishable.
The NewsLines identified as likely to
be widely used, and therefore defined as specific elements, are:
HeadLine; SubHeadLine; ByLine; DateLine; CreditLine; CopyrightLine;
RightsLine; SeriesLine; SlugLine; and KeywordLine. Use of these
NewsLines is optional and each NewsLine can only be included
once in a NewsComponent.
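
The metadata groups listed above can be sketched as a small tree built with Python's standard library. This is a minimal illustration, not a schema-valid NewsML document: build_news_component is a hypothetical helper, and it stores values as plain element text for brevity, whereas the actual NewsML schema may express some of them through attributes or nested elements.

```python
import xml.etree.ElementTree as ET


def build_news_component(headline, byline, language, provider):
    """Build a simplified NewsComponent carrying three metadata groups.

    Assumption: values are stored as element text to keep the sketch short;
    this layout is illustrative and not checked against the NewsML DTD.
    """
    component = ET.Element("NewsComponent")

    # NewsLines: human-readable, publishable lines such as the headline.
    lines = ET.SubElement(component, "NewsLines")
    ET.SubElement(lines, "HeadLine").text = headline
    ET.SubElement(lines, "ByLine").text = byline

    # AdministrativeMetadata: where the item comes from.
    admin = ET.SubElement(component, "AdministrativeMetadata")
    ET.SubElement(admin, "Provider").text = provider

    # DescriptiveMetadata: what the item is about.
    descriptive = ET.SubElement(component, "DescriptiveMetadata")
    ET.SubElement(descriptive, "Language").text = language

    return component


component = build_news_component(
    headline="Example headline",
    byline="By an example reporter",
    language="en",
    provider="Example Agency",
)
print(ET.tostring(component, encoding="unicode"))
```

Keeping each group in its own sub-element mirrors the idea that metadata stays close to the item it describes and that every group is optional.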
News Management
Often, news providers need to modify a news object that they have previously
sent to a customer. For example, they may correct a headline, expand upon the
body of a story or delete a piece of news altogether. This process of updating,
deleting and modifying is known as “news management”. Different news providers
may have different news management policies. IPTC’s NewsML standard provides
sophisticated means for providers and their customers to implement a variety
of procedures.
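
To make the update, modify and delete cycle concrete, here is a minimal customer-side sketch in Python. It does not reproduce the actual NewsML management elements: NewsStore, its receive method, and the item_id, revision and fields parameters are all hypothetical names chosen for the illustration, and the policy shown (ignore stale revisions, merge changed fields) is only one of many a provider and customer might agree on.

```python
class NewsStore:
    """Hypothetical customer-side store of news items, keyed by identifier."""

    def __init__(self):
        # item_id -> {"revision": int, "fields": dict}
        self.items = {}

    def receive(self, item_id, revision, fields=None, delete=False):
        """Apply one message from the provider to the store."""
        if delete:
            # The provider withdrew the item altogether.
            self.items.pop(item_id, None)
            return

        current = self.items.get(item_id)
        if current is not None and revision <= current["revision"]:
            # Stale or duplicate revision: keep what we already have.
            return

        # Start from the existing fields (if any) and overlay the changes,
        # so a corrected headline does not discard the stored body.
        merged = dict(current["fields"]) if current else {}
        merged.update(fields or {})
        self.items[item_id] = {"revision": revision, "fields": merged}


store = NewsStore()
store.receive("story-1", 1, {"headline": "Old headline", "body": "First take."})
store.receive("story-1", 2, {"headline": "Corrected headline"})
print(store.items["story-1"]["fields"])
store.receive("story-1", 3, delete=True)
print("story-1" in store.items)
```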