Notes on digital content

(These notes build on earlier notes on "Multimedia Content and Tools"; there are, however, no links here, so far ...)

see also: A Brief Note on Knowledge Technologies (no links either, sorry :-( )

  1. Digital content has many aspects, facets, dimensions: contextual (in terms of purposes, applications and domains), technical, economic, linguistic, social, cultural, aesthetic, ethical, legal, political, to name but a few. This note focuses primarily on (some) technical issues.

  2. In particular, it does not address the "driving forces" behind digital content and its associated technologies (e.g. "who profits in what way from what?", "who needs it for what?", "who establishes requirements?"), or the ways in which technical solutions can help resolve non-technical issues (such as IPR/"digital rights", multilinguality, data protection, content preservation or social control of digital content).

  3. We speak of digital content as opposed to analogue content. Digital content is anything that can be produced/created, stored, processed, managed and transmitted using digital technologies. (Analogue content has - at least in principle - nothing to do with these technologies: it used to be produced/created, stored, processed, managed and transmitted, long before (in most OECD countries) digital computers became ubiquitous office equipment, and digital cameras, CD and DVD players, playstations etc., became household appliances. We note, however, that digital technologies are also having a great impact on (at least the production and distribution of) analogue content.)

  4. This may be - to all intents and purposes - slightly too broad a "definition". We exclude software engineering products in so far as they drive the machines that process content and make it accessible (among many other tasks supported by software). We include above all databases, electronic texts, images and graphics, audio and video, or - more generally - "multimedia content". But we also include software engineering products (e.g. programs and specifications) in so far as they implement computational services and services for processing and accessing content (including themselves (!) - as pieces of text in "software engineering environments", for instance).

  5. More specifically, we may "define" digital content as: [[structured] collections of] digital representations of abstract or real-world objects and processes, and/or of knowledge thereof; objects and processes are either natural or man-made. (Unfortunately, the terminology is currently somewhat blurred as many "content providers" and manufacturers of "content management systems" seem to adhere to too narrow a view of the concept, e.g. limited to the context of "digital rights management".)

  6. Our definition subsumes "multimedia". In fact "multimedia content" is usually understood as subject to a number of restrictions, such as:

    • to form part of multimedia content a digital representation of a real-world object must preserve some or (in the best case) all sensory features (in terms of seeing, hearing, smelling, tasting and touching) of that object;

    • to form part of multimedia content a digital representation of an abstract object (e.g. data sets, information spaces) must allow presentations that appeal directly to one, several or all of the human senses.

  7. More often than not the objects underlying digital representations are man-made and themselves analogue representations (e.g. images, text, sound, etc.) of real-world objects. (Strictly speaking, "text", as a concatenation of characters drawn from a finite set of symbols, is itself a digital object type; handwritten text, however, clearly includes a strong analogue component.)
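The remark on text as a digital object type can be made concrete with a minimal sketch. It assumes Unicode code points as the finite symbol set (an assumption of this example, not something the note prescribes), and the function names are hypothetical. Typed text survives a round trip through a sequence of integers without loss, which is what makes it digital in the sense used here.

```python
def encode(text: str) -> list[int]:
    """Map each character to its number in the (finite) Unicode symbol set."""
    return [ord(ch) for ch in text]


def decode(codes: list[int]) -> str:
    """Rebuild the text from the sequence of symbol numbers."""
    return "".join(chr(code) for code in codes)


sample = "digital content"
codes = encode(sample)

# The round trip is lossless: nothing of the typed text is lost.
round_trip_ok = decode(codes) == sample
print(encode("abc"))   # [97, 98, 99]
print(round_trip_ok)   # True
```

A scan of handwritten text, by contrast, would have to be sampled and quantised first, and the strokes themselves are not drawn from any finite alphabet.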

  8. Whereas storing and communicating analogue representations of (real-world) objects require a range of different physical media (e.g. paper, film, air (!), magnetic tape, electromagnetic fields, etc.), digital representations can in principle be stored on a single physical medium and communicated via a single physical medium. (Hence "multimedia" may be considered a misnomer; it is actually the replacement of several (multi) physical media by (potentially) a single one, that the term 'multimedia' alludes to.) This is what "convergence" is all about (at least technically speaking).

  9. Digital content - and multimedia content in particular - plays an eminent role in the context of, for example,

    • business and public administrations,

    • science, engineering, medicine, law and (many) other professional occupations,

    • education and training,

    • (general and specific) information services,

    • workflow (including CSCW) and transaction oriented (e.g. for e-commerce) systems,

    • preservation of and access to intellectual and cultural assets and resources (in public and private "memory institutions / facilities"),

    • entertainment (games, interactive digital TV, etc.).

  10. Digital content (like all forms of content) may or may not represent explicit knowledge about real-world phenomena (things, processes, etc.). (In an education/training environment, for instance, digital content is likely to represent some explicit real-world knowledge whereas this may not necessarily be the case with content underlying the odd action game.) In any event, however, it is always possible to have knowledge about (digital or non-digital) content. Such knowledge is usually descriptive in nature. Formal content description, representing knowledge about content, is indeed an important form of digital content. We refer to it as "meta-content" (as opposed to "domain content") which includes "metadata" as well as the (formal) documentation of whatever "mini-world" is needed to give these metadata meaning (e.g. ontologies).
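The distinction between "domain content" and "meta-content" can be sketched as a small data model. This is an illustration under stated assumptions only: the class names, field names and sample values are hypothetical, and a plain term-to-definition dictionary stands in for the ontology that gives the metadata meaning.

```python
from dataclasses import dataclass, field


@dataclass
class DomainContent:
    """A digital representation of some object or process (hypothetical model)."""
    identifier: str
    media_type: str
    payload: bytes


@dataclass
class MetaContent:
    """Formal description of content: metadata plus the vocabulary behind it."""
    describes: str                 # identifier of the content being described
    metadata: dict[str, str]       # descriptive statements about that content
    # Stand-in for an ontology: each metadata term mapped to its definition.
    vocabulary: dict[str, str] = field(default_factory=dict)

    def undefined_terms(self) -> list[str]:
        """Metadata terms that the vocabulary does not (yet) give meaning to."""
        return sorted(term for term in self.metadata if term not in self.vocabulary)


clip = DomainContent("clip-001", "video/mp4", b"\x00\x01")
description = MetaContent(
    describes=clip.identifier,
    metadata={"title": "Harbour at dawn", "creator": "unknown"},
    vocabulary={"title": "name given to the resource"},
)
print(description.undefined_terms())  # ['creator']
```

Note that the description is itself digital content: it can be stored, searched and described in turn, which is why the note treats meta-content as a form of content rather than as something apart from it.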

  11. A possible (non-exhaustive) list of (classes of) operations on digital content could be structured along the complementary views of content as either output or input, and the "domain content versus meta-content" dichotomy:


                      content as output                       content as input

    domain content    creation, manipulation, search,         description, storage, management,
                      access, retrieval, communication,       manipulation, annotation, analysis
                      presentation

    meta-content      creation, description, manipulation,    management, search, access,
                      annotation, analysis                    retrieval, presentation

    An operation can appear in different quadrants. "Annotation", for instance, would take "domain content" as input and output "meta-content"; "presentation", to take another example, can be based on "meta-content" to visualise "domain content".

    The allocation is of course fuzzy. None of the above (classes of) operations can be seen in isolation. Their application is usually subject to some sequential or hierarchical order (a sequential order is often called a "life cycle"). They differ in complexity, and there is substantial overlap between the (mainly technical) issues pertaining to individual operations. A more detailed list of such issues is annexed to this note.
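The four-way allocation of operations can also be written down as data, which makes it easy to check in which quadrants a given operation appears. The sketch below reads the first row of the table as "domain content" and the second as "meta-content"; that reading is an assumption, though it agrees with the annotation and presentation examples given above. The names `QUADRANTS` and `quadrants_of` are hypothetical.

```python
# Each quadrant is keyed by (kind of content, role of that content).
QUADRANTS = {
    ("domain content", "output"): {
        "creation", "manipulation", "search", "access",
        "retrieval", "communication", "presentation",
    },
    ("domain content", "input"): {
        "description", "storage", "management",
        "manipulation", "annotation", "analysis",
    },
    ("meta-content", "output"): {
        "creation", "description", "manipulation", "annotation", "analysis",
    },
    ("meta-content", "input"): {
        "management", "search", "access", "retrieval", "presentation",
    },
}


def quadrants_of(operation: str) -> list[tuple[str, str]]:
    """Return every (content kind, role) quadrant that lists the operation."""
    return sorted(key for key, ops in QUADRANTS.items() if operation in ops)


# Annotation takes domain content as input and outputs meta-content.
print(quadrants_of("annotation"))
# [('domain content', 'input'), ('meta-content', 'output')]

# Presentation can use meta-content as input to present domain content.
print(quadrants_of("presentation"))
# [('domain content', 'output'), ('meta-content', 'input')]
```

Taken together the four quadrants name twelve distinct classes of operations; most of them appear in more than one quadrant, which is one way of seeing why the allocation is fuzzy.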

  12. Tools - in general both hardware and software - are needed to perform these operations. They can be grouped in corresponding classes. Tools are either generic or application specific: different application areas and different applications within these areas may impose different requirements on the specifics of the types of operations listed above.

    The particular features of authoring tools, for instance, depend on the type of content to be created: requirements for authoring a set of courseware modules are likely to be quite different from those for compiling an interactive multimedia newspaper or for producing a video clip.

    Yet, the basic features of these operations do provide a common ground for most if not all contexts of digital content. They are largely independent of any given application domain.

  13. Technical challenges regarding digital content and associated tools stem mainly from the evolution of basic digital technologies, characterized by ever increasing values of parameters such as processor speed, storage and memory capacity, bandwidth and connectivity. This evolution has greatly facilitated the emergence of specific "technologies for creating and using digital content". These have brought about:

    • the (well known) quasi-explosive increase in digital content production (using tools that are many orders of magnitude more powerful than pen, paper, the printing press or library catalogues; in fact, most of what used to be given "analogue" form in the past - e.g. sound, still and moving images, speech, etc. - is now available as "digital content") as well as distributed (world-wide) platforms for the management and use of digital content;

    • a considerable enhancement of our ability to analyse what is going on in the world (in both nature and society), to peruse vast amounts of data, searching for structure, thus refining our models of the world; and - partly as a consequence - machines/agents that can learn and - to a certain extent - act autonomously in limited formal environments.

A key technical challenge consists in building on these developments with a view to creating tools and systems that would make operations on digital content ever more effective and efficient, and its use ever more enjoyable (for instance by allowing a higher degree of interaction).

As implied above, this challenge is persistent: its target keeps moving.

There is a "technology - applications" cycle. Applications pose challenges; technologies are developed in response to such challenges and may make new applications possible, desirable or necessary. That cycle takes societal needs as input and yields products and/or services as output.

  14. Presumably the hardest (and hence most challenging?) problems, from a technical point of view, are those related to content analysis (or, for that matter, the analysis of the real world, the ultimate source of content). (In a nutshell: such analysis aims to derive from - more or less - raw data or signals something intelligible and actionable - in the form of ontologically grounded metadata, for instance.) From an organisational (that is non-technical) point of view the hardest (but perhaps no less challenging) problems are probably those related to access to and use of digital content (e.g. in the application contexts listed earlier in this note).

  15. There are a number of general architectural issues related to systems dealing with digital content. They include: heterogeneity, distribution, federation, scalability, interoperability, sustainability and commercial viability of content repositories / systems; (multimedia) content production systems; embedded multimedia (e.g. background multimedia libraries), human-centred design techniques, etc.

  16. Adherence to standards in the design and implementation of systems based on digital content is a necessary condition for (inter alia) interoperability, sustainability and commercial viability. Standards issues arise in connection with several of the above classes of operations, in particular creation (representation formats and languages), description (metadata, identifiers, languages, semantics, ontologies) and communication (protocols).

  17. Summary:

    • The scope of (the "abstract" notion of) digital content ranges from "domain content" (object and process representation) to "meta-content" ("knowledge representation"); there are many application contexts;

    • Specific technologies underlie tools for working with digital content; tools have generic and application dependent features;

    • The hardest problems include: (1) content/real-world analysis; (2) sustainable access to and use of digital content; (3) social control of digital content;

    • These problems are likely to persist - with solutions emerging from the "technology - application cycle" (with "needs" as input and "products / services" as output).

Annex: Some technical and non-technical issues related to operations on digital representations and collections



creation: representation formats and languages, digitization techniques, editing / authoring / production, protection (e.g. watermarking and encryption), digital representations of not yet commonly handled objects and features (related, for instance, to senses other than seeing and hearing)

description: (there is a strong link between description related issues and those related to other classes of operations, particularly analysis, search and retrieval), classification, metadata, metadata derivation and tools, content semantics, knowledge representation, ontologies, related (formal) language issues

storage: distributed storage, near-line storage architectures, caching, I/O bandwidth resource management, OS support for realtime and non-realtime data, compression techniques

management: multimedia data models, distributed object management, multimedia fragments management, temporal and spatial databases, multimedia data warehousing, guaranteeing consistency, integrity and authenticity

manipulation: code transformation, texture mapping, image perspective transformation, feature enhancement, etc.

annotation: video annotation and summarization / abstracting, semantic annotation, etc.

analysis: data mining / knowledge discovery in multimedia databases, document analysis and understanding, image analysis, pattern recognition, feature detection (e.g. object motion detection, tracking and characterization; face detection and tracking; understanding acoustic signals; speaker identification and tracking; emotion detection - 'kansei'; ... for content based retrieval...)

search: resource discovery, meta-search engines, filtering and selection, concept based browsing, content based querying, navigation

access: indexing methods (e.g. spatial indexing, content based indexing), access control (including control of access to illegal and harmful content), rights management, user privacy

retrieval: multimedia retrieval models (including network and hypermedia models, Web based multimedia), content based ('intelligent') retrieval

communication: design and implementation of multimedia communication protocols, quality of service, real-time streaming / synchronization, multimedia over the Internet, mobile multimedia, multimedia via satellite, multicasting and security

presentation: content personalization, man-machine interaction models, non-standard interfaces (e.g. 'immersive content', multimodality), interface agents, virtual reality / interactive simulations, visualisation techniques, etc.

Copyright: Hans-Georg Stork