A Brief Note on Knowledge Technologies

by Hans-Georg Stork, Luxembourg

Issues and challenges

Knowledge is a prerequisite for acting purposefully in a given environment or domain (decision making, planning, collaborating, etc.). Knowledge is always about something: objects, processes, phenomena, etc.

In order to be amenable to "automated (i.e. computer-implementable) solutions", knowledge must be formally represented. Computer-based representations of knowledge about objects and processes (digital or not) capture, to a certain extent, the semantics of these objects and processes. Such objects include, for example, people, papers, articles, reports, books, recipes, databases, still and moving images, graphs, and product and service descriptions.

Adding explicit semantics (automatically or interactively) to (static) content, services and processes, and thereby producing knowledge representations, is one of the key functions of Knowledge Technology tools. These tools generate, or help to generate, and record the "meta-knowledge" about digital content of all sorts. It is this meta-knowledge that makes all other forms of knowledge (such as scientific and scholarly papers, and often nonsense) more accessible and usable.

Content and service providers, as well as end-users, need to be made aware of the benefits of adding explicit semantics to content. This is largely a "critical mass" or "chicken and egg" problem: adding semantics to content (and services) does not pay off if no tools are available to make good use of it, while developing tools does not pay off if there is little semantically-enriched content to work on.

Open source development of suitable software geared in particular towards very large scale open distributed systems for knowledge management and use, on platforms such as the World Wide Web, may be a viable ("piecemeal engineering") approach to achieving this critical mass. These developments could address simple applications, of interest to as many people as possible, such as personal information management (PIM) or e-publishing (scientific and scholarly preprints), demonstrating the value of adding explicit semantics to content.

"Acting upon semantically-enriched content (including service descriptions)" refers to a second important class of functions to be provided by Knowledge Technology tools and artefacts.

In order to make the most of semantically-enriched content in open distributed systems, agents must "understand" each other; that is, they must be interoperable. Interoperability must be guaranteed at all levels: syntactic, semantic and pragmatic. Solutions to interoperability problems require specific models and tools (e.g. for mappings between representation formalisms/standards) and may also need specific organisational underpinning.
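
A mapping between two representation formalisms can be illustrated with a minimal sketch. The two vocabularies and the field names below are hypothetical and serve only as an illustration. The sketch renames terms, so it touches only the syntactic and, in a very limited way, the semantic level; it says nothing about the pragmatic level.

```python
# Hypothetical mapping from the terms of one metadata vocabulary (keys)
# to the corresponding terms of another vocabulary (values).
FIELD_MAP = {
    "title": "name",
    "creator": "author",
    "date": "published",
}


def translate(record, field_map):
    """Translate a record from the source vocabulary to the target one.

    Returns a pair: the translated fields, and the fields for which no
    mapping is known (these mark the limits of interoperability).
    """
    translated = {}
    unmapped = {}
    for key, value in record.items():
        if key in field_map:
            translated[field_map[key]] = value
        else:
            unmapped[key] = value
    return translated, unmapped


source_record = {"title": "A Brief Note", "creator": "H.-G. Stork", "format": "text"}
translated, unmapped = translate(source_record, FIELD_MAP)
```

Fields that end up in `unmapped` are those an agent using the target vocabulary cannot interpret without further agreement, which is where organisational underpinning may come in.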

Interoperability is only one, albeit important, aspect of the quality of knowledge representations and tools. Other aspects include scalability and usability. To ensure the overall quality of such artefacts, a coherent and well-organised process of test, evaluation, assessment and certification is desirable.

Strong business cases for the cost-effective use of Knowledge Technologies in corporate and/or commercial environments must be further elaborated. Specific examples include

workflow and collaboration support,
proactive portals for community building,
retrieval, filtering, profiling and recommender systems, as well as
document change and innovation management.

Trustability, privacy and other social issues should be taken into account at both the design and operation stages of knowledge-based systems. Target areas such as "customer relationship management" (CRM) or "e-government" are particularly sensitive in this regard.

Knowledge Technologies draw on various Computer Science sub-disciplines such as formal modelling, logics and languages, information retrieval, (multimedia) databases, image analysis, cognitive vision, etc., but also on "trans-disciplines" such as Cognitive Science. Therefore R&D projects addressing Knowledge Technologies will necessarily be multi-disciplinary.

R&D areas

R&D areas can be categorized broadly as pertaining to (i) adding explicit semantics to content, services and processes, or (ii) acting upon semantic descriptions (cf. above).

(i) Adding explicit semantics to content, services and processes

As explained in the previous section, knowledge about content, services and processes is made explicit through formal descriptions of such entities. These descriptions are usually referred to as metadata. However, while agents need formal descriptions in order to act, such descriptions alone are not sufficient. Descriptive terms are "understandable" (or "meaningful") only if their meaning has been defined somehow, somewhere. This is usually done through ontologies, which provide meaning that can be operated upon. They embed terms in contexts (of other terms) and/or stipulate rules that a given term (or set of given terms) must obey.
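
The distinction between metadata and ontology can be made concrete with a minimal sketch. All names in it (the classes, the property "author", the entity identifiers) are hypothetical and chosen for illustration only. Metadata is held as simple (subject, property, value) statements; the toy ontology embeds terms in a context of other terms (a subclass hierarchy) and stipulates one rule a term must obey (the value of "author" must be a Person).

```python
# Toy ontology (hypothetical terms): each class points to its superclass.
SUBCLASS_OF = {
    "Preprint": "ScholarlyPaper",
    "ScholarlyPaper": "Document",
    "Report": "Document",
}

# A rule a term must obey: the value of "author" must denote a Person.
PROPERTY_RANGE = {"author": "Person"}

# Metadata: formal descriptions as (subject, property, value) statements.
METADATA = [
    ("doc-1", "type", "Preprint"),
    ("doc-1", "author", "person-1"),
    ("person-1", "type", "Person"),
    ("doc-2", "type", "Report"),
    ("doc-2", "author", "doc-1"),  # deliberately wrong: doc-1 is not a Person
]


def ancestors(term):
    """Return the term followed by all of its superclasses."""
    chain = [term]
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        chain.append(term)
    return chain


def types_of(entity):
    """All classes an entity belongs to, including inherited ones."""
    result = set()
    for subject, prop, value in METADATA:
        if subject == entity and prop == "type":
            result.update(ancestors(value))
    return result


def instances_of(cls):
    """Entities whose declared type is cls or one of its subclasses."""
    return sorted({s for s, p, v in METADATA if p == "type" and cls in ancestors(v)})


def range_violations():
    """Statements whose value does not belong to the class the rule requires."""
    return [
        (s, p, v)
        for s, p, v in METADATA
        if p in PROPERTY_RANGE and PROPERTY_RANGE[p] not in types_of(v)
    ]
```

With the metadata alone, an agent asked for every "Document" would find nothing, since no entity carries that label. With the ontology, `instances_of("Document")` can infer that both documents qualify, and `range_violations()` can flag the statement that breaks the rule on "author".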

Metadata and ontologies demarcate broad and inter-related research areas. Pertinent problems and solutions depend largely on content types and usage environments. Problems include:

Metadata and indexing

metadata extraction / capture
semantic annotation
domain-, context-, user- and task-oriented indexing
semantic indexing of multimedia content

Ontologies

ontology construction (including ontologies for multimedia objects) and management ("knowledge lifecycle" support)
ontology learning

There are various classes of technologies and/or approaches likely to provide generic solutions, for instance:

data/text mining for "knowledge discovery" in databases or large text repositories
concept detection and fact extraction
machine learning (e.g. for automatic classification)
semantic analysis of audiovisual content (segmentation, object extraction, etc.)
speech, face, gesture and emotion recognition (e.g. through natural language analysis and cognitive vision)
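
As a very small illustration of semantic annotation, the following sketch tags words in a text with concepts taken from a hand-made lexicon. The lexicon entries and concept names are hypothetical. A plain dictionary lookup stands in here for the far more capable mining and learning techniques listed above; it is not meant to represent any of them.

```python
import re

# Hypothetical lexicon linking surface words to ontology concepts.
LEXICON = {
    "ontology": "KnowledgeRepresentation",
    "metadata": "KnowledgeRepresentation",
    "grid": "ComputingInfrastructure",
}


def annotate(text):
    """Return (offset, word, concept) for every word found in the lexicon."""
    annotations = []
    for match in re.finditer(r"[A-Za-z]+", text):
        concept = LEXICON.get(match.group().lower())
        if concept is not None:
            annotations.append((match.start(), match.group(), concept))
    return annotations


annotations = annotate("Metadata needs an ontology.")
```

Each annotation records where the word occurs, so that the semantic label can be attached to the content without altering it.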

While it would be preferable to automate the processes of "knowledge acquisition" in or for knowledge-based systems completely, some elicitation component (e.g. for capturing knowledge through targeted man-machine interaction) will most likely always be required. Adaptive and context-sensitive techniques are needed. Corporate knowledge systems, for instance, should provide enjoyable means of interactive knowledge capture on the fly, directly from workflow processes.

In large distributed systems (such as Internet-, intranet- or extranet-based webs) knowledge creation and management pose particular problems and challenges. Specialised services (see also subsection (ii), below) could be offered to support both knowledge acquisition and knowledge elicitation (e.g. for semantic annotation of content). Peer-to-peer (P2P) networks (generalizing the by now classical client-server configurations) would lend themselves to implementing methods allowing semantics to "emerge" from node-to-node interaction.

Given the sheer amount of content in global distributed systems, solutions to some of the above problems (e.g. semantic annotation, multimedia content analysis, etc.) may require powerful computing resources, as provided for instance by Grid computing technologies. Grids may in fact contribute all kinds of compute-intensive services to be offered through webs (see below).

(ii) Acting upon semantic descriptions

Semantic descriptions of content (services, processes, ...), based on suitable ontologies, enable software agents in distributed systems to co-operate and to perform complex transactions and other operations (such as searching, filtering and integrating information) on their users' behalf and without extensive user intervention. Semantic descriptions relieve implementers of the burden of "hard-coding" semantics in agents and thus contribute to achieving interoperability. They can also help human agents ("users") to make sense of and interact with content, services and processes in distributed systems.

These general comments imply a wide range of challenging and generic R&D topics.

Some of these topics can be subsumed under the headings "Semantics- (or knowledge-)based services" and "service semantics". Services are of particular relevance in distributed systems such as the World Wide Web. Indeed, the term "Semantic Web" usually refers to the formal framework (in terms of models and languages) needed to provide agents with semantic (i.e. ontology-based) descriptions of all sorts of web-addressable entities (including services), allowing for instance context-sensitive service discovery, mediation and composition. These agents would also build on reasoning/inferencing capabilities that ontologies make possible. However, scalability of ontologies, ontological reasoning and ontology (change) management remain serious problems.
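
Ontology-based service discovery can be sketched as follows, under strong simplifying assumptions. The service names and concepts are hypothetical, each service is described by a single concept, and the only reasoning performed is subsumption along a subclass hierarchy. Real Semantic Web frameworks are considerably richer.

```python
# Toy ontology (hypothetical): each concept points to its more general concept.
SUBCLASS_OF = {
    "FlightBooking": "TravelBooking",
    "HotelBooking": "TravelBooking",
    "TravelBooking": "Booking",
}

# Semantic service descriptions: service name -> concept it offers.
SERVICES = {
    "fly-service": "FlightBooking",
    "sleep-service": "HotelBooking",
    "table-service": "RestaurantBooking",
}


def is_subsumed_by(concept, general):
    """True if concept equals general or is one of its (indirect) subclasses."""
    while concept is not None:
        if concept == general:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def discover(requested):
    """Names of all services whose offered concept falls under the request."""
    return sorted(
        name for name, offered in SERVICES.items() if is_subsumed_by(offered, requested)
    )
```

A request for "TravelBooking" yields both the flight and the hotel service, although neither is described with that exact term. The agent does not need the relationship hard-coded; it follows from the ontology. "RestaurantBooking" is not linked to anything in this toy hierarchy, so "table-service" is found only by an exact request.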

The "interfacing with knowledge" aspects (i.e. making content, services, etc., accessible and intelligible to people) are equally intriguing. Subjects falling into that category and allowing for largely generic R&D include:

semantics-based navigation and browsing
semantic search engines with domain-, context-, user- and task-sensitive query construction support
knowledge-(viz. semantics-)based dialogue management
semantic Web portals and collaboration support
user profiling, personalisation, customization (e.g. through particular "views on knowledge")
visualizing knowledge
device-dependent interfacing.

Research has been conducted on many of these and similar subjects for quite some time (with or without the "knowledge" or "semantics" qualifier); yet they could benefit from new knowledge-based approaches. They also become ever more important and more challenging as the global (wired and wireless) networks increase their reach, both in terms of capacity and physical access modes, evolving into a "ubiquitous permeable web", necessitating "semantics for everything".