Semantic Web - Knowledge and Services

The Semantic Web - a Web of Knowledge and Services

Hans-Georg Stork, Bollendorf

In less than ten years the World Wide Web, based largely on HTTP and HTML, has evolved into a vast information, communication and transaction space. Needless to say, its features differ greatly from those of traditional media. Yet many believe that today’s Web gives us just an inkling of the full potential globally distributed systems may achieve in terms of information access and use. Realizing this potential will turn the Web into an equally vast knowledge and service space.

Such prospects are being nourished by ideas that have been circulating for a number of years and whose objective is currently referred to as the "Semantic Web". In the past, some of these ideas have been implemented and tested to a greater or lesser extent in open experimental or proprietary commercial systems. Yet it is probably safe to say that they have received their greatest push from the World Wide Web Consortium (W3C) [1]. An informal paper published on the Web in September 1998 by Tim Berners-Lee, entitled "Semantic Web Road Map" [2], and a more formal note on "Web Architecture: Describing and Exchanging Data" [3] (June 1999) may be considered the seminal documents.

"Making content machine-understandable" is a popular paraphrase of the fundamental prerequisite for the Semantic Web. This is to be taken very pragmatically: content (of whatever type of media) is "machine-understandable" if it is bound (attached, pointing, etc.) to some formal description of itself (often referred to as metadata^*).

Ideally, "adding semantics to content" in this sense should be achieved through algorithmic content analysis and/or algorithmic learning processes.
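To make the pragmatic reading of "machine-understandable" a little more concrete, here is a minimal sketch of content bound to a formal description of itself. It assumes that descriptions are held as simple subject/property/value statements, loosely in the spirit of the W3C work cited above; the URLs, property names and helper functions are invented for illustration and do not follow any particular standard.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str    # the resource being described (here, a URL)
    predicate: str  # the name of the property
    value: str      # the value of the property


# Hypothetical descriptions; all URLs and property names are made up.
METADATA = [
    Statement("http://example.org/report.html", "creator", "A. Author"),
    Statement("http://example.org/report.html", "subject", "ontologies"),
    Statement("http://example.org/talk.mpg", "subject", "ontologies"),
    Statement("http://example.org/talk.mpg", "format", "video/mpeg"),
]


def find_resources(statements, predicate, value):
    """Return, in sorted order, the resources described by (predicate, value)."""
    return sorted(
        {s.subject for s in statements
         if s.predicate == predicate and s.value == value}
    )


def describe(statements, subject):
    """Collect all property/value pairs attached to one resource."""
    return {s.predicate: s.value for s in statements if s.subject == subject}
```

A program can then select content by what it is about, for instance with find_resources(METADATA, "subject", "ontologies"), without having to interpret the text or video itself.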

"Machine understanding" is not an end in itself. Rather, it should lead to automating a range of tasks within the context of distributed systems (such as the Web): from (chains of) business transactions to searching and filtering relevant and trustable information on whatever subject a user may be interested in. The kind of software performing such tasks is commonly known as "agents", decorated with varying attributes and qualifications, such as information, intelligent, autonomous, co-operative, adaptive, rational, mobile, etc.

Lastly, human users should be able to interact with their agents (or directly with content) in an intuitively appealing fashion. Visual and/or virtual reality metaphors are among the likely candidates for representing the semantics of Web content at the man-machine interface (to make, in a manner of speaking, machine-understandable content understandable to humans) and for providing new ways of navigation and search.

Technologies enabling this evolution towards a "Semantic Web" draw on various Computer Science sub-disciplines such as formal modelling, logics and languages, information retrieval, (multimedia) databases, knowledge engineering, image analysis, etc., but also on neighbouring disciplines such as Cognitive Science. Interoperability (e.g. among information agents) is an important issue and justifies a strong case for agreeing on and/or adopting Web wide standards. Generic tools are needed as well as demonstrations of new concepts through viable commercial applications.

The past rapid growth of the World Wide Web itself stimulates and motivates R&D opening up new ways of processing and managing all kinds of digital content (including images, video and audio but of course also text and plain data), and its delivery via stationary and (increasingly) mobile platforms. However, in order to give these developments even greater momentum it is necessary to create synergies between hitherto relatively separate R&D (and standards) communities, both in industry and academic/public research.

Formal ontologies seem instrumental in achieving the "Semantic Web". Rooted in a long tradition not only of formal logics and artificial intelligence but also of more mundane endeavours such as the setting up of classification schemes, thesauri and controlled vocabularies, they are currently the most promising candidates for a sound semantic grounding of machine-processable descriptions of digital content. Making ontologies operational within the context of large distributed systems (such as the Web) requires a considerable research and development effort to be directed towards methods and tools for constructing and maintaining domain-specific ontologies in a continuously changing world. Key problems include ontology learning, ontology-based annotation of legacy content, and the management of ontology repositories. Also required are agreements on ontology language standards, a necessary condition for creating a sustainable "Semantic Web". Pertinent activities have already been launched at the W3C, with the participation of European, US and Japanese groups.
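As a rough illustration of what even the "mundane" end of this tradition buys, the following sketch treats a tiny controlled vocabulary with broader-term links as a very reduced ontology and uses it to answer a query at a more general level than the annotations themselves. The vocabulary, the file names and the function names are all hypothetical, and a real ontology language would of course express far more than a single hierarchy.

```python
# Hypothetical controlled vocabulary: each term points to its broader term.
BROADER = {
    "video": "audiovisual",
    "audio": "audiovisual",
    "audiovisual": "media",
    "text": "media",
}

# Hypothetical annotations binding content items to vocabulary terms.
ANNOTATIONS = {
    "clip.mpg": "video",
    "speech.wav": "audio",
    "note.txt": "text",
}


def ancestors(term, broader=BROADER):
    """Return all broader terms reachable from `term`, nearest first."""
    seen = []
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against accidental cycles
            break
        seen.append(term)
    return seen


def is_a(term, concept, broader=BROADER):
    """True if `term` equals `concept` or is narrower than it."""
    return term == concept or concept in ancestors(term, broader)


def search(concept, annotations=ANNOTATIONS):
    """Return, in sorted order, the items annotated with `concept` or a narrower term."""
    return sorted(
        name for name, term in annotations.items() if is_a(term, concept)
    )
```

Here search("audiovisual") finds both the video clip and the audio recording, although neither is annotated with that term directly; the hierarchy supplies the missing inference step.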

Mobile platforms, or the "wireless Web", pose particular demands in terms of content semantics, largely due to the perceived usage patterns and a multitude of different capabilities of mobile devices. They require the matching of a variety of profiles, not only related to the user herself, but also her current situation/location, the services she needs, the terminal equipment at hand, the proxy server and whatever policy may apply in these circumstances. Ontologies for expressing these profiles are badly needed.
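The matching of profiles can be sketched in a few lines, under strong simplifying assumptions: only the terminal equipment is modelled, by the media types it can render and the bandwidth it currently has, and each piece of content is assumed to exist in several variants. The class names, fields and figures below are hypothetical; user, location, service and policy profiles would enter the same selection step as further constraints.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class DeviceProfile:
    formats: FrozenSet[str]  # media types the terminal can render
    max_kbps: int            # bandwidth currently available


@dataclass(frozen=True)
class Variant:
    name: str
    media_type: str
    kbps: int                # bandwidth needed to deliver this variant


def select_variant(variants: Iterable[Variant],
                   device: DeviceProfile) -> Optional[Variant]:
    """Pick the most demanding variant the device can still handle.

    Returns None when no variant matches the device profile.
    """
    usable = [
        v for v in variants
        if v.media_type in device.formats and v.kbps <= device.max_kbps
    ]
    return max(usable, key=lambda v: v.kbps, default=None)


# Hypothetical variants of one news item and one hypothetical handset.
VARIANTS = [
    Variant("news-video", "video/mpeg", 300),
    Variant("news-image", "image/gif", 15),
    Variant("news-text", "text/plain", 1),
]
PHONE = DeviceProfile(frozenset({"text/plain", "image/gif"}), 20)
```

For the handset above, select_variant(VARIANTS, PHONE) settles on the image variant: the video is ruled out by both format and bandwidth, and the plain text is a poorer match than the device allows. Shared ontologies matter here because the two sides must agree on what terms such as a media type or a bandwidth figure mean before any such matching can take place.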

The next big challenge then is multimedia on the "Semantic Web". The current Web, in spite of many attempts to popularize audio and video streaming, is not really hospitable yet to these forms of content. The current physical infrastructure of the Internet does not yet support high-bandwidth applications that can compete with traditional audiovisual media. However, this is very likely to change over the next five to ten years. Multimedia producers and distributors will then be faced with a formidable management task (a "multimedia engineering crisis" in analogy to the "software crisis" of the sixties and seventies) while users (or consumers) of content will have to choose from an ever increasing number of information, education, training or entertainment products. Both problems can only be solved through the systematic use of metadata. This is yet another case in point for advancing research on appropriate ontologies. It is also a case in point for encouraging even closer co-operation between the RDF/ontologies world(s), the SMIL world(s) and the MPEG world(s). And thirdly, given the sheer size and dynamics of the contents involved, it is a case in point for automating to the largest extent possible the production of metadata through algorithmic content analysis.

So the future "Web" will be a multimedia web and it will be used by mobile users; it will be adaptive and open, collaborative and automated. To that effect it has to be "semantic" of course! It will be pervasive and browsers will not play the role they play today. Agents, i.e. communicating distributed processes, acting on behalf of human users, will play a far greater role. They have been around - at least in the literature - for more than five years, and they come in basically two varieties: as personal information assistants and as members of multiagent systems. Again, for agents in a distributed system to be effective, ontologies and ontology-based metadata are indispensable. Agents enter the "Semantic Web" at different levels. As much as they make use of the ‘semantic infrastructure’ they can also contribute to the creation and maintenance of that infrastructure. Agent based computing appears to be the appropriate paradigm to work in a complex world with multiple ontologies, fragments and multiple inferencing engines. It is interesting to note here that the agent and metadata aspects establish a strong link between the "Semantic Web" and another important initiative in the field of distributed computing and research networking, known as the "Computational Grid" (cf. the DataGrid project [4]).

Business opportunities related to the notion of a "Semantic Web" seem to abound, first of all in the traditional areas of selling (B2C = Business to Consumers) or trading (B2B = Business to Business) goods over the Internet. While ‘traditional’ B2C and B2B are still very much (product-)data and text (and of course image) oriented, this will change as the Web becomes more and more multimedia enabled, making already complex content management tasks even more complex and requiring solutions based on "Semantic Web" technologies. XML alone is not a panacea. Unlike today, with most content still being available for free, content itself will be a commodity in a future Web, subject to both B2C selling and B2B trading. In order to be on the winners’ side all parties involved in these games will have to rethink their approaches and strategies. Content providers for instance will have to understand the benefits obtained from the systematic generation of metadata; service providers will have to accept metadata as the basis on which to build new services; and the producers of software tools for end-users will redirect their imagination towards more appropriate integration of application software with Web content, taking advantage of metadata.

This reminds us that technologies must not be developed for the sake of developing technologies. They should respond to real needs and they will be successful (commercially and otherwise) only if they do so. "Semantic Web" technologies appear to meet this requirement, without being committed to any single application domain.

The content technologies addressed under the heading "Semantic Web" will be crucial for the development and use of digital content in networks whose physical infrastructure is becoming increasingly powerful in terms of bandwidth and processing speed; they will be crucial for turning the current Web into a network of knowledge resources and services based on expedient links between knowledge resources. In fact, the "Semantic Web Technologies" alluded to in this note can be considered core "Knowledge Technologies".

References:

[1] http://www.w3.org

[2] http://www.w3.org/DesignIssues/Semantic.html

[3] http://www.w3.org/1999/04/WebData

[4] http://grid.web.cern.ch/grid