Intelligent Multimedia Interaction

Mark T. Maybury
Advanced Information Systems Center
The MITRE Corporation
202 Burlington Road
Bedford, MA 01730, USA

Governments, industry and academia have increased their focus on the importance of the human-machine interface in the global information economy. More effective, efficient and natural human-computer or computer-mediated human-human interaction will require automated understanding and generation of multimedia and will rely upon precise information about the user, discourse, task and context (Maybury 1993).

This invited talk will begin by briefly outlining the history of and advances in the area of intelligent multimedia interfaces, including multimedia input analysis, multimedia presentation generation, model-based interfaces, and the use of user, discourse and task models to enhance interaction.

The talk will describe our research to provide users with intelligent interfaces that reason about and exploit task models and models of user focus of attention to mitigate application and domain complexity through such means as tailored presentation design and cooperative responses. Through a video demonstration, I will show an early intelligent multimedia interface that incorporates language processing, simple user and discourse modeling, and visualization to improve the timeliness and accuracy of information access from the web (Smotroff et al. 1995).

The talk will then describe architectures that have evolved from research in intelligent user interfaces over the past twenty years (Sullivan and Tyler 1991; Maybury and Wahlster 1997) and distinguish these from conventional commercial user interface architectures.

The presentation will conclude by pointing out current work in progress that aims to fully instrument the interface and build (automatically and semi-automatically) annotated corpora of human-machine interaction. We believe this will yield deeper and more comprehensive models of interaction which should ultimately enable more principled interface design.

Time permitting, we will also give an overview of our current, ambitious effort to create algorithms to segment, extract, summarize and visualize broadcast news in MITRE's Broadcast News Navigator (Maybury et al. 1997). This exemplifies an emerging class of applications that support content-based retrieval of multimedia (Maybury 1997). The talk will conclude with comments on the future of intelligent human-computer interaction.


Kobsa, A., and Wahlster, W. (eds.) 1989. User Models in Dialog Systems. Berlin: Springer-Verlag.

Maybury, M. T. (ed.) 1993. Intelligent Multimedia Interfaces. Menlo Park: AAAI/MIT Press.

Maybury, M. T. (ed.) 1997. Intelligent Multimedia Information Retrieval. Menlo Park: AAAI/MIT Press.

Maybury, M., Merlino, A., and Rayson, J. 1997. "Segmentation, Content Extraction and Visualization of Broadcast News Video using Multistream Analysis". In Proceedings of the AAAI Spring Symposium, Stanford, CA.

Maybury, M. T., and Wahlster, W. (eds.) 1997. Readings in Intelligent User Interfaces. Menlo Park, CA: Morgan Kaufmann.

Smotroff, I., Hirschman, L., and Bayer, S. 1995. "Integrating Natural Language with Large Dataspace Visualization". In Adam, N., and Bhargava, B. (eds.), Advances in Digital Libraries, Lecture Notes in Computer Science. Springer-Verlag.

Sullivan, J. W., and Tyler, S. W. (eds.) 1991. Intelligent User Interfaces. Frontier Series. New York: ACM Press.