User-Modelling in Artificial Intelligence and Human-Computer Interaction

by Jim W. Lai
Copyright ©1993. HTML conversion/editing Copyright ©1998,2000.

Introduction

User-modelling has had a long history in both the field of artificial intelligence (AI) and the multi-disciplinary field of human-computer interaction (HCI). AI, regardless of the criticisms of its accomplishments and goals, has contributed many significant advances to HCI. In turn, HCI poses many problems which may be of interest to the AI community and offers a new perspective on existing problems.

In Section 1, we present a brief and selective survey of developments in the field of artificial intelligence. We are primarily interested in developments which require user-modelling and have consequences for HCI.

We then take a glimpse at what has been accomplished in human-computer interaction in Section 2. Another brief and biased survey of the many ways that humans can interact with computers is given, followed by a discussion of the need for user-modelling in HCI.

Finally, in Section 3, we review current trends in HCI involving user-modelling and AI techniques, speculate on where they might lead, and look at the possible impact on AI.

1. A Very Brief History Of Artificial Intelligence

Early Beginnings

ELIZA

One of the first major advances was ELIZA. Programmed in MAD-SLIP by Joseph Weizenbaum, this program would appear to the user to be able to engage in a dialogue imitative of the style favored in Rogerian psychotherapy. In 1966, the program was able to emulate human conversation to such a degree that users often assumed they were communicating remotely over teletype with another human.

Whether this program is considered to be artificial intelligence is debatable. However, ELIZA's technique of generating responses by keyword-matching demonstrated the plausibility of natural language understanding by computers.
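The keyword-matching technique can be illustrated with a minimal sketch. The rules, reflections, and function names below are hypothetical and are not taken from Weizenbaum's original script; the sketch only shows the general idea of matching a keyword pattern and echoing part of the input back in a canned template.

```python
import re

# Hypothetical keyword rules: (pattern, response template). These are
# illustrative only and are not taken from Weizenbaum's original script.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Fallback used when no keyword matches; no context is kept between turns.
DEFAULT_REPLY = "Please go on."

# Swap first- and second-person words so the echoed fragment reads naturally.
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "am": "are"}


def reflect(fragment: str) -> str:
    """Strip trailing punctuation and swap pronouns word by word."""
    words = fragment.rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in words)


def respond(utterance: str) -> str:
    """Return the template of the first rule whose keyword pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    return DEFAULT_REPLY
```

Because each reply depends only on the current utterance, an input with no recognized keyword falls through to the default reply, whatever was said earlier in the conversation.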

No attempt was made to model the user; in fact, part of the reason Weizenbaum chose the Rogerian model was that context was not important. This lack of context sometimes resulted in utterly inappropriate responses on the part of ELIZA when the program was unable to "understand" the user.

This flaw pointed out the need for a robust program to comprehend the user's input on a deeper level and to maintain a context for the conversation, in order to better handle misunderstanding of the user's input. Both these concerns were the subjects of later research.

PARRY

In 1963, K.M. Colby published a paper on "Computer Simulation of a Neurotic Process". A notable result of Colby's work is the program PARRY, which graced the pages of journals starting in 1971. PARRY was an attempt to model human paranoid belief systems, and was based on several psychological theories of paranoia.

In a multi-stage experiment, psychologists reliably judged PARRY's interactive output as being paranoid and were unable to distinguish transcripts of a session with PARRY from those of a session with a human patient. This was a notable advance in the simulation of human behavior.

The Field Yields Harvest

Expert Systems

Expert systems are software tools that attempt to model some aspect of human reasoning within a domain of knowledge. As a rule, expert systems rely on human experts for their knowledge, at least initially. An early success in this subfield was MYCIN [SHO1], developed in the early 1970s under Edward Shortliffe. The majority of current expert systems are structured similarly to MYCIN [BOY1].

The MYCIN system incorporates domain knowledge of infectious diseases in the form of facts and inference rules. A physician can interactively enter data on the patient, study MYCIN's lines of reasoning (inference) in making a diagnosis, and correct MYCIN if necessary. MYCIN can also suggest treatment. The authors of the system claim that MYCIN gives results that agree with experts 75% of the time.
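The idea of facts, inference rules, and an inspectable line of reasoning can be sketched as follows. This is a minimal backward-chaining sketch under stated assumptions: the rule names and facts are invented for illustration, carry no medical validity, and are not drawn from MYCIN's rule base, and MYCIN's handling of uncertainty is omitted entirely.

```python
# Hypothetical rules for illustration; they are not real medical knowledge
# and are not taken from MYCIN. Each rule maps a list of premises to one
# conclusion.
RULES = {
    "R1": (["gram_negative", "rod_shaped"], "enterobacteriaceae"),
    "R2": (["enterobacteriaceae", "hospital_acquired"], "suspect_e_coli"),
}


def prove(goal, facts, trace):
    """Backward-chain from goal to known facts, recording each rule that fires.

    Assumes the rule set is acyclic; a cyclic rule set would recurse forever.
    """
    if goal in facts:
        return True
    for name, (premises, conclusion) in RULES.items():
        if conclusion == goal and all(prove(p, facts, trace) for p in premises):
            trace.append(f"{name}: {' and '.join(premises)} => {conclusion}")
            facts.add(goal)  # remember the derived fact for later goals
            return True
    return False


def diagnose(goal, observed):
    """Return (whether the goal was established, the rules used along the way)."""
    facts = set(observed)
    trace = []
    return prove(goal, facts, trace), trace
```

The returned trace plays the role of the line of reasoning that a physician could study and, where a rule is judged wrong, correct.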

The problem of knowledge representation in expert systems can be viewed as a limited form of modelling humans, i.e. the modelling of human knowledge.

The creation of an expert system raises the issues of knowledge acquisition and incremental learning. Careful design is required to reduce the effort required to transfer knowledge, as well as to produce an efficient representation. Ideally, experts should state explicitly all steps used in their reasoning.

In addition, the issues of human factors and user interfaces cannot be ignored. The expert must be consulted on the design of the interface. Humans prefer to see quick results, and interest must be maintained during creation. An ideal system would be easy to use.

Computer-Aided Instruction

Computer-aided instruction (CAI) has its origins in the 1960s. These systems were designed to tutor students (users), thus augmenting (and perhaps substituting for) human teachers. Initially, such systems were inflexible and had very restricted domains. It could take great effort on the part of educators to create educational packages on these systems, and it was up to the package designers to determine the manner of presentation as well as the content of the lessons. The PLATO CAI project, for example, ran into unexpected delays in producing course material, as reported in 1973 [AHL1]: "...the original estimate of about 40 man-hours of effort per student-contact-hour of courseware [may be] off by a large factor..."

Aside from the problem of knowledge representation, the issue of how best to transfer the knowledge to the student arises. For instance, a tutoring system should not overwhelm the student with possibly unfamiliar terminology, nor should it attempt to teach a topic for which the student is unprepared. The system should present the information in a manner and at a rate that is readily assimilated by the student. To best accomplish this, the application should model the student's mental state. The application of AI techniques for student-modelling led to the development of intelligent computer-aided instruction (ICAI) in the 1970s.
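A very simple student model of this kind can be sketched as follows. The curriculum, the mastery threshold, and the update rule are all assumptions made for illustration and do not describe any particular ICAI system; the point is only that the tutor consults an estimate of what the student knows before choosing what to teach.

```python
# Hypothetical curriculum: each topic lists the topics to be mastered first.
PREREQUISITES = {
    "counting": [],
    "addition": ["counting"],
    "multiplication": ["addition"],
    "fractions": ["multiplication"],
}

MASTERY_THRESHOLD = 0.8  # assumed cut-off for treating a topic as mastered


class StudentModel:
    """Keeps a running estimate of the student's mastery of each topic."""

    def __init__(self):
        self.scores = {}  # topic -> estimate in [0, 1]

    def record(self, topic, correct, rate=0.5):
        """Move the estimate toward 1 after a correct answer, toward 0 otherwise."""
        old = self.scores.get(topic, 0.0)
        self.scores[topic] = old + rate * ((1.0 if correct else 0.0) - old)

    def mastered(self, topic):
        return self.scores.get(topic, 0.0) >= MASTERY_THRESHOLD

    def next_topic(self):
        """Pick the first unmastered topic whose prerequisites are all mastered."""
        for topic, prereqs in PREREQUISITES.items():
            if not self.mastered(topic) and all(self.mastered(p) for p in prereqs):
                return topic
        return None  # everything in the curriculum is mastered
```

With these assumed settings, three consecutive correct answers on "counting" raise its estimate to 0.875, after which the model proposes "addition"; a subsequent wrong answer drops the estimate below the threshold and the tutor returns to "counting".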

Some of the areas of ICAI research involve plan recognition in tutoring systems, discovery of student misconceptions (required context for correction), self-adaptive interfaces (adapting the degree and type of help depending on the perceived expertise and goals of the user), and modelling of programmer intention (used in discovering programming bugs). Modelling of the user's beliefs and intentions is a significant component of AI, as well as ICAI. Many of these areas are also of interest to those in the field of human-computer interaction.

Natural Language Processing

Natural language processing (NLP) has been a major area of research in AI ever since the early 1960s. The concept of a natural language interface (NLI) was a natural consequence of NLP research.

Research efforts in natural language understanding (NLU) led to the study of natural language text generation. Two related subfields are machine translation and voice processing. The latter includes voice recognition and voice generation.

World-modelling and user-modelling are important concerns of NLP, as semantic understanding, a knowledge of discourse strategies, and context are required for a robust interaction.

2. Human-Computer Interaction: The User-Interface

An Introduction

Human-computer interaction (HCI) is a multi-disciplinary field, spanning computer science, psychology and cognitive science, engineering, and art. HCI had its origins in the 1950s, though it was not recognized as a field until relatively recently.

A premise of HCI is to make communication take less effort for the user. The logical conclusion of this premise is the "transparent interface", where the user is not consciously aware of using an interface. To this end, mental models of the user must be taken into account; some tools may benefit greatly by being enhanced in such a manner, as was seen in the previous section.

There are many problem areas in HCI where AI can be applied. AI has historically been a field strong in theory, while HCI is more an art than a science. HCI can benefit strongly from the formalism that AI offers, while AI may gain insights from the richness of interaction in the HCI domain. HCI can also be a proving ground for testing concepts in AI; often systems will be implemented even though the cognitive and formal bases are not fully developed.

A brief survey of the history of HCI may help give a fresh perspective to those who suffer from NLP "tunnel vision". With this as a background (or context), we can then discuss the impact of user-modelling.

Standards As Implicit User Models

In the early days of computing, standardization of computer languages provided the advantage of having a consistent and specified means of interaction. The aim was to provide a "language" for human-computer dialogue that was more amenable than low-level machine code. Two early examples of such standards are FORTRAN (1950s-1977) and COBOL (1960-1974).

The implicit user model in these standards is simply that the user knows the correct syntax of the standard language; the user model is unchanging. While the resulting program was less efficient than machine code and required more computer time to translate and process, these languages became popular because of the relative ease of use they allowed. Because it now took less effort for the human user to generate a program, more complex programs could be written more easily.

More To Communication Than Text

Research in artificial intelligence has been primarily based on text- and speech-based interaction, due to the focus on natural language processing. Humans communicate in terms of speech before they learn to communicate in text. Text was originally developed as a means of encoding speech. In addition, humans start off communicating with the world via a variety of senses. The cognitive consequences of this should not be ignored. The study of the development of communication in humans may lead to theoretical and practical advances in the construction of computer systems capable of robust communication.

One should also keep in mind that human-computer interaction has taken many forms other than natural language in the last thirty years. A look at what has been accomplished in these other forms of interaction may offer valuable insight.

Sketchpad

In 1963, Ivan E. Sutherland developed Sketchpad, which allowed the user "to converse rapidly through the medium of line drawings" [SUT1]. The interface was primarily graphical, though it also employed gesture (pointing) to indicate what the user sought to accomplish. The system would update the drawing rapidly in response, giving the impression of direct manipulation. This can be viewed as being analogous to backchannels [TAN1] in speech (or secondary speech), where the utterance is cooperative and nonintrusive and may thus be simultaneous with primary speech. The implicit user model here is very simple and quite basic to human communication: gesture and response.

Star

The Xerox Star System, developed in the 1970s, presented an entire programming environment, a major advance from earlier systems of computer interaction. Gesture is input via a "mouse" device in conjunction with text from a keyboard. Bit-mapped graphics are used for the monitor display; graphics allowed for greater capability in representation than the text-only teletypes and video terminals commonly in use at the time. (Text is a specialized form of graphics.) This allowed for the logical extension of the communication concept seen in Sketchpad, which is the reactive pictorial interface, now known as WYSIWYG (What You See Is What You Get).

Major components of this system, including the "desktop metaphor", are present in the Macintosh interface [SMI1]. Among the elements of this presentation style are icons, graphic designs which symbolize files and other components, physical and logical, which can be manipulated by the user. The methods of interaction were carefully chosen to reduce the cognitive workload on the user, such as seeing and pointing to specify operations to the system, and to allow easy familiarity with the (context of the) system, such as generic commands for modeless interaction and reliance on easily-understood concepts.

Paint And Motion

The first computer-based "paint" program was developed in the mid-70s. This can be seen as a specialized form of the reactive pictorial interface, where the computer screen is used as a graphic medium for manipulation. The language of interaction involves gesture via a pointing device and specialized operations to be applied. Here, the implicit user model is centered around a task, computer rendering. Further extensions to this model have resulted in computer-based animation packages.

Computer paint systems (1976?-) provided interfaces to simplify the selection of color, gradation, and texture. In the future, context-sensitive tools may give the user hints on how to achieve desirable effects for painting and animation.

Voice

Voice processing, speech recognition in particular, has also been of interest as a means of HCI and is a prime example of where AI has made a direct contribution. Voice as input is attractive since users would, ideally, be able to communicate with the computer directly in natural language instead of having to transmit their intents via some artificial descriptive programming language. Combined with speech synthesis, a human and a computer would then be able to converse. The implicit user model is then simply that of a speaker and listener.

Voice recognition technology has been actively worked on since the 1970s, e.g. Hearsay. Expert systems for speech recognition have been developed, but vocabulary size and disambiguation remain fairly challenging obstacles. The preference for realtime recognition also poses a technical limitation.

Applying NLP, Hearsay-II (1976?) used symbolic reasoning in conjunction with acoustic and linguistic knowledge to improve accuracy. Since then, several expert systems for speech/voice recognition have been developed. A robust vocabulary size remains a challenge. A voice interface is made possible when voice recognition is combined with speech synthesis.

Multimedia

Multimedia in the sense used here is simply communication involving several media, often in the context of the communication as presented by the computer. A multimodal interface is able to accept input in multiple modes, such as speech and gesture, and apply the information gained from each mode to the task at hand --- understanding the user.

Recent research in this area has been carried out by the Intelligent Multi-Media Interface Project, which is devoted to highly integrating speech, NL text, graphics, and pointing gestures in a flexible, context-sensitive manner. For example, in 1988, CUBRICON (a part of the IMMI Project) accepted user input in the form of natural language simultaneously combined with pointing gestures.
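One way in which language and pointing might be combined can be sketched as follows. This is not a description of how CUBRICON works; the data types, the list of deictic words, and the time-window heuristic are all assumptions chosen to show how information from one mode can fill a gap left by the other.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    time: float  # seconds since the start of the utterance


@dataclass
class Gesture:
    target: str  # identifier of the object pointed at (hypothetical)
    time: float


# Words that refer to something outside the sentence itself (assumed list).
DEICTIC = {"this", "that", "here", "there"}


def fuse(words, gestures, max_gap=1.0):
    """Replace each deictic word with the target of the nearest unused gesture.

    A gesture is only used if it occurred within max_gap seconds of the word.
    """
    unused = list(gestures)
    resolved = []
    for word in words:
        if word.text.lower() in DEICTIC and unused:
            nearest = min(unused, key=lambda g: abs(g.time - word.time))
            if abs(nearest.time - word.time) <= max_gap:
                unused.remove(nearest)
                resolved.append(f"<{nearest.target}>")
                continue
        resolved.append(word.text)  # ordinary word, or no gesture close enough
    return resolved
```

For instance, the spoken words "move this there", accompanied by two pointing gestures at roughly the moments "this" and "there" are uttered, would be resolved into a command naming the two objects pointed at.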

Virtual Reality And Beyond

A complete theory of communication must be able to account for all the ways that people communicate, not just natural language. Why not song, dance, and painting? The history of HCI has a richness of variety of interaction that AI has not fully explored. For instance, the recent directions taken by virtual reality may offer yet more modes of communication: the sense of acceleration, the tactile and kinesthetic. (Virtual reality has its origins in the 1950s, when the first video-based flight simulator systems were developed for the military.)

The User: But What About My Needs?

Most, if not all, of the interfaces and interactions described above rely on a mostly fixed user model. We may have given a misleading impression, as there have been efforts to design systems which, implicitly or explicitly, allow for differences between users.

As an early example, the Xerox Star and its conceptual progeny allow for some user-customization of the interface. This can have its limitations, however; on a Macintosh, once an interface has been modified by a user, other users wishing to use the same machine may be confronted with an interface that has been customized not to their liking.

Users are not created equal. For example, with regard to an expert system, it may be useful to model several levels of user expertise, or even a continuum from novice to expert [BEE1]. A novice generally requires more explanation and help. A user interface that treats all users as novices will almost certainly be a nuisance, if not a hindrance, for an expert [WEX1].
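A discrete version of such a model can be sketched as follows. The three levels, the thresholds, and the SAVE command with its help texts are hypothetical; the sketch only illustrates an interface varying the amount of explanation according to an estimate of the user's expertise.

```python
# Hypothetical help texts at three levels of detail for one command.
HELP = {
    "novice": "Type SAVE followed by a file name, for example: SAVE report. "
              "Your work is kept until you delete the file.",
    "intermediate": "SAVE <name> writes the current document to <name>.",
    "expert": "SAVE <name>",
}


def classify(commands_issued, errors):
    """Crude expertise estimate from usage counts (assumed thresholds)."""
    if commands_issued < 20:
        return "novice"
    error_rate = errors / commands_issued
    if commands_issued >= 200 and error_rate < 0.05:
        return "expert"
    return "intermediate"


def help_for(commands_issued, errors):
    """Return the help text matched to the estimated expertise level."""
    return HELP[classify(commands_issued, errors)]
```

A continuum from novice to expert could be modelled by replacing the three labels with a numeric score, at the cost of needing help text that can be generated at any level of detail.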

If we consider HCI as a form of dialogue, then user-modelling allows the computer to share a context with the user, in principle allowing for more effective communication. It is also worthwhile to study the mental models that users construct of the interfaces they interact with, as poor models can hinder familiarization and use of a system. These mental models can also be simulated on a computer. The results of this research can then be applied in AI as well as interface design principles in HCI.

3. Trends In User-Modelling

Attempts at implementing prototype systems in HCI can be used to determine the validity and feasibility of various theories in AI. Thus, constructed HCI systems can be used as experimental data, as well as simulations.

One obvious and immediate area of application of AI techniques is in the creation of highly intelligent tools, both in software and hardware. This conclusion is merely the logical extrapolation of current efforts towards this goal.

A modest example of the former would be the automated assistance for the selection of a set of colors for a user-interface. This task might seem easy, but the selection of a moderate number of colors (say, around seven to eight) such that they do not create an ugly, loud, or otherwise unaesthetic combination takes a nontrivial amount of time. A tool could take into account the perceptual basis of color and offer advice based on knowledge that exists in the field of graphic arts.

A more ambitious project might be a tool capable of synthesizing graphic representations that best communicate a concept to the user. Such a tool would require a knowledge of how humans perceive shape.

An example of the latter might be an embedded system in a car, where additional sensors are mounted on the car so that the system can aid in driving depending on the skill and, to some extent, the needs of the driver. Such systems can be considered to be a logical extension of cruise control, automating the task of driving by reducing the workload for the driver.

The combination of user-modelling and plan recognition technology may, assuming technical feasibility, lead to a "do-what-I-mean-not-what-I-say" interface. In this regard, it may be of benefit to view user models as a specialized form of expert system. This may pose challenges to knowledge representation, as humans often do not hold internally consistent sets of beliefs and desires. It is beyond the scope of this paper to survey the state-of-the-art in knowledge representation theory.

Some members of the HCI community have proposed the concept of agents with limited autonomy, embedded into computer applications. These agents may, if given even limited capability for plan-recognition and user-modelling, assist the user with tasks. Agents can thus be considered a logical extension of the concept of highly intelligent tools.

As an example, a Usenet news agent might sift through large numbers of articles with the aim of selecting articles of possible interest to the user and then present the user with the results; the user would have to communicate her interests to the agent frequently in order for the agent to best make its selection. NewsPeek is just such a system, developed in the 1980s by a team at MIT including Alan Kay; at the time of writing, I was unable to find a specific reference on this work.
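The kind of filtering agent described here can be sketched as follows. This is not a description of NewsPeek, for which no reference is given; the class, the word-weight scheme, and the sample articles are assumptions chosen to show how feedback from the user might shape what the agent selects.

```python
from collections import defaultdict


class NewsAgent:
    """Learns word weights from user feedback and ranks articles by them."""

    def __init__(self):
        self.weights = defaultdict(float)  # word -> learned interest weight

    @staticmethod
    def words(text):
        """Treat an article as a set of lower-cased, whitespace-separated words."""
        return set(text.lower().split())

    def feedback(self, article, liked):
        """Raise the weight of every word in a liked article, lower it otherwise."""
        delta = 1.0 if liked else -1.0
        for word in self.words(article):
            self.weights[word] += delta

    def score(self, article):
        return sum(self.weights.get(w, 0.0) for w in self.words(article))

    def select(self, articles, limit=2):
        """Return up to limit articles with a positive score, best first."""
        ranked = sorted(articles, key=self.score, reverse=True)
        return [a for a in ranked[:limit] if self.score(a) > 0]
```

The model here is only as good as the feedback it receives, which is why the user would need to communicate her interests frequently.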

Some have proposed that these semi-autonomous agents could be invested with simulated personalities in hopes of encouraging better communication with human users. The rationale behind such proposals is no doubt based on the tendency for humans to anthropomorphize when communicating with non-human entities. Whether or not the addition of personality would actually aid in conversation and in task-completion is debatable. The results of such efforts may still be of interest to those studying user-modelling by computers and mental-modelling by humans.

If user-modelling technology proves to be viable, then it can be applied towards more complete simulation of users. It is interesting to compare the efforts of AI with Frederick Brooks' conception of intelligence amplification (IA) [RHE1]. "The objective is to build systems that amplify the human mind by providing it with computer-based auxiliaries that do things the mind has trouble doing." By this definition, computer programs such as spreadsheets and database programs comprise a primitive form of IA, as they take advantage of two strengths of computers over humans, namely the abilities to store vast amounts of information verbatim and to calculate the results of equations rapidly and repeatedly without significant error.

According to Brooks, IA is opposed in conception to AI. However, both AI and IA techniques can be combined in the field of HCI. Note that IA techniques can be directly applied to the semi-autonomous agents discussed previously. IA techniques may also be of theoretical interest, as they point to significant underlying differences between human and machine cognition, which may be useful to take into account in user-modelling.

One area of research which is of more direct interest to AI is that of multimodal communication. Humans naturally communicate multimodally; though there may be redundancy, the information contained within these many modes can be used to enrich the information available to the computer in natural language processing. Many multimodal media we take for granted, but the way humans understand these media is often learned. "For literate societies it is not easy to grasp why nonliterate societies cannot see in three-dimensions or perspective. We assume this is normal vision and that no training is needed to view photos or films." [MCL1] Some of the perceptual tasks we assume are easy for humans may in fact rely on implicit, learned conventions. The study of how humans learn such conventions can be applied both to theory of communication in AI as well as HCI.

Backchannels are a consequence of the study of human communication strategies. The study of such strategies and how they are changed when faced with a restricted medium, such as the telephone or electronic mail, may be of benefit to both AI and HCI.

A more speculative possibility is the application of AI techniques of user-modelling and simulation to generate computer extensions of human personality. If one takes this to the logical extreme, one could conceivably simulate a human personality on a computer as an agent and thus extend the reach of human consciousness. Thus, the computer itself could facilitate a radically new medium of communication. "...man in the normal use of technology (or his variously extended body) is perpetually modified by it and in turn finds ways of modifying his technology." [MCL2] The many social consequences and ethical issues raised by such an invasive technology are beyond the scope of this paper.

To conclude, the field of human-computer interaction has been enriched by the results of artificial intelligence research. At the same time, the results of HCI have usually been neglected by AI, possibly due to the great difficulties involved. However, both HCI and AI stand to gain much if there is greater cooperation and exchange of information between the fields, as the strengths of the two fields, respectively application and theory, are complementary.

Bibliography

ACK1
eds. D. Ackermann and M.J. Tauber. Mental Models and Human-Computer Interaction 1. North-Holland (Elsevier Science Publishers B.V.), New York, 1990.
AHL1
ed. David H. Ahl. The Best of Creative Computing, volume 1. Second edition. Creative Computing Press, Morris Plains, 1977.
BAR1
Philip Barker. Basic Principles of Human-Computer Interface Design. Hutchinson Education (Century Hutchinson Ltd.), London, 1989.
BEE1
ed. D. Beech. Command Language Directions. (Proceedings of the IFIP TC 2.7 Working Conference on Command Languages, 1979.) North-Holland Publishing Co., New York, 1980.
BOD1
Margaret A. Boden. Artificial Intelligence and Natural Man. The Harvester Press Ltd., Hassocks, 1977. (Also by Basic Books, Inc., New York.)
BOY1
Guy A. Boy. (Translated by Philippa H. Boy.) Intelligent Assistant Systems. Academic Press Inc., San Diego, 1991.
KOB1
eds. A. Kobsa, W. Wahlster. User Models in Dialog Systems. Springer-Verlag, New York, 1989.
LAU1
ed. Brenda Laurel. The Art of Human-Computer Interface Design. Addison-Wesley Publishing Company, Inc., New York, 1990.
MCL1
Marshall McLuhan. The Gutenberg Galaxy. The New American Library, Inc., New York, 1969.
MCL2
Marshall McLuhan. Understanding Media: The Extensions of Man. McGraw-Hill Book Company, New York, 1964.
MIL1
Richard K. Miller and Terri C. Walker. Natural Language and Voice Processing. The Fairmont Press, Inc., Lilburn, 1990.
RHE1
Howard Rheingold. Virtual Reality. Simon & Schuster, New York, 1991.
SEL1
ed. John Self. Artificial Intelligence and Human Learning: Intelligent Computer-Aided Instruction. Chapman and Hall Ltd., New York, 1988.
SHO1
E.H. Shortliffe, S.G. Axline, B.G. Buchanan, T.C. Merigan, and S.N. Cohen. "An Artificial Intelligence Program to Advise Physicians Regarding Antimicrobial Therapy." Computers and Biomedical Research. 6, pp. 544-560, Academic Press, New York, 1973.
SMI1
D.C. Smith, C. Irby, R. Kimball, B. Verplank, and E. Harslem. "Designing the Star User-Interface." BYTE: The Small Systems Journal. 7(4), pp. 242-82, Peterborough, BYTE Publishing Inc., 1982.
SUT1
Ivan E. Sutherland. "Sketchpad: A Man-Machine Graphical Communication System." MIT Lincoln Laboratory Technical Report no.296, Lexington, 1963.
TAN1
ed. Deborah Tannen. Analyzing Discourse: Text and Talk. Georgetown University Press, Washington, 1981.
WEI1
Joseph Weizenbaum. "ELIZA: A Computer Program for the Study of Natural Language Communication between Man and Machine." Communications of the ACM. 9(1), pp. 36-45, The Association for Computing Machinery, Philadelphia, 1966.
WEX1
Richard L. Wexelblat. "On Interface Requirements for Expert Systems." AI Magazine. 10(3), pp. 66-78, The American Association for Artificial Intelligence, Menlo Park, 1989.