Dept. of Communication,
Stanford University, 1997
Abstract
This
study explored the implications of designing computer software which
monitors the user's behavior. Based on the theory that people's interaction
with computers is guided by the same rules as human-human interaction,
it was hypothesized that perceptions of being monitored by software
would impair task performance and cause people to evaluate software
less favorably.
The effects of emotional and intellectual monitoring were tested in a laboratory experiment. 21 subjects performed a computer-based intellectual task in one of four different monitoring conditions, and their scores for the task were recorded. Both the scores and questionnaire responses were used for the analysis. Results showed main effects on task performance for both kinds of surveillance in the predicted directions, but no effects on software evaluation. Thus, results are inconclusive, and more research will be needed in this new area.
1. INTRODUCTION
Since the introduction of personal computers in the late 1970s, computers have become ubiquitous in Western societies. Computers are a prominent element of many typical workplaces, and they influence and change the daily lives of millions of users worldwide (Quarterman, 1989). As Nickerson stated in 1988: "The effects that information technology has on our lives are beyond doubt very great". However, with the advent of the Macintosh, the creation of user-friendly applications became an important issue in computer design, and early enthusiastic views of communication technology made way for criticism of applications which employ logic and languages that are unnatural and cannot be understood intuitively by humans. Such applications show little understanding of the potential needs and problems of users, and they frequently leave users feeling alienated (Maes, 1994).

In their book "The Media Equation" (1996), Reeves and Nass present a body of research which indicates that people treat computers as if they were real people. This, in turn, also means that people prefer to be treated by computers in ways that are fundamentally social. However,
social interaction between humans and computers requires communication
between them which exceeds traditional human-computer interaction in
the form of rigid pop-up alert windows on the part of the computer,
and checking "ok" and "cancel" boxes on the part of the user. Specifically,
social interaction between humans and computers requires a form of communication
that allows a computer to gain sufficient knowledge of the user's states
and traits to be able to assess the meaning of the user's actions.
2. INNOVATIVE SOFTWARE AND BEHAVIORAL MONITORING
a. Intellectual Monitoring: Agent-based software solutions and adaptive user-modelling
In particular, software which employs "agency", in that it either adaptively helps the user to accomplish a task (advisory function) or independently carries out tasks on behalf of the user (assistant function), relies heavily on user-modelling techniques in order to assess the needs, preferences, and capabilities of the user at any given time and for various tasks and situations, much as a human tutor or secretary would need to gain knowledge about a person in order to coach or assist them well (Minsky, 1994). Computers can generate models of users in two different ways: implicitly, by monitoring the user's behavior while accomplishing a task, or explicitly, through the use of "strategic probes" that ask the user to specify his/her preferences or give examples of how something should be done correctly. Many recent applications have used implicit monitoring strategies which are "adaptive", meaning that the user model is constantly updated as a result of continuous monitoring of the user's actions. Examples of software employing this principle are a learning personal appointment scheduling assistant (Mitchell et al., 1994) and an e-mail message sorting agent (Maes, 1994). Adaptive user-modelling is expected to be employed in a variety of consumer applications in the near future (Laurel, 1995), and it is currently employed in the software used for taking computer-based standardized tests, such as the GRE. In addition
to adaptive user modelling based on the user's intellectual behavior
(intellectual monitoring), techniques are currently being developed
which make it possible to unobtrusively monitor the user's emotional state by measuring
changes in his/her galvanic skin response or body temperature
by means of a "smart mouse" with built-in sensors (Reeves, 1997), or
even to assess the user's emotions by evaluating vocal intonation or
digitized images of the user's facial expressions while working with
the application (Picard, 1997).
b. Emotional Monitoring: Affective Computing
Such data
could be important indicators about the user's stress level and comfort
with the interaction, and they could be used by the application to determine
the pace and level of complexity at which data should be presented to
the user, check whether the user is satisfied with the software's performance,
as well as point out times when the software should offer the user help.
Innovative applications of this kind "get to know" the user and react
with understanding. Why is this important?
In order to become friendly and intelligent companions, computers need
to be able to recognize such emotions as interest, distress, and pleasure.
Wearable computers, although in their infancy, could provide additional
and unusual opportunities for a computer to get to know a user (Picard,
1997). Such computers coupled with sensors and pattern-recognition,
as well as interpretive algorithms should soon be able to recognize
basic affective states in a user. Coupled with understanding, friendly,
and personalized feedback, which is based on the software's knowledge
of the user, this is known as "affective computing". Affective computing
represents a significant advancement in software design, as it could
make software more user-friendly, as well as lead to human-computer
interaction that is more satisfying to the user.

New applications
which use innovative monitoring strategies tend to advertise this in
order to emphasize how contemporary the software is, and because it
is believed that users will place more trust in adaptive high-tech applications.
Yet the question of whether monitoring the user could also have adverse
effects has never been asked.
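
The implicit, adaptive user modelling described in this section can be illustrated with a minimal sketch. It assumes a hypothetical model that keeps a single ability estimate, updates it after every monitored answer with an exponentially weighted average, and maps it to a difficulty level. The class name, the smoothing factor, and the thresholds are illustrative assumptions and do not describe any of the systems cited above.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveUserModel:
    """Hypothetical implicit user model: one running ability estimate in [0, 1]."""

    ability: float = 0.5    # assumed neutral starting estimate
    smoothing: float = 0.3  # assumed weight given to the newest observation

    def observe(self, answered_correctly: bool) -> None:
        """Update the estimate after each monitored action (implicit modelling)."""
        outcome = 1.0 if answered_correctly else 0.0
        self.ability = (1 - self.smoothing) * self.ability + self.smoothing * outcome

    def next_difficulty(self) -> str:
        """Map the current estimate to a difficulty level (thresholds are assumptions)."""
        if self.ability < 0.4:
            return "easy"
        if self.ability < 0.7:
            return "medium"
        return "hard"
```

An explicit model would instead set `ability` once from the user's own answers to a "strategic probe" and leave it unchanged during the task.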
3. THEORETICAL FRAMEWORK AND HYPOTHESES
a. Social Facilitation
Findings from the social sciences tell us that people act differently in the presence of other people than they do when they are alone (Sproull, Kiesler et al., 1996). When other people are present, most people try to behave in ways that are socially acceptable. Some perform tasks better because they want to "show off" in front of others, while others perform less well, because the presence of others intimidates them and causes them stress.

Why is this important for the study of adaptive modelling and affective computing? When a computer monitors how cleverly (or how poorly) you act while solving a problem, or whether you feel relaxed or distressed, it can be argued that this is similar to situations in which other people observe you to determine how skilled you are at solving the problem, or to find out how you feel. It is reasonable to infer this because we know that people view computers much like they view other people, and they apply the same social rules to computers (Reeves and Nass, 1996). As Nass, Steuer and Tauber (1994) put it, "Computers are social actors"; or, as Laurel (1993) put it, "Computers behave". Thus, since
being monitored by a computer could have similar effects as being watched
by other people, it seems appropriate to look at the effects of human
monitoring to get an idea of what the effects of software surveillance
might be. A good place to look for findings about how the presence of
others affects how people act is the body of research about "social
facilitation".
b. Hypotheses
While some people might perform better in the presence of others (such as sports heroes), most people perform less well under surveillance conditions, because being watched usually causes high levels of physiological arousal. This arousal, in turn, is known to impair performance and to cause unpleasant feelings of distress (Andersen, 1995). Therefore,

H1: People will perform less well on a computer-based task under conditions of emotional or intellectual surveillance by the computer than under conditions of no surveillance, and they will perform least well when both emotional and intellectual surveillance are simultaneously present.

Research from interpersonal communication shows that people perceive others as a "Gestalt", rather than perceiving single bits of information about people (Andersen, 1995). For example, if a person is standing closer to us, we also think this person smiles more, even if in fact this is not true. We think so because we perceive a "Gestalt" of closeness, and because in our minds, smiling is a part of "closeness", we ascribe it to the person, even if we have no other reason for doing so. By the same logic, an interaction that is experienced as stressful may lead people to ascribe other negative qualities to the software as well. Therefore,
H2: People will ascribe more ease of use to a computer under no surveillance than under emotional or intellectual surveillance by the computer, and they will perceive the least ease of use under both emotional and intellectual surveillance.

Because a friendly, "fun" environment keeps people more motivated to perform a task well (Nickerson, 1988), and surveillance is assumed to decrease fun by causing stress, it is also hypothesized that people will try less hard to perform well under monitoring conditions.

H3: People will be more motivated to perform well under no surveillance than under emotional or intellectual surveillance by the computer, and they will be least motivated under both emotional and intellectual surveillance.

Similarly, we think that people might be less satisfied with their interaction with the computer if they know they are being monitored by the computer. Therefore,

H4: People will be more satisfied with the interaction under no surveillance than under emotional or intellectual surveillance by the computer, and they will be least satisfied under both emotional and intellectual surveillance.

As mentioned above, it is also conceivable that people will rate the performance of the software less favorably if the interaction was stressful due to the computer's monitoring. Therefore,

H5: People will ascribe higher performance to software under no surveillance than under emotional or intellectual surveillance by the computer, and they will rank performance lowest under both emotional and intellectual surveillance.

Similarly,
the software might be considered to be less likeable, less helpful,
and less reliable under conditions of surveillance; this is stated in
hypotheses 6, 7, and 8.

H6: People will rate software as more likeable under no surveillance than under emotional or intellectual surveillance by the computer, and they will rate the software as least likeable under both emotional and intellectual surveillance.

H7: People will rate software as more helpful under no surveillance than under emotional or intellectual surveillance by the computer, and they will rate the software as least helpful under both emotional and intellectual surveillance.

H8: People will rate software as more reliable under no surveillance than under emotional or intellectual surveillance by the computer, and they will rate the software as least reliable under both emotional and intellectual surveillance.

Because some research on social facilitation suggests that people with high self-esteem or with a lot of experience with the task might actually perform better under surveillance (Martin, 1985), two research questions were formulated:

RQ1: Does software surveillance influence people with high self-esteem differently than people with low self-esteem?

RQ2: Does software surveillance influence people who are experienced with a task differently than task novices?

Finally, we were interested in whether females might be influenced differently than males.

RQ3: Does software surveillance influence females differently than males?
4. METHOD
a. Overview
The effects
of emotional and intellectual monitoring were tested in a laboratory experiment.
21 subjects performed a computer-based intellectual task in one of four
different monitoring conditions, and their scores for the task were recorded.
Subsequently, each subject filled out a questionnaire about their perceptions
of the software and the interaction with the computer. Both the scores
and the questionnaire responses were used for the analysis.
b. Subjects
21 undergraduate
students from a large US West Coast university participated in the study
as part of a requirement for an undergraduate course in Communication.
Gender distribution was roughly equal among subjects. Most subjects stated
that they enjoyed working with computers, but nobody had experience in
programming with Hypercard, the application which was used to create the
software used for the manipulation and the task. Therefore, it is unlikely
that subjects would have had knowledge of the manipulation.
c. Apparatus and Stimulus Manipulation
Four versions
of an interactive computer-based quiz with questions similar to questions
on the Graduate Record Exam (GRE) were programmed in Hypercard. Each version
represented one of four conditions: No surveillance, emotional surveillance,
intellectual surveillance, and both emotional and intellectual surveillance.
The four versions differed in two ways: First, the on-screen instructions
differed in that in the non-surveillance (NS) condition, subjects were
simply instructed to answer the questions, while in the emotional surveillance
(ES) condition, subjects were told that clicking the "next" button
to get to the next question would make the computer monitor their stress
level and report it to the software, which would in turn adjust the
difficulty of the following questions. Subjects in the "intellectual
surveillance" (IS) condition were told that the software was "intelligent"
and would assess their task performance continuously in order to adjust
the difficulty. Subjects in the "emotional and intellectual surveillance" (IES) condition were told that their stress level and their performance
would be monitored by the software.

Second, after each question, a typical Macintosh window would appear
on the screen. In the non-surveillance condition, the message in the window
was "Loading next question", with an "ok" button which the user had to
click. In the ES-condition, the message was "Stress monitoring completed",
while in the IS-condition, the message was "Performance monitoring completed",
and in the IES-condition, it was "Stress and Performance monitoring completed".
The user had to click an "ok" button in the window to get to the
next question. It must be noted that these differences in the presentation
of the software represented the only difference between conditions, i.e.
the software did not actually monitor the user's performance or stress
level. In order to make the ES and IES conditions more convincing,
subjects were connected to Galvanic Skin Response sensors for the alleged
"stress monitoring".
d. Procedure
Subjects
were scheduled to come to a small laboratory and were seated in front
of a Macintosh computer which displayed the GRE-application. Subjects
in the ES and IES conditions were also connected to small GSR sensors
in order to add credibility to the "stress monitoring". Assignment to
conditions was random, but balanced for gender. Subjects were then asked
to read the on-screen instructions and perform the GRE task, which contained
15 questions for which they were given 15 minutes. The experimenter then
left the room and returned upon expiration of the 15 minute time-frame.
Each subject was subsequently asked to complete a questionnaire. Upon
completion of the questionnaire, subjects were thanked for their participation
and dismissed.
e. Measure
In addition
to recording the scores on the GRE task for each subject, a questionnaire
was used as a measurement instrument. It contained items that
measured reactions to the ease of
use, performance, likeability, helpfulness, and reliability of the GRE
software and the satisfaction with the interaction with the computer,
as well as items measuring motivation, computer experience, and self-confidence.
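
The four conditions described under Apparatus and the gender-balanced random assignment described under Procedure can be sketched as follows. The dialog texts are taken from the description above. The function name, the data layout, and the balancing procedure (shuffling each gender group separately and dealing subjects out across the conditions) are assumptions, since the exact assignment procedure is not specified.

```python
import random
from itertools import cycle

# Dialog text shown after each question, keyed by
# (emotional surveillance, intellectual surveillance).
CONDITION_MESSAGES = {
    (False, False): "Loading next question",                      # NS
    (True, False): "Stress monitoring completed",                 # ES
    (False, True): "Performance monitoring completed",            # IS
    (True, True): "Stress and Performance monitoring completed",  # IES
}


def assign_conditions(subjects, seed=None):
    """Randomly assign (subject_id, gender) pairs to the four conditions.

    Each gender group is shuffled and dealt out separately, so the
    genders are spread as evenly as possible across conditions.
    """
    rng = random.Random(seed)
    conditions = list(CONDITION_MESSAGES)
    by_gender = {}
    for subject_id, gender in subjects:
        by_gender.setdefault(gender, []).append(subject_id)

    assignment = {}
    for ids in by_gender.values():
        rng.shuffle(ids)           # random order of subjects within a gender
        order = conditions[:]
        rng.shuffle(order)         # random starting order of the conditions
        for subject_id, condition in zip(ids, cycle(order)):
            assignment[subject_id] = condition
    return assignment
```

With group sizes that are not multiples of four, as with 21 subjects, some conditions necessarily receive one subject more than others.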
f. Statistical Analysis
The questionnaire
items measuring the quality of the interaction with the computer were
collapsed into a new variable named "interaction satisfaction" (Cronbach's
Alpha = .93). Similarly, items measuring software performance (Cronbach's
Alpha = .81), software likeability (Cronbach's
Alpha = .83), and how helpful the application was
perceived to be (Cronbach's Alpha = .83) were each collapsed into a single variable. Finally, a new variable was
created for perceived reliability (Cronbach's Alpha = .67).

To answer the research questions, ANCOVAs were performed to identify
the effects of the two types of monitoring when self-confidence and confidence
with the GRE served as covariates. Finally, post hoc t-tests were
performed to interpret the findings of the ANOVAs.
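
As an illustration of the reliability coefficients reported above, the following minimal sketch computes Cronbach's Alpha from a table of item responses using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the summed scale). The function name and the data layout (one row per respondent, one column per item) are assumptions; the original analysis software is not specified.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's Alpha for a list of respondents' item scores.

    `rows` holds one list per respondent, each with one score per item.
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_items = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)
```

Items that rise and fall together across respondents yield values close to 1, which is why a high Alpha is taken as justification for collapsing several items into one variable.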
5. RESULTS
Hypothesis 1, predicting that people perform less well on a computer-based task under conditions of emotional or intellectual surveillance than with no surveillance, and least well when both emotional and intellectual surveillance are present, was confirmed. The full factorial design revealed main effects for both emotional surveillance, F(1, 41) = 6.0, p < .02, such that ES subjects had lower GRE scores than non-ES subjects (M = 9.45 vs. M = 11.10), and intellectual surveillance, F(1, 41) = 4.6, p < .04, such that IS subjects had lower GRE scores than non-IS subjects (M = 9.54 vs. M = 11.00).

Hypothesis 2, stating that people under conditions of emotional or intellectual surveillance find the software less easy to use than with no surveillance, and least easy when both emotional and intellectual surveillance are present, was not confirmed. The 2x2 factorial design indicated a main effect for intellectual surveillance, F(1, 41) = 4.3, p < .05, but in the direction that IS subjects rated the software as easier to use than non-IS subjects (M = 7.36 vs. M = 6.40).

The 2x2 factorial design used to test hypothesis 3, stating that people under emotional or intellectual surveillance would be less motivated than people under no surveillance, and least motivated when both emotional and intellectual surveillance are present, yielded no main effects. However, an interaction effect between the two types of surveillance was found, F(1, 41) = 5.4, p < .02. Therefore, hypothesis 3 was partially confirmed.

Hypothesis 4, regarding subjects' satisfaction with the interaction,
was not confirmed.

Regarding likeability, a main effect for intellectual surveillance was found in the ANOVA, F(1, 41) = 7.6, p < .01, but in the direction that IS subjects rated the software as more likeable than non-IS subjects (M = 4.89 vs. M = 3.78). Therefore, hypothesis 6 was not confirmed. Contrary to our predictions, people who were exposed to intellectual surveillance felt that the software they used was not only easier to use, but also more likeable. Thus, users may indeed feel more comfortable when they believe software to be "intelligent".

Likewise, hypothesis 7, regarding the software's perceived helpfulness, was not confirmed. No effects were found in the factorial design. Hypothesis 8, regarding the software's reliability, was also not confirmed. Subjects' perceptions of the software's reliability did not vary significantly as an effect of emotional and intellectual surveillance, although there was a tendency for emotional surveillance in the predicted direction.
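
The main effects and the interaction reported above come from a 2x2 between-subjects factorial design. The following minimal sketch shows how the three F ratios of such a design are obtained from sums of squares. It assumes equal cell sizes, which may not match the actual data, and it omits p-values and the covariates used in the ANCOVAs; the function name and data layout are illustrative assumptions.

```python
from statistics import mean


def two_by_two_anova(cells):
    """F ratios for a balanced 2x2 between-subjects design.

    `cells` maps (emotional, intellectual) booleans to equally sized
    lists of scores, one list per condition.
    """
    sizes = {len(scores) for scores in cells.values()}
    if len(cells) != 4 or len(sizes) != 1:
        raise ValueError("this sketch assumes four cells of equal size")
    n = sizes.pop()
    levels = (False, True)

    grand = mean([x for scores in cells.values() for x in scores])
    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    mean_es = {a: mean([cell_mean[(a, b)] for b in levels]) for a in levels}
    mean_is = {b: mean([cell_mean[(a, b)] for a in levels]) for b in levels}

    # Sums of squares for the two main effects and their interaction.
    ss_es = 2 * n * sum((mean_es[a] - grand) ** 2 for a in levels)
    ss_is = 2 * n * sum((mean_is[b] - grand) ** 2 for b in levels)
    ss_interaction = n * sum(
        (cell_mean[(a, b)] - mean_es[a] - mean_is[b] + grand) ** 2
        for a in levels
        for b in levels
    )
    # Error term: variation of scores around their own cell mean.
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, scores in cells.items() for x in scores
    )
    df_within = 4 * (n - 1)
    ms_within = ss_within / df_within

    # Every effect has one degree of freedom, so its mean square equals its SS.
    return {
        "F_emotional": ss_es / ms_within,
        "F_intellectual": ss_is / ms_within,
        "F_interaction": ss_interaction / ms_within,
        "df": (1, df_within),
    }
```

Two main effects without an interaction, as reported for hypothesis 1, correspond to the case in which `F_emotional` and `F_intellectual` are large while `F_interaction` is close to zero, so that the two kinds of surveillance lower scores additively.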
6. DISCUSSION
The results of this study are inconclusive. Support was provided for hypothesis 1: People perform a computer-based task less well when they are monitored emotionally or intellectually, and performance is even worse when both surveillance modes are simultaneously employed. This finding matches the predictions, and can be explained by the heightened arousal people experience when they feel they are being watched. It also adds credibility to the theory that people perceive computers in ways that are very similar to the ways people perceive other people.

The remaining findings are highly inconclusive. Results for hypothesis 3 suggest that there is an interaction effect between ES and IS for user motivation. Motivation was highest for subjects who experienced no monitoring at all. However, when intellectual monitoring was present, motivation was actually higher for subjects who were also exposed to emotional monitoring, and lower for subjects who experienced intellectual monitoring
alone (see graph).
a. Implications for Software Design
What this study tells us about software design is problematic: while telling users that they will be monitored can actually impair their performance, which is clearly an undesirable effect, it can on the other hand cause users to find the same software more likeable and easier to use, which is clearly a desirable effect. Therefore, caution about proudly advertising new software's adaptive nature seems to be warranted, and this study does not settle the debate between explicit and implicit user modelling.

A solution to this dispute might be software which initially (before the actual task starts) employs explicit user modelling (e.g., asking users about their preferences and needs), thereby convincing the user of the software's "intelligence". Once the actual task begins, the software could still implicitly monitor the user, but without the user's knowledge, so that adverse surveillance effects are mitigated.
b. Suggestions for future research
This study, the first of its kind, posed more questions than it could answer, and much research will be needed to address even the most urgent ones. The use of standardized scales for self-confidence might shed more light on the effects of user personality in social facilitation. Monitoring of the user's stress level might give good indications about the exact times when the user actually experiences problems.

This new area of research about adaptive software is an important one, as its findings may shape parts of the professional and private lives of millions of users around the world. It is to be hoped that comprehensive research will ultimately lead to software applications which are almost as likeable as the much-famed puppy.
References
Aiello, J., et al. (1993). Computer monitoring of work performance: Extending the social facilitation framework to electronic presence. Journal of Applied Psychology, vol. 23 (7), p. 537-548.