The Effects of Surveillance
on Task Performance
in Computer-Human Interaction
Unpublished Manuscript

 
 

Eva Jettmar

Dept. of Communication,

Stanford University, 1997

 

Abstract

This study explored the implications of designing computer software that monitors the user's behavior. Based on the theory that people's interaction with computers is guided by the same rules as human-human interaction, it was hypothesized that the perception of being monitored by software would impair task performance and cause people to evaluate the software less favorably.
The effects of emotional and intellectual monitoring were tested in a laboratory experiment. Twenty-one subjects performed a computer-based intellectual task in one of four monitoring conditions, and their task scores were recorded. Both the scores and questionnaire responses were used for the analysis. Results showed main effects of both kinds of surveillance on task performance in the predicted directions, but no effects on software evaluation in the predicted directions; intellectual surveillance unexpectedly increased the software's perceived ease of use and likeability. Thus, the results are inconclusive, and more research will be needed in this new area.

1. INTRODUCTION  

Since the introduction of personal computers in the late 1970s, computers have become ubiquitous in Western societies. Computers are a prominent element of many workplaces, and they influence and change the daily lives of millions of users worldwide (Quarterman, 1989). As Nickerson stated in 1988: "The effects that information technology has on our lives are beyond doubt very great".

With the advent of the Macintosh, however, the creation of user-friendly applications became an important issue in computer design, and early enthusiastic views of communication technology gave way to criticism of applications that employ logic and languages which are unnatural and cannot be understood intuitively by humans. Such applications are not responsive to the potential needs and problems of users, and they frequently leave users feeling alienated (Maes, 1994). In their book "The Media Equation" (1996), Reeves and Nass present a body of research indicating that people treat computers as if they were real people. This, in turn, suggests that people prefer to be treated by computers in ways that are fundamentally social.

Social interaction between humans and computers, however, requires a kind of communication that goes beyond traditional human-computer interaction in the form of rigid pop-up alert windows on the part of the computer and clicking "ok" and "cancel" buttons on the part of the user. Specifically, it requires communication that allows a computer to gain sufficient knowledge of the user's states and traits to assess the meaning of the user's actions.
Following this principle, advances in the field of Artificial Intelligence (AI) have led to software applications that "learn" about the user and use this knowledge to respond to the user's actions in personalized and understanding ways, much like a human interactant would (see Selker, 1994).
 

2. INNOVATIVE SOFTWARE AND BEHAVIORAL MONITORING  

a. Intellectual Monitoring: Agent-Based Software Solutions and Adaptive User-Modelling

Software that employs "agency", either by adaptively helping the user accomplish a task (advisory function) or by independently carrying out tasks on behalf of the user (assistant function), relies heavily on user-modelling techniques to assess the needs, preferences, and capabilities of the user at any given time and across various tasks and situations, much as a human tutor or secretary would need to get to know a person in order to coach or assist them well (Minsky, 1994).

Computers can generate models of users in two different ways: implicitly, by monitoring the user's behavior while he/she accomplishes a task, or explicitly, through "strategic probes" that ask the user to specify his/her preferences or to give examples of how something should be done correctly. Many recent applications have used implicit monitoring strategies that are "adaptive", meaning that the user model is constantly updated through continuous monitoring of the user's actions. Examples of software employing this principle are a learning personal appointment-scheduling assistant (Mitchell et al., 1994) and an e-mail message sorting agent (Maes, 1994). Adaptive user-modelling is expected to be employed in a variety of consumer applications in the near future (Laurel, 1995), and it is already used in the software for computer-based standardized tests such as the GRE.
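To make the idea of implicit, adaptive user-modelling concrete, the following Python sketch keeps a running estimate of a user's skill that is updated after every observed answer and used to pick the difficulty of the next question. It is a hypothetical illustration, not the method of any system cited above: the class name, the moving-average update, and the five difficulty levels are assumptions, and real computer-adaptive tests such as the GRE rely on item response theory rather than a rule this simple.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveUserModel:
    """Toy implicit user model (hypothetical design).

    The model never asks the user anything; it only watches answers.
    """
    skill: float = 0.5          # estimated probability of a correct answer
    learning_rate: float = 0.2  # weight given to the newest observation

    def observe(self, correct: bool) -> None:
        # Exponential moving average: recent behavior counts more,
        # so the model is "constantly updated" as the user works.
        target = 1.0 if correct else 0.0
        self.skill += self.learning_rate * (target - self.skill)

    def next_difficulty(self, levels: int = 5) -> int:
        # Map the skill estimate onto difficulty levels 1..levels.
        return min(levels, 1 + int(self.skill * levels))
```

For example, after three correct answers the estimate rises from 0.5 to 0.744 and the next question is drawn from level 4. A larger learning rate makes the model adapt faster, but also makes it react more strongly to chance errors.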

b. Emotional Monitoring: Affective Computing

In addition to adaptive user modelling based on the user's intellectual behavior (intellectual monitoring), techniques are currently being developed that would allow software to monitor the user's emotional state unobtrusively, either by measuring changes in his/her galvanic skin response or body temperature with a "smart mouse" that has built-in sensors (Reeves, 1997), or by assessing the user's emotions from vocal intonation or from digitized images of the user's facial expressions while he/she works with the application (Picard, 1997).

Such data could be important indicators of the user's stress level and comfort with the interaction. The application could use them to determine the pace and level of complexity at which information is presented, to check whether the user is satisfied with the software's performance, and to identify moments when the software should offer help. Innovative applications of this kind "get to know" the user and respond with understanding. Why is this important?
Many people have more physical contact with computers than they do with living beings (Picard, 1997). However, while even a puppy can sense a person's anger and subsequently correct its behavior, computers neither recognize a user's emotions nor act on them. "They don't notice whether you are thrilled, furious, or have fallen asleep. Very often, they don't even know whether or not you are there" (Picard, 1997).

In order to become friendly and intelligent companions, however, computers need to be able to recognize emotions such as interest, distress, and pleasure. Wearable computers, although still in their infancy, could provide additional and unusual opportunities for a computer to get to know a user (Picard, 1997). Coupled with sensors, pattern recognition, and interpretive algorithms, such computers should soon be able to recognize basic affective states in a user. Combining such recognition with friendly, personalized feedback based on the software's knowledge of the user is known as "affective computing". Affective computing represents a significant advance in software design, as it could make software more user-friendly and lead to human-computer interaction that is more satisfying to the user. New applications that use innovative monitoring strategies tend to advertise this, both to emphasize how contemporary the software is and because it is believed that users will trust adaptive high-tech applications more. Yet the question of whether monitoring the user could also have adverse effects has received little attention.
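The "interpretive algorithms" mentioned above can take many forms. As a deliberately simple, hypothetical sketch, the Python code below compares a user's recent skin-conductance readings with a resting baseline and turns the difference into a coarse stress level that an application could use to set its pace or offer help. The thresholds, the three-level scheme, and the response policy are assumptions for illustration only; none of them come from the studies cited here, and practical affective-computing systems use far richer signals and pattern-recognition methods.

```python
from statistics import mean, stdev


def stress_level(baseline, recent, threshold=2.0):
    """Classify arousal from galvanic skin response (hypothetical rule).

    baseline: resting skin-conductance readings (at least two values).
    recent: readings taken during the task (at least one value).
    Returns "high", "elevated", or "normal" depending on how many baseline
    standard deviations the recent mean lies above the baseline mean.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline readings must vary")
    z = (mean(recent) - mu) / sigma
    if z >= threshold:
        return "high"
    if z >= threshold / 2:
        return "elevated"
    return "normal"


# Hypothetical adaptation policy keyed on the estimated stress level.
RESPONSES = {
    "high": "slow the pace and offer help",
    "elevated": "keep the pace but simplify the next item",
    "normal": "keep the pace",
}
```

Skin conductance rises with physiological arousal, which is why a rise relative to the user's own baseline, rather than an absolute value, is used as the signal here.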
 

3. THEORETICAL FRAMEWORK AND HYPOTHESES  

a. Social Facilitation  

Findings from the social sciences tell us that people act differently in the presence of others than when they are alone (Sproull, Kiesler, et al., 1996). When others are present, most people try to behave in socially acceptable ways. Some perform tasks better because they want to "show off" in front of others, while others perform less well because the presence of others intimidates them and causes them stress. Why is this important for the study of adaptive modelling and affective computing?

When a computer monitors how cleverly (or poorly) you act while solving a problem, or whether you feel relaxed or distressed, this is arguably similar to situations in which other people observe you to determine how skilled you are at solving the problem, or to find out how you feel. This inference is reasonable because people view computers much as they view other people, and they apply the same social rules to computers (Reeves and Nass, 1996). As Nass, Steuer and Tauber (1994) put it, "Computers are social actors"; or, as Laurel (1993) put it, "Computers behave".

Thus, since being monitored by a computer could have effects similar to those of being watched by other people, the effects of human monitoring are a reasonable guide to the likely effects of software surveillance. A good place to look for findings about how the presence of others affects behavior is the body of research on "social facilitation".
Social facilitation describes changes in people's behavior that are caused by the presence of others. These changes are often subconscious and can affect people's performance on tasks (Graydon, 1995). It has even been suggested that the same may be true when people's work is monitored by a computer instead of by other people (Aiello, 1993).
 

b. Hypotheses  

While some people, such as elite athletes, might perform better in the presence of others, most people perform less well under surveillance, because being watched usually causes high levels of physiological arousal. This arousal, in turn, is known to impair performance, particularly on complex or unfamiliar tasks, and to cause unpleasant feelings of distress (Andersen, 1995). Therefore,

H1: People will perform less well on a computer-based task under conditions of emotional or intellectual surveillance by the computer than under conditions of no surveillance, and they will perform least well when both emotional and intellectual surveillance are simultaneously present.

Research on interpersonal communication shows that people perceive others as a "Gestalt", rather than as single bits of information (Andersen, 1995). For example, if a person stands closer to us, we also tend to think that this person smiles more, even if this is not in fact true. We do so because we perceive a "Gestalt" of closeness, and because in our minds smiling is part of "closeness", we ascribe it to the person even if we have no other reason for doing so.


Applied to people's opinions of computers, this could mean that the unpleasant feelings of distress caused by computer surveillance are sufficient for people to automatically infer other negative traits about the same computer, such as ascribing lower performance to it, or simply liking it less. Therefore,

H2: People will ascribe more ease of use to a computer under no surveillance than under emotional or intellectual surveillance by the computer, and they will perceive the least ease of use under both emotional and intellectual surveillance.

Because a friendly, "fun" environment keeps people more motivated to perform a task well (Nickerson, 1988), and surveillance is assumed to decrease fun by causing stress, it is also hypothesized that people will try less hard to perform well under monitoring conditions.

H3: People will be more motivated to perform well under no surveillance than under emotional or intellectual surveillance by the computer, and they will be least motivated under both emotional and intellectual surveillance.

Similarly, we think that people might be less satisfied with their interaction with the computer if they know they are being monitored by it. Therefore,

H4: People will be more satisfied with the interaction under no surveillance than under emotional or intellectual surveillance by the computer, and they will be least satisfied under both emotional and intellectual surveillance.

As mentioned above, it is also conceivable that people will rate the performance of the software less favorably if the interaction was stressful because of the computer's monitoring. Therefore,

H5: People will ascribe higher performance to software under no surveillance than under emotional or intellectual surveillance by the computer, and they will rank performance lowest under both emotional and intellectual surveillance.

Similarly, the software might be considered to be less likeable, less helpful, and less reliable under conditions of surveillance; this is stated in hypotheses 6, 7, and 8.
 

H6: People will rate software as more likeable under no surveillance than under emotional or intellectual surveillance by the computer, and they will rate the software as least likeable under both emotional and intellectual surveillance.

H7: People will rate software as more helpful under no surveillance than under emotional or intellectual surveillance by the computer, and they will rate the software as least helpful under both emotional and intellectual surveillance.

H8: People will rate software as more reliable under no surveillance than under emotional or intellectual surveillance by the computer, and they will rate the software as least reliable under both emotional and intellectual surveillance.

Because some research on social facilitation suggests that people with high self-esteem or with a lot of experience with the task might actually perform better under surveillance (Martin, 1985), two research questions were formulated:

RQ1: Does software surveillance influence people with high self-esteem differently than people with low self-esteem?

RQ2: Does software surveillance influence people who are experienced with a task differently than task novices?

Finally, we were interested in whether females might be influenced differently than males.

RQ3: Does software surveillance influence females differently than males?

4. METHOD  

a. Overview  

The effects of emotional and intellectual monitoring were tested in a laboratory experiment. Twenty-one subjects performed a computer-based intellectual task in one of four monitoring conditions, and their task scores were recorded. Subsequently, each subject filled out a questionnaire about their perceptions of the software and of the interaction with the computer. Both the scores and the questionnaire responses were used for the analysis.
 

b. Subjects  

Twenty-one undergraduate students from a large US West Coast university participated in the study as part of a requirement for an undergraduate course in Communication. The gender distribution was roughly equal. Most subjects stated that they enjoyed working with computers, but none had experience programming in HyperCard, the application used to create the software for the manipulation and the task. It is therefore unlikely that subjects could have seen through the manipulation.
   

c. Apparatus and Stimulus

Manipulation  

Four versions of an interactive computer-based quiz, with questions similar to those on the Graduate Record Exam (GRE), were programmed in HyperCard. Each version represented one of four conditions: no surveillance (NS), emotional surveillance (ES), intellectual surveillance (IS), and both emotional and intellectual surveillance (IES). The four versions differed in two ways. First, the on-screen instructions differed: in the NS condition, subjects were simply instructed to answer the questions, while in the ES condition, subjects were told that each time they clicked the "next" button to proceed to the next question, the computer would measure their stress level and report it to the software, which would in turn adjust the difficulty of the following questions. Subjects in the IS condition were told that the software was "intelligent" and would continuously assess their task performance in order to adjust the difficulty. Subjects in the IES condition were told that both their stress level and their performance would be monitored by the software.
 

Second, after each question, a standard Macintosh dialog window appeared on the screen. In the NS condition, the message in the window was "Loading next question"; in the ES condition, it was "Stress monitoring completed"; in the IS condition, "Performance monitoring completed"; and in the IES condition, "Stress and Performance monitoring completed". In every condition, the user had to click an "ok" button in the window to get to the next question. Note that these differences in presentation were the only differences between conditions; the software did not actually monitor the user's performance or stress level. To make the ES and IES conditions more convincing, subjects in these conditions were connected to galvanic skin response (GSR) sensors for the alleged "stress monitoring".
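The manipulation amounts to a 2x2 design in which only the instructions and the dialog text change. The short Python sketch below encodes the four conditions and the dialog messages as described above; the data structures and the function name are illustrative assumptions (the original software was written in HyperCard), and, as in the study, nothing in it measures stress or performance.

```python
from itertools import product

# Dialog text shown after each question, keyed by
# (emotional surveillance, intellectual surveillance).
DIALOG_TEXT = {
    (False, False): "Loading next question",
    (True, False): "Stress monitoring completed",
    (False, True): "Performance monitoring completed",
    (True, True): "Stress and Performance monitoring completed",
}

LABELS = {
    (False, False): "NS",
    (True, False): "ES",
    (False, True): "IS",
    (True, True): "IES",
}


def design():
    """Enumerate the four cells of the 2x2 design with their settings."""
    cells = []
    for emotional, intellectual in product((False, True), repeat=2):
        key = (emotional, intellectual)
        cells.append({
            "label": LABELS[key],
            "dialog": DIALOG_TEXT[key],
            # GSR sensors were attached only for the alleged stress monitoring.
            "gsr_sensors": emotional,
        })
    return cells
```

Treating the two kinds of surveillance as independent on/off factors is what allows a 2x2 factorial analysis to separate their main effects from their interaction.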
 

d. Procedure  

Subjects were scheduled to come to a small laboratory and were seated in front of a Macintosh computer displaying the GRE application. Subjects in the ES and IES conditions were also connected to small GSR sensors to add credibility to the "stress monitoring". Assignment to conditions was random, but balanced for gender. Subjects were then asked to read the on-screen instructions and perform the GRE task, which contained 15 questions to be answered within 15 minutes. The experimenter then left the room and returned when the 15 minutes had expired. Each subject was subsequently asked to complete a questionnaire. Upon completion of the questionnaire, subjects were thanked for their participation and dismissed.
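The text states only that assignment was random but balanced for gender; it does not describe the procedure. One common way to achieve this, sketched below in Python as an assumption rather than a description of the original protocol, is to shuffle each gender group separately and deal its members round-robin across the four conditions.

```python
import random

CONDITIONS = ["NS", "ES", "IS", "IES"]


def assign(subjects, seed=None):
    """Randomly assign subjects to conditions, balanced for gender.

    subjects: iterable of (subject_id, gender) pairs.
    Returns a dict mapping subject_id -> condition label.
    """
    rng = random.Random(seed)
    by_gender = {}
    for subject_id, gender in subjects:
        by_gender.setdefault(gender, []).append(subject_id)

    assignment = {}
    offset = 0
    for gender in sorted(by_gender):
        ids = by_gender[gender]
        rng.shuffle(ids)  # random order within each gender
        for i, subject_id in enumerate(ids):
            # Deal round-robin so every condition gets a near-equal share.
            assignment[subject_id] = CONDITIONS[(i + offset) % len(CONDITIONS)]
        # Continue dealing where this gender stopped, so the overall
        # cell sizes also differ by at most one.
        offset += len(ids)
    return assignment
```

With 21 subjects, this procedure yields cell sizes of 6, 5, 5, and 5, and within each gender the cell sizes differ by at most one.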
 

e. Measure  

In addition to the GRE task scores recorded for each subject, a questionnaire was used as a measurement instrument. It contained items measuring subjects' reactions to the ease of use, performance, likeability, helpfulness, and reliability of the GRE software and their satisfaction with the interaction with the computer, as well as items measuring motivation, computer experience, and self-confidence.
These constructs were measured using Likert-type scales. The instrument presented subjects with terms describing the software (such as "friendly"), which they were asked to rate on an 8-point scale from "describes very poorly" to "describes very well".
In addition, subjects were presented with statements (such as "the computer acted like a partner"), which they were asked to rate on an 8-point scale according to their level of disagreement or agreement.
 

f.  Statistical Analysis  

The questionnaire items measuring the quality of the interaction with the computer were collapsed into a new variable named "interaction satisfaction" (Cronbach's alpha = .93). Items measuring software performance (alpha = .81), software likeability (alpha = .83), and perceived helpfulness (alpha = .83) were combined in the same way. Finally, a new variable was created for perceived reliability (alpha = .67).
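Cronbach's alpha estimates the internal consistency of a set of k items as alpha = k/(k-1) * (1 - sum of the item variances / variance of the summed scale). The Python sketch below computes it, together with a composite score formed by averaging each respondent's items. Averaging is an assumption: the text says only that items were "collapsed", and the choice between a mean and a sum does not affect alpha.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k score lists, one per questionnaire item, each holding
    every respondent's rating in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(ratings) for ratings in zip(*items)]  # summed scale per respondent
    sum_item_var = sum(variance(ratings) for ratings in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def composite(items):
    """Collapse items into one variable: each respondent's mean rating."""
    return [sum(ratings) / len(ratings) for ratings in zip(*items)]
```

A commonly cited rule of thumb treats values of about .70 or higher as acceptable, which places the reliability (.67) and GRE-confidence (.65) scales slightly below that convention.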
In addition, items measuring self-confidence (alpha = .71) and confidence about taking the GRE (alpha = .65) were collapsed. All new variables were confirmed by factor analyses. Subsequently, for both self-confidence and confidence about taking the GRE, the sample was split at the mean and recoded into new variables ("high/low self-confidence" and "high/low GRE confidence").
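The split at the mean can be sketched as follows. The function is a hypothetical illustration; the text does not say how scores falling exactly on the mean were classified, so the tie rule here (ties count as "high") is an assumption.

```python
from statistics import mean


def mean_split(scores):
    """Recode continuous scores into "high"/"low" groups at the sample mean."""
    cutoff = mean(scores)
    # Assumption: a score exactly at the mean is classified as "high".
    return ["high" if s >= cutoff else "low" for s in scores]
```

Splitting at the mean discards the variation within each group, so the resulting high/low variables are a coarse summary of the underlying scale.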
Full 2x2 factorial ANOVAs were used to test the eight hypotheses regarding the effects of monitoring on task performance, perceived ease of use, motivation, and interaction satisfaction, as well as on the perceived performance, likeability, helpfulness, and reliability of the software. The two types of monitoring (emotional and intellectual) served as between-subjects factors. An alpha level of .05 was adopted for all statistical tests.
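To show how the main effects and the interaction are separated, the Python sketch below computes the F ratios of a 2x2 between-subjects ANOVA from raw scores. It assumes equal numbers of subjects per cell, which 21 subjects in four conditions cannot provide; unbalanced designs call for regression-based (for example, Type III) sums of squares instead. In any 2x2 between-subjects design, each effect has 1 degree of freedom and the error term has N - 4. The function returns F ratios and degrees of freedom only; p-values would come from the F distribution, for instance via `scipy.stats.f.sf`.

```python
from statistics import mean


def anova_2x2(cells):
    """Two-way between-subjects ANOVA for a balanced 2x2 design.

    cells maps (emotional, intellectual) -> list of scores, with the same
    number of subjects in every cell. Returns, for each effect, a tuple
    (F, df_effect, df_error).
    """
    levels = (False, True)
    n = len(cells[(False, False)])
    if any(len(cells[(a, b)]) != n for a in levels for b in levels):
        raise ValueError("this sketch assumes equal cell sizes")

    cell_mean = {key: mean(scores) for key, scores in cells.items()}
    grand = mean(list(cell_mean.values()))
    a_mean = {a: mean([cell_mean[(a, b)] for b in levels]) for a in levels}
    b_mean = {b: mean([cell_mean[(a, b)] for a in levels]) for b in levels}

    # Sums of squares for the two main effects and the interaction.
    ss_a = 2 * n * sum((a_mean[a] - grand) ** 2 for a in levels)
    ss_b = 2 * n * sum((b_mean[b] - grand) ** 2 for b in levels)
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in levels for b in levels)
    # Within-cell (error) variation.
    ss_error = sum((x - cell_mean[key]) ** 2
                   for key, scores in cells.items() for x in scores)

    df_error = 4 * n - 4  # N - 4 for a 2x2 between-subjects design
    ms_error = ss_error / df_error
    return {
        "emotional": (ss_a / ms_error, 1, df_error),
        "intellectual": (ss_b / ms_error, 1, df_error),
        "interaction": (ss_ab / ms_error, 1, df_error),
    }
```

Fed hypothetical scores in which emotional monitoring lowers every cell by two points and intellectual monitoring by one point, the function returns a large F for the emotional factor, a smaller one for the intellectual factor, and an interaction F of zero, because the two effects simply add up.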
 

To answer the research questions, ANCOVAs were performed to identify the effects of the two types of monitoring with self-confidence and confidence about the GRE as covariates. Finally, post hoc t-tests were performed to interpret the findings of the ANOVAs.
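The text does not specify which form of post hoc t-test was used. Assuming the classic Student's independent-samples test with pooled variance, the statistic can be computed as in the Python sketch below; Welch's test, which does not assume equal variances, would be a reasonable alternative for small, unequal groups like these.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Student's independent-samples t statistic with pooled variance.

    Each group needs at least two scores. Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    t = (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df
```

As with the ANOVA, the p-value would then be read from the t distribution with the returned degrees of freedom.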
 

5. RESULTS  

Hypothesis 1, predicting that people perform less well on a computer-based task under emotional or intellectual surveillance than under no surveillance, and least well when both kinds of surveillance are present, was confirmed. The full factorial design revealed main effects both for emotional surveillance, F(1, 41) = 6.0, p < .02, such that ES subjects had lower GRE scores than non-ES subjects (M = 9.45 vs. M = 11.10), and for intellectual surveillance, F(1, 41) = 4.6, p < .04, such that IS subjects had lower GRE scores than non-IS subjects (M = 9.54 vs. M = 11.00).

Hypothesis 2, stating that people under emotional or intellectual surveillance would find the software less easy to use than people under no surveillance, and least easy when both kinds of surveillance are present, was not confirmed.

The 2x2 factorial design indicated a main effect for intellectual surveillance, F(1, 41) = 4.3, p < .05, but in the opposite direction: IS subjects rated the software as easier to use than non-IS subjects (M = 7.36 vs. M = 6.40).
 

The 2x2 factorial design used to test Hypothesis 3, stating that people under emotional or intellectual surveillance would be less motivated than people under no surveillance, and least motivated when both kinds of surveillance are present, yielded no main effects. However, an interaction effect between the two types of surveillance was found, F(1, 41) = 5.4, p < .02. Therefore, Hypothesis 3 was partially confirmed.
 

Hypothesis 4, regarding subjects' satisfaction with the interaction, was not confirmed. The 2x2 factorial design yielded no significant effects of the two types of monitoring.

Hypothesis 5 was not confirmed. Subjects' ratings of the software's performance did not vary significantly as a function of monitoring condition.
 

Regarding likeability, the ANOVA revealed a main effect for intellectual surveillance, F(1, 41) = 7.6, p < .01, but in the opposite direction: IS subjects rated the software as more likeable than non-IS subjects (M = 4.89 vs. M = 3.78). Therefore, Hypothesis 6 was not confirmed. Contrary to our predictions, people who were exposed to intellectual surveillance found the software they used not only easier to use, but also more likeable. Thus, users may indeed feel more comfortable when they believe software to be "intelligent".

Likewise, Hypothesis 7, regarding the software's perceived helpfulness, was not confirmed; no effects were found in the factorial design. Hypothesis 8, regarding the software's reliability, was also not confirmed. Subjects' perceptions of the software's reliability did not vary significantly as a function of emotional or intellectual surveillance, although there was a nonsignificant trend for emotional surveillance in the predicted direction.

Regarding the research questions about gender, self-confidence, and confidence with the GRE, no significant effects were found, although nonsignificant trends in the predicted directions could be noted. Further analyses are required to yield conclusive results.
 

6. DISCUSSION  

The results of this study are inconclusive. Support was provided for Hypothesis 1: people perform a computer-based task less well when they are monitored emotionally or intellectually, and performance is even worse when both surveillance modes are employed simultaneously. This finding corresponds with the predictions and is consistent with the explanation that people experience heightened arousal when they feel they are being watched. It also adds credibility to the theory that people perceive computers in ways that are very similar to the ways they perceive other people.
 

The remaining findings are highly inconclusive. Results for Hypothesis 3 suggest an interaction effect between ES and IS on user motivation. Motivation was highest for subjects who experienced no monitoring at all. However, when intellectual monitoring was present, motivation was actually higher for subjects who were also exposed to emotional monitoring, and lower for subjects who experienced intellectual monitoring alone (see graph).
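An interaction of this kind means that the effect of one type of monitoring depends on the level of the other. The Python sketch below makes this explicit by computing the simple effect of emotional monitoring with and without intellectual monitoring, and their difference (the interaction contrast). The function name is illustrative, and the cell means in the usage note are hypothetical values chosen only to mirror the pattern described, not the study's data.

```python
def simple_effects(cell_means):
    """Simple effects of emotional monitoring (ES) at each level of
    intellectual monitoring (IS), plus the 2x2 interaction contrast.

    cell_means maps (emotional, intellectual) -> mean score.
    """
    es_without_is = cell_means[(True, False)] - cell_means[(False, False)]
    es_with_is = cell_means[(True, True)] - cell_means[(False, True)]
    return {
        "ES effect without IS": es_without_is,
        "ES effect with IS": es_with_is,
        # Zero if the ES effect is the same whether or not IS is present.
        "interaction contrast": es_with_is - es_without_is,
    }
```

With hypothetical motivation means of 6.0 (NS), 5.0 (ES), 4.0 (IS), and 5.5 (IES), emotional monitoring lowers motivation by 1.0 without intellectual monitoring but raises it by 1.5 with it, giving an interaction contrast of 2.5.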
 

a. Implications for Software Design  

The implications of this study for software design are mixed. On the one hand, telling users that they will be monitored can impair their performance, which is clearly undesirable; on the other hand, it can cause users to find the same software more likeable and easier to use, which is clearly desirable. Caution about proudly advertising new software's adaptive nature therefore seems warranted, and this study does not settle the debate between explicit and implicit user modelling.
 

One possible solution to this dilemma might be software that initially (before the actual task starts) employs explicit user modelling (e.g., asking users about their preferences and needs), thereby convincing the user of the software's "intelligence". Once the actual task begins, the software could still monitor the user implicitly, but without the user's knowledge, so that adverse surveillance effects are mitigated.
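The two-phase design proposed above can be sketched as a small Python class: explicit probes are allowed only before the task, and during the task behavior is recorded silently while the user sees the neutral dialog text used in the no-surveillance condition. The class and method names are hypothetical; this illustrates the structure of the proposal and is not a tested design.

```python
class TwoPhaseUserModel:
    """Hypothetical sketch: explicit modelling before the task,
    unannounced implicit modelling during it."""

    NEUTRAL_DIALOG = "Loading next question"  # wording from the NS condition

    def __init__(self):
        self.preferences = {}   # gathered by explicit, visible probes
        self.observations = []  # gathered by implicit monitoring
        self.task_started = False

    def probe(self, question, answer):
        """Phase 1: ask the user directly about preferences and needs."""
        if self.task_started:
            raise RuntimeError("explicit probes belong before the task")
        self.preferences[question] = answer

    def start_task(self):
        self.task_started = True

    def record(self, event):
        """Phase 2: log behavior and return the text shown to the user,
        which never mentions monitoring."""
        if not self.task_started:
            raise RuntimeError("the task has not started yet")
        self.observations.append(event)
        return self.NEUTRAL_DIALOG
```

Keeping the two phases in one object lets the preferences gathered explicitly seed the model that is later refined by the silent observations.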
 

b. Suggestions for Future Research

This study, the first of its kind, raised more questions than it could answer, and much research will be needed to address even the most urgent ones. The use of standardized self-confidence scales might shed more light on the role of user personality in social facilitation. Actually monitoring the user's stress level (in this study, the GSR sensors only simulated monitoring) might give good indications of the exact times when the user experiences problems.
 

This new area of research on adaptive software is an important one, as its findings may shape parts of the professional and private lives of millions of users around the world. It is to be hoped that comprehensive research will ultimately lead to software applications that are almost as likeable as the much-famed puppy.

References

Aiello, J., et al. (1993). Computer monitoring of work performance: Extending the social facilitation framework to electronic presence. Journal of Applied Social Psychology, 23(7), 537-548.
Andersen, P. A. (1995). Arousal and anxiety. In: Beside Language: Nonverbal communication in interpersonal interaction. s.l.
Graydon, J., et al. (1995). The effects of personality on social facilitation whilst performing a sports-re
Laurel, B. (1993). The nature of the beast. In: B. Laurel, Computers as Theater, pp. 1-33.
Laurel, B. (1995, May). Interface agents. Presentation at the Vanguard Group Conference, Marina Del Rey, CA.
Maes, P. (1994). Agents that reduce work and information overload. Communications of the ACM, 37(7), 31-40.
Martin, S., et al. (1985). Social facilitation effects resulting from locus of control using human and computer experimenters. Computers in Human Behavior, 1(2), 123-130.
Minsky, M. (1994). A conversation with Marvin Minsky about agents. Communications of the ACM, 37(7), 23-29.
Mitchell, T., et al. (1994). Experience with a learning personal assistant. Communications of the ACM, 37(7), 23-29.
Nass, C., Steuer, J., & Tauber, E. (1994, April). Computers are social actors. Paper presented at the CHI '94 conference of the ACM/SIGCHI, Boston, MA.
Nickerson, R. S., & Zodhiates, P. P. (1988). Technology in education: Looking towards 2020. Hillsdale, NJ: Lawrence Erlbaum Associates.
Picard, R. (1997). Does HAL cry digital tears: Emotions and computers. In D. G. Stork (Ed.), HAL's Legacy: 2001's Computer as Dream and Reality. Cambridge, MA: MIT Press.
Quarterman, J. S. (1989). The Matrix: Computer networks and conferencing systems worldwide. Cambridge, MA: Digital Equipment Corporation.
Reeves, B. (1997, May). The role of monitoring emotions in affective computing. Class lecture, Stanford University, CA.
Reeves, B., & Nass, C. (1996). The Media Equation: How people treat computers, television, and new media like real people and places. New York, NY: Cambridge University Press.
Selker, T. (1994). COACH: A teaching agent that learns. Communications of the ACM, 37(7), 2-12.
Sproull, L., Kiesler, S., et al. (1996). When the interface is a face. Human-Computer Interaction, 11, 97-124.

D.I.E. 2004