Interviews and Data Recording

Five Key Issues

Setting Goals. Goals will influence the nature of data gathering sessions, the data gathering techniques to be used, and the analysis to be performed.
Identifying Participants. Those who fit the profile of types of people from whom data can be gathered are called the study population. Types of sampling are as follows:

Probability Sampling: simple random sampling or stratified sampling
Nonprobability Sampling: convenience sampling or volunteer panels
- Convenience Sampling: sample includes those who were available rather than those specifically selected
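As a minimal sketch of the two probability sampling types named above, the following Python snippet draws a simple random sample and a proportionally stratified sample. The population records, the `role` stratum, the sample sizes, and the helper names are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical study population: 20 students and 10 staff (illustrative data only).
population = (
    [{"id": i, "role": "student"} for i in range(20)]
    + [{"id": i, "role": "staff"} for i in range(20, 30)]
)

def simple_random_sample(people, k, rng):
    """Every member of the population has an equal chance of being selected."""
    return rng.sample(people, k)

def stratified_sample(people, key, fraction, rng):
    """Sample the same fraction from each stratum so groups stay proportional."""
    strata = defaultdict(list)
    for person in people:
        strata[person[key]].append(person)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(42)  # fixed seed so the draw is repeatable
srs = simple_random_sample(population, 6, rng)
strat = stratified_sample(population, "role", 0.2, rng)

print(len(srs))                            # 6
print(sorted(p["role"] for p in strat))    # 2 staff, 4 students
```

With simple random sampling, the student/staff split of the sample varies from draw to draw. With stratified sampling, it always mirrors the population.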

Relationship with Participants: obtain informed consent and maintain a clear, professional relationship between participant and researcher (however, informed consent is generally not required when gathering requirements data for a commercial company, where a contract usually exists between the data collector and the data provider)

Triangulation: the investigation of a phenomenon from at least two different perspectives. This is mostly focused on verification and reliability of data rather than making up for the limitations of another type of methodology

Triangulation of data: data is drawn from different sources, at different times, in different places, or from different people
Investigator triangulation: different researchers (observers, interviewers, and so on) have been involved in collecting and interpreting the data
Triangulation of theories: use of different theoretical frameworks through which to view data
Methodological triangulation: employ different data gathering techniques

Interviews

A conversation with a purpose

Good for exploring issues, learning more about tasks, and getting inside the user’s head.

Open-ended/unstructured → exploratory and similar to conversation. Can be time-consuming but can also produce rich insights
Semi-structured → follows a basic script with both closed and open questions, but the interviewer probes the interviewee until no new relevant information emerges
Structured → predetermined questions like a questionnaire, study is standardized (same questions with each participant)
Group interviews → 3-10 people selected to provide a representative sample of the target population. Useful for investigating shared issues rather than individual experiences

Planning

When developing Interview Questions, keep in mind open questions are best suited where the goal of the session is exploratory; closed questions are best suited where the possible answers are known in advance. Break long or compound questions into separate questions

A lot of decisions to make:

Choosing a framework
Level of participation to adopt
How to make a record of the data
How to gain acceptance into the group being studied
How to ensure that the study uses different perspectives

Ethnography: the description of the customs of people and cultures. A distinguishing feature of ethnographic studies compared with other data gathering is that a situation is observed without imposing any a priori structure or framework upon it, and everything is viewed as “strange”.

Technique	Good for	Kind of Data	Advantages	Disadvantages
Interviews	Exploring issues	Mostly qualitative (some quantitative)	Interviewer can guide the interviewee; encourages contact between researchers and users	Artificial environment may be intimidating; removes users from their usual environment
Focus Groups	Collecting multiple viewpoints	Mostly qualitative (some quantitative)	Highlight areas of agreement/conflict, encourages contact between researchers and users	Possibility of dominant characters
Questionnaires	Answering specific questions	Quantitative and Qualitative	Can reach many people with low resource requirements	Design is key, response rates may be low
Direct observation in the field	Understanding context of user activity	Mostly qualitative	Observational insights	Very time-consuming, huge amounts of data
Direct observation in a controlled environment	Capturing the detail of what individuals do	Quantitative and qualitative	User can focus on task without interruption	Data may be of limited use due to artificial environment
Indirect observation	Observing users in natural environment without distraction	Quantitative (logging) and qualitative (diary)	Can extend over long periods thanks to automatic recording	Large amounts of data imply a need for tools to support analysis; participants may exaggerate memories

Running the interview

Before starting, make sure that the goals of the interview have been explained to the interviewee and that they are willing to proceed. Listen more than you talk, respond with sympathy but without bias, and appear to enjoy the interview.

Intro
- interviewer introduces themselves
- explain why you’re doing the interview
- reassure interviewee re: ethical issues
- ask interviewee if they mind being recorded
Warm-up session
- easy, nonthreatening questions
Main session
- questions presented in logical sequence
- probing questions at the end
- order may vary in semi-structured interview
Cooling-off period
- easy questions to defuse any tension
Closing session
- interviewer thanks interviewee
- switch off recorder or put notebook away

Observation

Users may be observed directly by the investigator as they perform their activities or indirectly through records of the activity that are studied afterward.

Observation can also produce a lot of data to sift through and can be more complicated to do well than first appreciated. As such, a clearly stated goal is important to give an observation session focus.

Example frameworks:

The person: Who is using the technology at any particular time?
The place: Where are they using it?
The thing: What are they doing with it?

3 common approaches

Simple observation: user is given a task, the evaluator just watches. This gives no insight into users’ decision process
Think aloud: subjects are asked to say what they are thinking/doing. However, it’s hard to talk while concentrating, and thinking aloud may alter the way people naturally perform the task.
Co-discovery learning: two people work together on a task and normal conversation is monitored.

Degree of Participation

Passive Observer: observer who adopts an approach at the outsider end; does not take part in the study environment at all
Participant Observer: adopts an approach at the insider end; becomes a member of the group being studied

Coding Sheet

A data recording instrument that presents a structured list of itemized coding options.

This standardizes observation practice, which makes the recorded data more objective.

Consider:

evaluation goals (break it all down!)
stage of design
observation method/types of data
- what would potentially be an interesting finding from this particular style?
- e.g. for think-aloud, it might be good to record action vs spoken comments
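A coding sheet can be sketched in code as a fixed set of allowed codes plus a log of timestamped observations. The example below is only an illustration for a think-aloud session: the code names, the `record` helper, and the sample observations are assumptions, not part of any standard instrument.

```python
from collections import Counter

# Hypothetical coding options for a think-aloud session (illustrative only).
CODES = {"ACTION", "COMMENT", "ERROR", "HELP_REQUEST"}

def record(sheet, timestamp_s, code, note=""):
    """Append one observation; reject codes that are not on the sheet."""
    if code not in CODES:
        raise ValueError(f"unknown code: {code}")
    sheet.append({"t": timestamp_s, "code": code, "note": note})

sheet = []
record(sheet, 12, "ACTION", "opens settings menu")
record(sheet, 15, "COMMENT", "says the icon is confusing")
record(sheet, 31, "ERROR", "taps the wrong tab")
record(sheet, 40, "ACTION", "returns to home screen")

# Tally how often each code was used during the session.
tally = Counter(entry["code"] for entry in sheet)
print(tally["ACTION"], tally["COMMENT"], tally["ERROR"])  # 2 1 1
```

Rejecting codes that are not on the sheet is what enforces the standardization: every observer has to describe what they see with the same vocabulary.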

Questionnaire

Survey vs Questionnaire: the questionnaire is a part of the survey. The questionnaire is just the concrete things you’re asking.

Pros

cheap
does not require presence of evaluator
many results can be quantified

Cons

preparation is “expensive” → need to design questions well
can have a low response rate or low-quality responses
difficult to do in-depth “probing”

A questionnaire is good when motivation is high enough without anyone else present. If persuasion is needed, a structured interview is probably better

Designing a Questionnaire

Keep in mind

what info is sought?
how would you analyze results?
what audience do you want to reach?
what will you do with your analysis?

Don’t use vague questions, and pilot the questionnaire before running the actual study.

Questions

should be specific to the interface under study (not transferable to other interfaces) and should not be open to different interpretations depending on judgment (i.e. domain-specific and clearly worded)
avoid double-barreled questions and leading questions
to de-bias: neutral language, can have random order of questions for different participants
for validity
- use previously validated questionnaires
- triangulation (ask multiple questions about the same matter)
- piloting
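One of the de-biasing ideas above, giving different participants a different question order, can be sketched as follows. The question texts, the `question_order` helper, and the seeding scheme are hypothetical. Seeding by participant ID is just one way to make each participant's order reproducible.

```python
import random

# Hypothetical questionnaire items (illustrative only).
QUESTIONS = [
    "The search feature was easy to find.",
    "The results page was easy to read.",
    "I could complete my task without help.",
    "I would use this system again.",
]

def question_order(participant_id, questions, seed=0):
    """Return a shuffled copy of the questions for one participant.

    The same participant ID always yields the same order, so the order
    shown can be reconstructed later during analysis.
    """
    rng = random.Random(f"{seed}-{participant_id}")
    order = list(questions)  # copy so the master list is left untouched
    rng.shuffle(order)
    return order

for pid in ["P1", "P2", "P3"]:
    print(pid, question_order(pid, QUESTIONS))
```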

Types of questions

Open-ended (hard to analyze rigorously)
Closed (easily analyzed but can be hard to interpret if not well-designed)
- checkboxes and ranges
  - range of answers to demographic questions is predictable: offer a predefined list
  - interval doesn’t have to be equal in all cases, depends on what you want to know
  - mention how many boxes to check, be consistent with ascending/descending order
- Likert and semantic differential scale
  - used for measuring opinions, attitudes, and beliefs
  - widely used for evaluating user satisfaction
  - Likert: a five-, seven-, or nine-point agreement scale used to measure respondents’ agreement with various statements
  - semantic differential scale: relies on pairs of opposing adjectives to explore a range of bipolar attitudes about a particular item
- ranked (closed)
  - respondent places ordering on items in a list
  - useful to indicate preferences
  - forced choice
- multi-choice (closed)
  - offered choice of explicit responses
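As a rough sketch of how Likert responses might be summarized, the snippet below reports the median and the full distribution for one statement on a five-point scale. The response values and the `summarize_likert` helper are made up for illustration. Reporting the median rather than the mean is a common choice for ordinal data, not something the notes prescribe.

```python
from collections import Counter
from statistics import median

# Five-point agreement scale: 1 = strongly disagree ... 5 = strongly agree.
# Hypothetical responses to a single statement (illustrative only).
responses = [4, 5, 3, 4, 2, 4, 5]

def summarize_likert(responses, points=5):
    """Return the median and the count of responses at each scale point."""
    if any(r < 1 or r > points for r in responses):
        raise ValueError("response outside the scale")
    counts = Counter(responses)
    return {
        "median": median(responses),
        "distribution": {p: counts.get(p, 0) for p in range(1, points + 1)},
    }

summary = summarize_likert(responses)
print(summary["median"])        # 4
print(summary["distribution"])  # {1: 0, 2: 1, 3: 1, 4: 3, 5: 2}
```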

Checklist

think about ordering of questions → the impact of a question can be influenced by question order
consider if different versions are needed for different populations
provide clear instructions on how to complete questionnaire
- e.g. whether answers can be saved and completed later
think about length → avoid questions that don’t address study goals
consider allowing respondents to opt out at different stages especially if long → better to have some than none
think about layout and pacing

Data and Analysis

Subjective: what you were told happened
Objective: what you captured using your senses
Quantitative: data in the form of numbers or data that can be easily translated into numbers. Focuses on ascertaining the magnitude, amount, or size of something
Qualitative: data in the form of words and images. Focuses on nature of something (themes, patterns, and stories)

Note that quantitative data is not always objective! Subjectivity can come from participants in how they express opinions or from investigators during the data capturing/interpreting/analysis process.

Similarly, it is unfair to try to quantify all qualitative data. Doing so needs justification. Also be wary of translating small population sizes into percentages.
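To see why percentages can mislead with small samples, note that each participant contributes 100/n percentage points to the result. The sketch below reports raw counts and adds a percentage only above a cutoff. The cutoff of 20 and both helper names are arbitrary assumptions for illustration, not an established rule.

```python
def swing_per_participant(n):
    """Percentage points by which one participant's answer shifts the result."""
    return 100 / n

def report(count, n, min_n_for_percent=20):
    """Describe a finding as a raw count, adding a percentage only for larger samples."""
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("count must be between 0 and n, and n must be positive")
    if n < min_n_for_percent:
        return f"{count} of {n} participants"
    return f"{count} of {n} participants ({100 * count / n:.0f}%)"

print(swing_per_participant(5))   # 20.0
print(report(3, 5))               # 3 of 5 participants
print(report(30, 40))             # 30 of 40 participants (75%)
```

With five participants, a single person changing their answer moves the figure by 20 points, so "60%" suggests far more precision than "3 of 5".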

What to focus on

What are the most important needs/tasks to support?
What are the repeated patterns?
Key issues/areas that could be improved
What surprised you?
What is essential/nonessential in implementation

Steps

Initial reactions or observations (identify patterns, simple numerical analysis like averages, ratios, percentages)
Data cleansing (checking for erroneous entries and anomalies)
Analysis
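The cleansing step and the kind of simple numerical summary mentioned above can be sketched together. The task-completion times, the plausibility bounds, and the `cleanse` helper are hypothetical. In a real study the bounds would come from knowledge of the task.

```python
from statistics import mean

# Hypothetical task-completion times in seconds (illustrative only).
# None marks a missing entry; -1.0 and 3600.0 stand in for logging errors.
raw_times = [42.0, 38.5, None, -1.0, 51.2, 40.3, 3600.0]

def cleanse(values, low=0.0, high=600.0):
    """Split entries into plausible values and rejected ones for manual review."""
    kept, rejected = [], []
    for v in values:
        if v is not None and low < v <= high:
            kept.append(v)
        else:
            rejected.append(v)
    return kept, rejected

kept, rejected = cleanse(raw_times)
print(kept)                   # [42.0, 38.5, 51.2, 40.3]
print(rejected)               # [None, -1.0, 3600.0]
print(f"{mean(kept):.1f}")    # 43.0
```

Keeping the rejected entries, rather than silently dropping them, makes it possible to check whether an anomaly is an error or a genuine outlier worth investigating.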

Qualitative Analysis

Thematic Analysis

Themes are a small number of high-level patterns that answer your evaluation questions.

Inductive analysis goes from codes (descriptive labels) to categories (groupings imposed on codes) to themes (interpretive patterns). Deductive analysis is just the inverse (starting at themes and arriving at codes).

Do an initial pass to check for internal consistency: make sure themes occur across several or all participants. Then, step back to see if an overarching narrative emerges from the themes. One can then remove themes or look into why there are conflicts.

One way of doing this is using affinity diagrams:

record each idea/observation/problem/etc on individual card or post-it notes
look for notes that seem to be related
sort notes into groups until all used
- sort and resort as necessary

Categorizing Data

scheme of data: code the data according to categories
- if analysis frame is chosen beforehand: deductive analysis
- if study is exploratory and it is important to let themes emerge from data: inductive analysis
the coded data can then be analyzed with appropriate methods, such as computing the average number of problems or identifying recurring patterns
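A minimal sketch of the deductive case: the analysis frame (which codes belong to which categories) is fixed beforehand, and coded observations are then counted per category and per participant. The frame, the coded data, and the `summarize_categories` helper are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical analysis frame chosen beforehand: code -> category.
FRAME = {
    "cannot find button": "navigation",
    "lost in menu": "navigation",
    "text too small": "readability",
}

# Hypothetical coded observations: (participant, code).
coded = [
    ("P1", "cannot find button"),
    ("P1", "text too small"),
    ("P2", "lost in menu"),
    ("P3", "cannot find button"),
    ("P3", "lost in menu"),
]

def summarize_categories(coded, frame):
    """Count occurrences and distinct participants for each category."""
    occurrences = defaultdict(int)
    participants = defaultdict(set)
    for pid, code in coded:
        category = frame[code]
        occurrences[category] += 1
        participants[category].add(pid)
    return {
        c: {"occurrences": occurrences[c], "participants": len(participants[c])}
        for c in occurrences
    }

summary = summarize_categories(coded, FRAME)
n_participants = len({pid for pid, _ in coded})
print(summary["navigation"])    # {'occurrences': 4, 'participants': 3}
print(summary["readability"])   # {'occurrences': 1, 'participants': 1}
print(f"{len(coded) / n_participants:.2f} problems per participant")  # 1.67
```

Counting distinct participants alongside raw occurrences helps separate a pattern that recurs across people from one participant hitting the same problem repeatedly.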

Critical Incident Analysis

Helps identify significant subsets of data for more detailed analysis.

This is not about summarizing all incidents; it is more like finding gold nuggets. Incidents need not be bad; they can be either desirable or undesirable.

Potential ways conclusions can be flawed:

Construct validity: are we measuring the right thing? Is this clearly connected to our research question? Did we misunderstand the concepts we are working with?
Internal validity: What are alternative explanations for the results? Other bias, confounding factors, etc.
External validity: To what extent are our results and conclusions of our experiment generalizable to our original research question? (how representative are our tasks and users?)
Empirical Reliability / Reproducibility: Can the study be reproduced?

Risks and Consent:

In what ways could your participants be harmed by the study or its results?
Harm could be physical (less likely in CS) or emotional (stress, reputation, etc.)
Evaluate the likelihood of each potential risk (including unlikely cases)
Are there ways to mitigate these risks? Potentially: adjust your study design
What would you do if a participant were harmed? e.g. correction, compensation?
