
4 “Content Analysis” (Chen et al., 2024)

Victor Tan Chen; Gabriela León-Pérez; Julie Honnold; and Volkan Aytar

Learning Objectives

  1. Define content analysis.
  2. Describe qualitative and quantitative strategies employed in content analysis.
  3. Understand how to present the results from content analysis.

Content analysis is a materials-based research method that focuses on texts and their meanings. Sociologists use a more expansive definition of “text” than the word typically has. In a research context, the content being analyzed is essentially any recorded communication. This would obviously include actual written copy, such as news articles or email messages, but we can consider content that we might see or hear—such as a speech, dance performance, television show, advertisement, or movie—to be “texts” as well. Table 15.1 provides some examples of the kinds of data that sociologists have studied using content analysis techniques.

Table 15.1. Examples of Content Analysis

| Data | Research Question | Authors and Year of Publication |
| --- | --- | --- |
| Spam emails | Is the frequency of apologies from advertisers in unsolicited emails different in the United States and Korea? | Park, Lee, and Song 2005 |
| James Bond films | How are women portrayed in James Bond films, and what broader lessons can be drawn from these portrayals? | Neuendorf et al. 2010 |
| Console video games | How is male and female sexuality portrayed in best-selling console video games? | Downs and Smith 2010 |
| News articles | What kinds of framing contests emerged in the news about Colin Kaepernick’s NFL protests against racial oppression and police brutality? | Boykoff and Carrington 2020 |
| Pro–eating disorder websites | What characteristics do websites that endorse and support eating disorders share, and what messages do they communicate to users? | Borzekowski et al. 2010 |

One thing you might notice is that the data sources described in this table are primary sources. As you may remember from Chapter 5: Research Design, primary sources are original works representing first-hand experiences, often written by individuals who were present at a noteworthy event or had relevant experiences. Primary sources that could be studied through content analysis include personal journals, emails, letters, government documents, speeches, television commercials, social media posts, and news articles published at the time of an event of interest.

Although content analysis usually focuses on primary sources, there are also examples of studies that use secondary sources (which draw upon primary sources for information, such as academic publications, biographies, and news articles that review other media or research) and tertiary sources (which summarize the results of secondary sources). With these sources, researchers might be interested in how the person or persons who generated the secondary or tertiary source reached their conclusions about the topic in question, or how their decisions about presenting information might shape people’s understandings.

For example, Myra Marx Ferree and Elaine Hall (1990) conducted a content analysis of introductory sociology textbooks to learn how students were being taught sociology. As part of their study, the researchers examined the images being presented to students and what messages they conveyed. They concluded that “women were not represented in numbers proportionate to their distribution in the population” and people of color “would have been numerically underrepresented” in images had the textbooks not included chapters specifically on race (Ferree and Hall 1990:529, 528).

[…]

Qualitative versus Quantitative Approaches to Content Analysis

Collage of photographs of six leading ladies (i.e., “Bond Girls”) from James Bond films.
Kimberly Neuendorf and her colleagues (2010) conducted a quantitative content analysis of 20 James Bond films to assess the portrayals of 195 female characters. MachoCarioca, via Wikimedia Commons

Content analysis can be quantitative or qualitative, and often researchers will use both strategies to strengthen their investigations.

Quantitative content analysis focuses on variables whose characteristics can be counted. For example, Kimberly Neuendorf and her colleagues (2010) reviewed 20 films featuring the fictional British spy James Bond. They examined the portrayals of women across the films—195 female characters in total—and tracked particular details about each character. For instance, a variable they called “role prominence” assessed whether each female character’s part was minor, medium, or major. Other variables counted the number of times weapons were used by and against a female character.

The approach in such a quantitative content analysis uses the same techniques we covered in Chapter 14: Quantitative Data Analysis. Neuendorf and her coauthors, for instance, calculated frequencies for their three categories of “role prominence,” finding that 52 percent of female roles were minor, 30 percent medium, and 17 percent major. They generated both univariate and multivariate statistics—for instance, determining that 25 percent of female characters had a weapon used against them, and their own use of weapons was related to their levels of sexual activity within the films.
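
Tallying coded values into a frequency distribution like this is straightforward to script. Here is a minimal sketch in Python, using invented role-prominence codes rather than the study’s actual data:

```python
from collections import Counter

# Hypothetical coded values for a "role prominence" variable,
# one entry per female character (invented, not the study's data).
role_prominence = (["minor"] * 10) + (["medium"] * 6) + (["major"] * 4)

counts = Counter(role_prominence)
total = len(role_prominence)

# Convert raw counts to percentages, as in a univariate frequency table.
percentages = {code: round(100 * n / total) for code, n in counts.items()}
print(percentages)  # {'minor': 50, 'medium': 30, 'major': 20}
```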

In qualitative content analysis, the aim is to identify themes in the text and examine the underlying meanings of those themes. Tony Chambers and Ching-Hsiao Chiang (2012) used such an approach in their content analysis of open-ended comments in the National Survey of Student Engagement. First, the researchers identified passages in the comments where students appeared to be raising issues standing in the way of their academic success. Through the coding process (see Chapter 11: Qualitative Data Analysis), they settled on certain key themes across these passages, which could be broadly grouped into categories of academic needs, the campus environment, financial issues, and student services. Their approach was inductive, in that the researchers did not determine these themes beforehand, but rather allowed them to emerge during the coding process. Ultimately, this analysis allowed Chambers and Chiang to highlight certain disconnects between the goals of colleges and universities and students’ actual experiences.

Note that both quantitative and qualitative approaches can easily be used in the same content analysis. For example, Vaughn Crichlow and Christopher Fulcher (2017) analyzed coverage in the New York Times and USA Today about the deaths of three African American men while they were in police custody: Eric Garner, Michael Brown, and Freddie Gray. The content analysis focused on the quotes of experts (public officials and pundits) that appeared in these news articles. First, the researchers examined the quotes to generate themes, and then they tallied the numbers of quotes that fit the noteworthy themes they identified (as shown in this example, content analysis can move rather seamlessly from inductive qualitative analysis to deductive quantitative analysis). As shown in Figure 15.1, the analysis found that the experts quoted across these articles rarely discussed strategies to reduce police shootings or to improve police-community relations in communities of color.

Table indicating that 75 percent of experts’ quotes did not discuss reducing police shootings or improving police-community relations after Eric Garner’s death in Staten Island, 77 percent did not do so after Michael Brown’s death in Ferguson, and 75 percent did not do so after Freddie Gray’s death in Baltimore.
Figure 15.1. Expert Quotes about Police Shootings in the New York Times and USA Today: A Content Analysis by Crichlow and Fulcher (2017). Vaughn Crichlow and Christopher Fulcher used both qualitative and quantitative approaches in their content analysis of news articles about the deaths of three African American men while in police custody: Eric Garner in Staten Island, New York; Michael Brown in Ferguson, Missouri; and Freddie Gray in Baltimore. The researchers first qualitatively analyzed the quotes of experts across these articles to inductively identify a set of common themes, and then they went back to the data to count the number of instances when key themes appeared. The analysis depicted in the table shows that the vast majority of expert quotes following each of these police-involved deaths did not mention efforts to reduce police shootings or improve police-community relations (Crichlow and Fulcher 2017:176).

A critical requirement of any content analysis is a comprehensive and detailed codebook. For the study of James Bond films, for instance, the eight coders assigned values to the study’s variables drawing on a codebook the authors had created well before any analysis began. Especially when many people are involved with coding, it’s important to establish a shared understanding of what exactly the codes mean to avoid divergent interpretations of the same data and maintain inter-coder reliability (a topic we will return to).

Defining the Scope of a Content Analysis

Photograph of a mural in Austin, Texas, that depicts Colin Kaepernick kneeling in front of images of George Floyd, Ahmaud Arbery, Eric Garner, Tamir Rice, Trayvon Martin, Breonna Taylor, and Mike Ramos.
Jules Boykoff and Ben Carrington (2020) used content analysis to examine media coverage of NFL player Colin Kaepernick’s 2016 kneeling protests against racialized oppression and police brutality. Lars Plougmann, via Flickr

For many studies that rely on content analysis, setting boundaries on what content to study is challenging because there is just so much content available to researchers. Consider a study by Jules Boykoff and Ben Carrington (2020) that analyzed media coverage of American football player Colin Kaepernick’s protests in 2016. By kneeling during the playing of the national anthem at numerous NFL games, Kaepernick sought to make a public statement about ongoing police brutality against African Americans. For their study of reactions to these protests, researchers examined “the media framing contests that emerged between Colin Kaepernick and his supporters on one side and his detractors on the other, including President Donald Trump” (Boykoff and Carrington 2020:832).

[…]

To make our content analysis feasible, we need to precisely define the scope conditions of our study. In Chapter 3: The Role of Theory in Research, we talked about how all theories have scope conditions, which tell us where the theory can and cannot be applied. Among other things, a theory is constrained (delimited) based on the contexts where it has been studied empirically. For instance, a theory developed from data in the United States may not necessarily apply in other societies. When we are designing an empirical study, we need to think about the reverse consideration—what can we feasibly study? How should we set the boundaries of our data collection? These are the scope conditions of our study.

For their study, Boykoff and Carrington decided to study print and online stories from newspapers—thereby reducing the vast category of “media” to the smaller (and dwindling) subcategory of “newspapers.” But which newspapers? In 2018, when Boykoff and Carrington were working on their analysis, there were 1,279 daily newspapers operating across the country. Even though this was fewer than in previous years, examining multiple articles in all of these publications would clearly have been an overwhelming task. Instead, the researchers decided to choose the four national newspapers with the highest circulation numbers—USA Today, the Wall Street Journal, the New York Times, and the Los Angeles Times—plus the seventh-ranked paper, the Washington Post. Their rationale for the first four had to do with the national reach and influence of these papers. They chose to include the lower-circulation Washington Post as well because of its in-depth coverage of issues and its numerous online articles.

For any content analysis, you should be specific in a similar way about your rationale for including or excluding certain sources. Always provide a clear and compelling justification—detailed at length in your paper’s methods section—for these methodological choices, rather than just saying these were the most convenient sources to analyze. Convenience can be an important consideration, but you should be able to speak thoughtfully about the benefits and drawbacks of the boundaries you set.

Even after Boykoff and Carrington settled on five newspaper sources to analyze, they still had another set of methodological decisions to make: what particular time period should they set for the Kaepernick-related articles they would analyze? Clearly there would be no coverage before the start of Kaepernick’s protests on August 14, 2016, so they could easily set a starting point for their analysis. The ideal ending point was not self-evident, however, given that newspapers obviously don’t coordinate with each other about when to stop coverage of a story. For their study, the researchers chose to examine news articles within a two-year period ending on August 14, 2018. By that time, they reasoned, the controversy had diminished, and no other NFL team had signed Kaepernick. This brings up another important point about setting scope conditions for a study: sometimes the decisions can be more or less arbitrary. While a two-year period is reasonable and justifiable, you could imagine how other researchers might make a decent case for a longer or shorter period of time. As a researcher, you will need to trust your own judgment and be able to defend your decisions, even if they are a matter of personal preference.

Now that Boykoff and Carrington had settled on a time period for their study, were they finally done? No. They still had to decide whether they should study all articles published during this period or be more selective. They decided to apply certain inclusion criteria to make it more likely that the articles they analyzed would contain material relevant to their research question of how the media framed Kaepernick’s actions. In their first sweep, they searched the archives of each newspaper for the terms “Kaepernick” and “protest.” Then they narrowed that list to just those articles that were at least five paragraphs long and that mentioned Kaepernick somewhere in the first five paragraphs of text, reasoning that these articles would be more likely to say something substantive about the controversy. By defining the scope of their content analysis in these ways, Boykoff and Carrington ensured that closely reviewing their materials was something they could do with a reasonable amount of time and effort. At the same time, they maximized the likelihood that their analysis would capture the vast majority of newspaper articles that put forward a useful (for their research purposes) perspective on the protests.
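
Once articles are in machine-readable form, inclusion criteria like these can be applied programmatically. The sketch below uses invented article records and a hypothetical `meets_criteria` helper to illustrate the two filters (minimum length, early mention); it is not how Boykoff and Carrington actually processed their corpus:

```python
# Hypothetical article records; titles and text are invented for illustration.
articles = [
    {"title": "A", "paragraphs": ["Kaepernick knelt during the anthem."] * 6},
    {"title": "B", "paragraphs": ["Game recap."] * 3},  # too short to include
    {"title": "C", "paragraphs": ["Season preview."] * 4
                                 + ["Kaepernick protest coverage."] * 4},
]

def meets_criteria(article, min_paragraphs=5, window=5):
    """Keep articles at least `min_paragraphs` long that mention
    Kaepernick within the first `window` paragraphs."""
    paras = article["paragraphs"]
    if len(paras) < min_paragraphs:
        return False
    return any("Kaepernick" in p for p in paras[:window])

sample = [a["title"] for a in articles if meets_criteria(a)]
print(sample)  # ['A', 'C']
```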

[…]

Coding in Quantitative Content Analysis

When researchers code a text for a content analysis, they are looking for two types of content: manifest and latent content. Manifest content has a more or less obvious significance to it. It presents a surface-level meaning that can be readily discerned. Latent content refers to the underlying meaning of the surface content we observe.

Coding manifest content typically involves a highly structured process—think of the precise ways, for example, that the study of James Bond films we mentioned earlier tallied the cases when women were targets of or wielders of weapons (Neuendorf et al. 2010). Qualitative researchers, however, are usually not interested in manifest content in itself, but rather use it as a stepping stone to gain deeper understanding, which they pursue with the coding of latent content.

Say that we are studying advertisements for kitchen utensils that have been published in magazines over the past several decades. One type of manifest content to code across these ads might be the stated purpose of each featured utensil. For instance, the text in the ads might emphasize how the utensil—say, a potato masher—makes preparing mashed potatoes a cinch for Thanksgiving dinners. Or it might focus instead on the general versatility and efficiency of that potato masher in dealing with russet, red, and Yukon Gold potatoes, or even the dreaded yam. We could also make note of how users of the utensils are depicted within the ads, using codes like “entertaining at home” when we see someone in the ad using the utensil to cook for a large gathering (refer to Chapter 11: Qualitative Data Analysis for a review of how to write up codes). In turn, one set of latent codes that could emerge from these manifest codes would be our assessment of the lifestyles that the ad—by playing up certain features of the utensil—is promoting.

For instance, we might see a shift over the years in the stated purpose of the utensils and the depictions of their users across the ads we are analyzing: from an emphasis on utensils designed to facilitate in-home entertaining to those designed to maximize efficiency and minimize time spent in the kitchen. We might theorize that this shift reflects a corresponding shift in how (and how much) people spend time in their homes. (See Video 15.1 for one take on this woefully understudied topic of kitchen utensils.)

Video 15.1. What Kitchen Utensils Say about Society. In the first part of this segment from the PBS documentary People Like Us, satirist Joe Queenan riffs on how kitchen utensils can serve as markers of social class—in essence, an argument that one might explore through the latent content of kitchen utensil advertisements, as we describe.

To record the observations we make during content analysis, we typically rely on a code sheet, sometimes referred to as a tally sheet. Code sheets allow us to apply a systematic approach to our analysis of the data.

For instance, let’s say we want to conduct a content analysis of kitchen utensils—this time, not the advertisements about them, but the utensils themselves (remember that any “text”—even a physical object—can be studied in a content analysis, so long as it conveys meanings we can analyze). We happen to have access to sales records for kitchen utensils over the past 50 years. Based on these records, we generate a list of 50 utensils: the top seller in each year. For each utensil, we use our code sheet (as shown in Table 15.2) to record its name, culinary purpose, and price in current dollar amounts (note that adjusting for inflation is crucial whenever you compare monetary amounts across years).

We might also want to make some qualitative assessments about each utensil and its purpose—say, how easy or hard it is to use. To rate this difficulty of use, we use a five-point scale, with 1 being very easy and 5 being very hard. The specific criteria we use to determine this difficulty of use should be described in our codebook, along with any other instructions for coding the study’s variables. (For space reasons, the sample sheet contains columns only for 10 years’ worth of utensils; if we were to conduct this project—and who wouldn’t want to learn more about the history of kitchen utensils?—we’d need columns for each of the 50 items in our sample.)

Table 15.2. Sample Code Sheet for Study of Kitchen Utensil Popularity over Time
|  | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Utensil name |  |  |  |  |  |  |  |
| Utensil purpose |  |  |  |  |  |  |  |
| Price (in current dollars) |  |  |  |  |  |  |  |
| Ease of use (1 being easy, 5 being hard) |  |  |  |  |  |  |  |
| Other notes |  |  |  |  |  |  |  |

As our example shows, a code sheet can contain both qualitative and quantitative data. For instance, our ease of use row will report our difficulty rating for each utensil, a quantitative assessment. We will be able to analyze the data recorded in this row using statistical procedures of the sort outlined in Chapter 14: Quantitative Data Analysis—say, by calculating the mean value of “ease of use” for each of the five decades we are observing. We will be able to do the same thing with the data collected in the “price” row, which is also a quantitative measure.

The final row of our example code sheet will contain qualitative data: notes about our impressions of the utensils we are examining. For the data in this row, conducting open and focused coding (as described in Chapter 11: Qualitative Data Analysis) is an option. But regardless of whether we are analyzing qualitative or quantitative data, our goal will be the same: identifying patterns across our data.
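
The quantitative rows of a code sheet also lend themselves to simple scripted summaries. As a sketch, assuming hypothetical code-sheet records with invented values, here is how the mean “ease of use” rating per decade might be computed:

```python
from collections import defaultdict

# Hypothetical code-sheet entries: one record per top-selling utensil
# (invented values, for illustration only).
records = [
    {"year": 1970, "utensil": "potato masher", "ease_of_use": 2},
    {"year": 1975, "utensil": "garlic press", "ease_of_use": 3},
    {"year": 1982, "utensil": "pasta maker", "ease_of_use": 5},
    {"year": 1988, "utensil": "whisk", "ease_of_use": 1},
]

# Group the quantitative "ease of use" ratings by decade...
by_decade = defaultdict(list)
for r in records:
    decade = (r["year"] // 10) * 10
    by_decade[decade].append(r["ease_of_use"])

# ...then compute the mean rating for each decade.
means = {decade: sum(vals) / len(vals) for decade, vals in by_decade.items()}
print(means)  # {1970: 2.5, 1980: 3.0}
```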

HOW TO CODE

Let’s walk through how you should prepare to code content for a study, drawing as needed on examples from the content analysis used for the James Bond films (Neuendorf et al. 2010), which involved eight coders.

  1. Determine the variables to be measured in the analysis. One of the research questions in the content analysis of James Bond films was, “Is the amount of aggression aimed at a female character predicted by her physical characteristics, role prominence, sexual activity, and aggressive predispositions?” (Neuendorf et al. 2010:750). Some of the variables in this question that needed to be measured were:
    • If a person appears in the film, is that person a character?
    • If the person is a character, what is their gender?
    • If the person is a female character, how much aggression is aimed at her?
    • If the person is a female character, what is the nature of her physical characteristics?
  2. Determine the possible attributes of each variable. Remember that attributes are the different categories for a given variable. In a content analysis, these are the options that coders can choose from. Specifically, they will code particular snippets of data as reflecting those attributes. For simple variables, this process of identifying attributes is pretty straightforward. In the Bond study, the “role prominence” variable—which indicated how major or minor a female character was with regard to the events of the film overall—had just three attributes: minor, medium, or major. More complex multidimensional concepts, however, may require multiple variables to measure them. For example, the “physical characteristics” variable we just described needed to be measured using eight variables, including “hair color,” “hair length,” “glasses,” “body type,” and “level of attractiveness of physical appearance.”
  3. Develop a coding manual providing precise instructions for coding. The coding manual should describe all of the study’s variables and give coders a complete understanding of how to make decisions during the coding process. This step is of paramount importance when multiple coders are used, which was the case in the Neuendorf study. (Figure 15.2 shows actual instructions from their full coding manual.) Even with just one coder, however, creating a set of clear instructions will promote consistency throughout the coding process.
  4. Test the coding instructions on similar materials to revise your coding manual and train multiple coders. Before you dive into your content analysis of your sources, it’s a good idea to take materials that are somehow comparable to those sources and try out your variable definitions on that content. You can then refine your coding manual based on the outcome of that trial run. Testing the instructions is especially useful when you have multiple coders involved. That testing can serve as part of the training for coders, in addition to whatever informal or formal instruction you wish to give them concerning the coding manual. You can also use the test to check the level of agreement among coders. (In the section Reliability and Validity in Content Analysis, we’ll discuss how you calculate the inter-coder reliability of the coding process for a given study.) For instance, the Neuendorf study was able to evaluate the level of agreement among its eight coders by using a Bond film (Never Say Never Again, 1983) that was not in the actual study list because it was based on an earlier Bond film (Thunderball, 1965). This pilot test prompted the researchers to make several changes to the coding manual and have the coders attend additional training sessions. They conducted their final training using the 1967 version of Casino Royale, which featured James Bond as a character but had not been included on the study list because it was a spy parody not produced by the same company that produced the other Bond films.
    For this study, the researchers were lucky to have two comparable films at their disposal that they could use to test their measurement procedures. If you find it hard to find a similar set of materials for such purposes, however, you can conduct a test on a source you intend to include in your study: do the trial run to refine your coding manual, delete the data, and then code the same film with the revised instructions.
Excerpt from the codebook created by Neuendorf et al. (2010) for their content analysis of James Bond movies.
Figure 15.2. Coding Instructions from Neuendorf et al. (2010). Before they began their analysis of the portrayals of women in James Bond films, Neuendorf and her colleagues created a 13-page codebook to guide their team of eight coders. As shown in the excerpt, the coding manual provided detailed instructions about how to decide which of a variable’s attributes (categories) were being reflected in the data being analyzed (Neuendorf et al. 2010:4–5). Springer
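
As a preview of the inter-coder reliability calculations discussed later in the chapter, here is a minimal sketch of two common measures, percent agreement and Cohen’s kappa, computed on invented codes from two hypothetical coders (not data from the Bond study):

```python
from collections import Counter

# Hypothetical "role prominence" codes assigned by two coders to the
# same ten characters (invented, for illustration only).
coder_a = ["minor", "minor", "major", "medium", "minor",
           "major", "minor", "medium", "major", "minor"]
coder_b = ["minor", "medium", "major", "medium", "minor",
           "major", "minor", "minor", "major", "minor"]

n = len(coder_a)

# Percent agreement: the share of items both coders coded identically.
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Cohen's kappa corrects that agreement for chance, using each coder's
# marginal distribution of codes.
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
kappa = (observed - expected) / (1 - expected)

print(round(observed, 2), round(kappa, 2))  # 0.8 0.68
```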

Coding in Qualitative Content Analysis

Coding in qualitative content analysis is typically an inductive process. That is, researchers do not start out by precisely specifying relevant variables and attributes and creating detailed coding procedures. Instead, they let their ideas regarding conceptualization and operationalization emerge from a careful reading and consideration of the text (see Chapter 4: Research Questions for a fuller discussion of the inductive approach to analysis). In this and other ways, coding in qualitative content analysis follows much the same procedures that are used for other qualitative research methods, such as ethnographic observation and in-depth interviews.

That said, keep in mind the earlier distinction we made between quantitative content analysis, which focuses on manifest content, and qualitative content analysis, which focuses on latent content. Rather than just counting more obvious details, qualitative content analysis tends to delve more deeply into the data. The researchers immerse themselves in the text through careful reading, trying to get at its underlying meanings.

Sociologist Nikita Carney (2016) analyzed the Twitter debates that occurred after the police killed two unarmed African American men, Michael Brown and Eric Garner, in 2014. In her paper she describes the straightforward qualitative data analysis process she followed:

I decided to use Twitter’s advanced search feature and take screenshots of selected results between December 3 and 7, 2014. The analysis process drew heavily from grounded theory [an inductive approach] in order to identify key themes. . . . I initially read through approximately 500 tweets from this time period to get a sense for the dialogue on Twitter at this moment in time. Based on this initial read-through, I loosely coded tweets based on whether they used “#BlackLivesMatter,” “#AllLivesMatter,” or both. I selected 100 tweets out of the initial sample of 500, consisting of approximately 30 to 35 tweets from each initial grouping that were representative of the larger sample. I conducted a close textual analysis on these 100 tweets, from which I developed more specific thematic groupings, including “call to action,” “conflict over signs,” and “shifting signs/discourse.” (Carney 2016:188–9)

Just as different people who read the same book will not necessarily have the same interpretation of the text, closely studying latent content is necessarily a subjective process. Qualitative researchers recognize this and often disclose their personal stances toward the research, in a process known as reflexivity (discussed in Chapter 9: Ethnography). Carney’s paper includes a “personal reflexive statement” to this effect:

I closely followed news surrounding the deaths of Michael Brown and Eric Garner, among other victims of police violence, on mass media and social media. I also took to the streets with other activists and participated in acts of protest at my university on a daily basis. Rather than claiming to produce an “objective” analysis, I use my subjectivity to examine discourse as it unfolded on social media with the goal of better understanding the ways in which youth of color used technology to influence dominant discourse in the nation. (Carney 2016:181)

Although content analysis often focuses on either latent or manifest content, the two approaches can be combined in one study.

For instance, Kathleen Denny’s content analysis of scouting handbooks examined manifest content—the descriptions of badge activities that could be categorized as “self-oriented” or “others-oriented”—and calculated what proportion of activities for the Boy Scouts and Girl Scouts fell into each category. But Denny also analyzed latent content in terms of how the handbooks portrayed gender. Based on this analysis, Denny (2011:27) argued that the girls were encouraged to become “up-to-date traditional women,” whereas boys were urged to adopt “an assertive heteronormative masculinity.” In her paper, Denny described the qualitative and inductive approach she took to arrive at this finding:

Rather than code the texts for the presence or absence of any particular trait, I assessed them holistically, attuned to themes and patterns that emerged having to do with the attitude or approach endorsed by the texts. I performed textual analyses of the girls’ and boys’ handbooks’ official statements as well as a focused comparison of a comparable pair of badges—the girls’ Model Citizen badge and the boys’ Citizen badge. I present findings from the comparison of the citizen badges because the nature of the activities offered in these badges is very similar, bringing gender differences into sharper relief. (Denny 2011:35)

[…]

  1. Content analysis focuses on the study of recorded communications. The materials that can be analyzed include actual written texts, such as newspapers or journal entries, as well as visual and auditory sources, such as television shows, advertisements, or movies.
  2. Content analysis usually focuses on primary sources, though in some instances it may also involve secondary sources.
  3. Quantitative content analysis tends to count instances of manifest content, surface-level details about the sources. Qualitative content analysis tends to focus on the underlying meanings of the text, its latent content.
  4. Code sheets are used to collect and organize data for content analysis.
  5. Quantitative content analyses often present univariate and bivariate statistics in the form of tables and charts. Much like in-depth interviewing, qualitative content analyses tend to use direct or paraphrased quotes in order to describe meanings and explain relationships in the data.

Original source: 15.1. Content Analysis – The Craft of Sociological Research (pressbooks.pub)


License


"Content Analysis" (Chen et al., 2024) Copyright © by Victor Tan Chen; Gabriela León-Pérez; Julie Honnold; and Volkan Aytar is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.