GSS Tutorial #1: Basic trends over time

Part of the series: How to research stuff.

Today I join Razib Khan’s quest to get bloggers to use the General Social Survey (GSS) more often.

The GSS is a huge collection of data on the demographics and attitudes of non-institutional adults (18+) living in the US.1 The data were collected by NORC via face-to-face, 90-minute interviews in randomly selected households, almost every year from 1972 to 1994, and every other year since then.

You can download the data and analyze it in R, SPSS, or whatever you like, but you can also analyze it through two easy-to-use web interfaces: the UC Berkeley SDA site and the GSS Data Explorer.2

Introducing the two web interfaces

Suppose you come across the claim that Americans have become less religious over time. Is that true?3 This is precisely the kind of thing the GSS surveys people about, so let’s look up the data on the Berkeley SDA site:

Berkeley GSS14

First, we want to find a variable that captures religiosity, or religious attendance, or belief in God, or something like that. Click the Search button near the top, and we see this:

GSS14 variable search

The search field works basically like you’d expect, but click Search Techniques Help if you want to check the allowed operators and syntax. Let’s search for God OR religion OR religiosity OR "religious attendance":

GSS14 search results for God etc

After scrolling through several pages of results, some of the most promising variables for this purpose appear to be:

  • GOD: How confident is R (the respondent) that God exists?
  • RELSPRT: How religious/spiritual is R?
  • THEISM: How concerned with human affairs does R think God is?
  • IMPCHURH: How important are religion and church to R?

These are my own short paraphrases of the questions asked of each R. Click VIEW next to a variable name to see the variable’s actual details:

GSS14 RELSPRT details

You’ll notice that some of the details here are missing. For example, some of the allowed answers are truncated, and there’s no information on which years this question was asked. For that, you’ll want to download the latest GSS Codebook (47 MB PDF). Search it for VAR: RELSPRT and you’ll find the variable details:

GSS14 RELSPRT details in Codebook

Now we can see the non-truncated allowed answers. We can also see that, unfortunately, RELSPRT was only collected during 2008.

This is one place where the GSS Data Explorer comes in handy. It’s a slower site in my experience, and it shows less information on each variable than the Berkeley SDA site does, but search for God | religion | religiosity | "religious attendance" (the | is used for OR on this site) and you’ll notice a handy feature:

GSS14 Data explorer variable results

It shows you right away the available years of data for each variable!

If we scroll through the results, we see that GOD was first collected in 1988, RELSPRT doesn’t even appear (the Berkeley SDA variable search is more thorough), THEISM was first collected in 1991, and IMPCHURH was only collected once.

Luckily, two highly relevant variables stand out as being available for all years:

  1. RELIG: What is R’s religious preference?
  2. FUND: How fundamentalist is R?
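If you download the data instead of using the web interfaces, you can run the same "which years was this asked?" check yourself. Here is a minimal sketch in plain Python. It assumes each record is a dict with a YEAR key and one key per variable, with None for unasked or missing answers. The rows below are invented for illustration and are not real GSS records.

```python
from collections import defaultdict

def years_available(rows, variables):
    """Return {variable: sorted list of years with at least one non-missing answer}."""
    seen = defaultdict(set)
    for row in rows:
        for var in variables:
            if row.get(var) is not None:
                seen[var].add(row["YEAR"])
    return {var: sorted(seen[var]) for var in variables}

# Toy rows, invented for illustration (not real GSS records).
toy_rows = [
    {"YEAR": 1972, "RELIG": 1, "GOD": None},
    {"YEAR": 1988, "RELIG": 2, "GOD": 6},
    {"YEAR": 1991, "RELIG": 4, "GOD": None},
    {"YEAR": 2014, "RELIG": 4, "GOD": 1},
]

availability = years_available(toy_rows, ["RELIG", "GOD"])
print(availability)
```

A variable with answers in every survey year is the kind you want for a trend analysis. A variable with only one year is useless for that purpose.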

Our first analysis: religious preference

We’ll do the analysis of RELIG at the Berkeley SDA site because it’s quicker. But first, let’s make sure we understand our variable. Search the Codebook PDF for VAR: RELIG and you’ll see:

GSS14 RELIG details in Codebook

Okay, good: there’s nothing weird to watch out for here. Respondents were asked a follow-up question if they responded with Protestant or Other, but we don’t need to concern ourselves with such details.

Click Analysis on the Berkeley SDA site and set Row to RELIG and Column to YEAR:

GSS14 close up on crosstab analysis

Click Run the Table and you’ll get a page with, among other things, a huge table:

GSS14 relig table

The bold number in each cell is the Column percent. Since our Column variable is YEAR and our Row variable is RELIG, the percentage in each cell is the share of that year’s respondents who gave that row’s response to the question about their religious preference.
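To make "Column percent" concrete, here is a small sketch of the calculation in plain Python. The function name and the toy data are my own, and the sketch counts every respondent equally. It ignores any survey weights the SDA site may apply, so it won’t necessarily reproduce the site’s numbers exactly.

```python
from collections import Counter

def column_percents(pairs):
    """pairs: iterable of (row_value, column_value), one per respondent.
    Returns {column_value: {row_value: percent of that column}}."""
    pairs = list(pairs)
    cell_counts = Counter(pairs)
    column_totals = Counter(col for _, col in pairs)
    table = {}
    for (row, col), n in cell_counts.items():
        table.setdefault(col, {})[row] = 100.0 * n / column_totals[col]
    return table

# Toy (RELIG, YEAR) pairs, invented for illustration.
toy_pairs = [
    ("Protestant", 1972), ("Protestant", 1972), ("Catholic", 1972), ("None", 1972),
    ("Protestant", 2014), ("Catholic", 2014), ("None", 2014), ("None", 2014),
]

table = column_percents(toy_pairs)
for year in sorted(table):
    print(year, table[year])
```

Each year’s percentages add up to 100, which is why you can read down a column to see how that year’s respondents split across the answers.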

Glancing over the table quickly, we can see that:

  1. The vast majority of Rs (respondents) are Protestant, Catholic, or None. These three answers represented 95% of all answers in 1972, 95.1% of responses in 1993, and 89.3% of responses in 2014.
  2. Protestant held steady around 63% from 1972–1993, and since then has dropped steadily, ending at 43.2% in 2014.
  3. Catholic held steady around 25% throughout 1972–2014.
  4. None has increased more or less continuously during this period, rising from 5.1% in 1972 to 20.7% in 2014.

Unfortunately, the default visualization, a stacked bar chart, is hard to read with so many categories:

GSS14 RELIG stacked chart

So let’s simplify things. We’ll collapse Protestant, Catholic, Orthodox-Christian, and Christian into a new category called Christian-Combined, keep None as it is, and collapse everything else into a new category called Other-Combined.

Back at the Analysis page, change Row to this:

RELIG(r: 1-2, 10-11 "Christian-Combined"; 4 "None"; 3, 5-9, 12-13 "Other-Combined")

(For syntax help, see here.)
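If the recode syntax looks cryptic, it may help to see the same grouping written out as a lookup table. This is just a sketch in plain Python, not how SDA works internally. The helper name build_recode is made up, and the code ranges are copied from the Row expression above.

```python
def build_recode(groups):
    """groups: {label: list of single codes or (low, high) inclusive ranges}.
    Returns a flat {code: label} lookup."""
    mapping = {}
    for label, spans in groups.items():
        for span in spans:
            low, high = span if isinstance(span, tuple) else (span, span)
            for code in range(low, high + 1):
                mapping[code] = label
    return mapping

# Same grouping as the Row expression above.
three_way = build_recode({
    "Christian-Combined": [(1, 2), (10, 11)],
    "None": [4],
    "Other-Combined": [3, (5, 9), (12, 13)],
})

# Recode a few RELIG codes; .get() returns None for any code not listed.
print([three_way.get(code) for code in (1, 4, 7)])
```

Every code from 1 through 13 lands in exactly one group, so no respondent with a valid answer is dropped by the recode.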

Click Run the Table, and voilà:

GSS14 RELIG stacked chart with recoded variables

Okay, so, there’s a clear downward trend in American religiosity since about 1993, but Americans are still very religious, and in particular they are very Christian.

To make things even simpler, we could also combine Christian-Combined and Other-Combined into a new category called Religious, and rename None to Non-Religious. To do this, change Row to

RELIG(r: 1-3, 5-13 "Religious"; 4 "Non-Religious")

and run it again. Now we get:

GSS14 RELIG stacked chart with recoded variables 2


Our second analysis: degree of fundamentalism

But there is another sense of “religiosity over time” we can explore, using that FUND variable we found earlier. Let’s check what the GSS Codebook says, by searching it for VAR: FUND:

GSS14 FUND variable details

Ah, good thing we checked! The REMARKS section says we should look at GSS Methodological Reports 43 and 56. You can find them on the NORC website here, by clicking Next until you reach MR043 and MR056.

MR043 explains that FUND isn’t measured by asking R a question, as with RELIG. Instead, FUND is automatically calculated based on the rules from Appendix 2 of MR043. For example:

  • If R chooses Catholic for their religious preference, they are deemed Moderate in their degree of fundamentalism (FUND).
  • If R chooses None for their religious preference, they are deemed Liberal, which just means “not fundamentalist.”
  • If R chooses Protestant, they are asked for their denomination (DENOM). If R then chooses Evangelical Congregational, they are deemed Fundamentalist. But if R chooses Methodist for DENOM, they are deemed Moderate, and if they choose Episcopal, they are deemed Liberal.

And so on. Most of MR043 explains how those assignments in Appendix 2 were made.
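To show what "automatically calculated" means here, the three example rules above can be written as a tiny function. This is only a sketch: it covers just those examples, uses text labels where the real data uses numeric codes, and returns None for everything else. The full rule set lives in Appendix 2 of MR043, and I haven’t reproduced it.

```python
def derive_fund(relig, denom=None):
    """Return the FUND label for the few cases described above,
    or None for any combination this sketch does not cover."""
    if relig == "Catholic":
        return "Moderate"
    if relig == "None":
        return "Liberal"  # i.e. "not fundamentalist"
    if relig == "Protestant":
        # Only the three denominations mentioned in the examples.
        protestant_rules = {
            "Evangelical Congregational": "Fundamentalist",
            "Methodist": "Moderate",
            "Episcopal": "Liberal",
        }
        return protestant_rules.get(denom)
    return None

print(derive_fund("Protestant", "Methodist"))
print(derive_fund("None"))
```

The point to notice is that FUND depends entirely on RELIG and DENOM, so any change in how denominations were recorded feeds straight into FUND.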

MR056 notes that FUND is one of 115 variables in the GSS data set affected by measurement variation over time. In Table 1 of MR056 we can see that FUND is affected by measurement variation type ID1, which Table 2 identifies as a variation in coding category subdivisions. Table 1 also says to see comment 13, which repeats some points on measurement variation described in more detail in MR043, in the section called “Classification Prior to 1984.” Basically, prior to 1984 “the major Protestant denominations were not delineated into their major sub-divisions (Baptists, Lutherans, Methodist, Presbyterians).”

At a glance, it looks like the appendix of MR056 may contain information on how to recode FUND so that a time series analysis isn’t corrupted by the new 1984 subdivisions, but that’s probably more effort than this eyeball analysis is worth. To be safe, we’ll just leave out the pre-1985 data. Set Row to FUND and Column to YEAR(1985-2014) and run the table:

GSS14 FUND table

GSS14 FUND stacked bar chart
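For readers working with a downloaded copy of the data, the YEAR(1985-2014) restriction is just a row filter applied before tabulating. Here is a minimal sketch in plain Python, treating the range as inclusive on both ends (my reading of the SDA syntax). The rows are invented for illustration.

```python
def keep_years(rows, first_year, last_year):
    """Keep only rows whose YEAR falls in [first_year, last_year], inclusive."""
    return [row for row in rows if first_year <= row["YEAR"] <= last_year]

# Toy rows, invented for illustration.
toy_rows = [
    {"YEAR": 1983, "FUND": "Moderate"},
    {"YEAR": 1984, "FUND": "Liberal"},
    {"YEAR": 1985, "FUND": "Fundamentalist"},
    {"YEAR": 2014, "FUND": "Liberal"},
]

recent = keep_years(toy_rows, 1985, 2014)
print([row["YEAR"] for row in recent])
```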

Looking at the table and the stacked bar chart, it seems that since 1985:

  • Fundamentalist Americans have declined from 34.6% to 23.2% of the population.
  • Moderate Americans have increased slightly from 39.8% to 43.5% of the population.
  • Liberal Americans have increased slightly from 25.6% to 33.4% of the population.
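It’s easier to compare those three shifts as changes in percentage points. The figures below are the start and end percentages quoted in the bullets above, and the variable names are my own.

```python
# Start and end percentages as quoted in the bullets above.
fund_start = {"Fundamentalist": 34.6, "Moderate": 39.8, "Liberal": 25.6}
fund_end = {"Fundamentalist": 23.2, "Moderate": 43.5, "Liberal": 33.4}

# Change in percentage points, rounded to one decimal place.
changes = {
    label: round(fund_end[label] - fund_start[label], 1)
    for label in fund_start
}
print(changes)
```

So Fundamentalist fell by 11.4 points, Moderate rose by 3.7, and Liberal rose by 7.8, which means most of the Fundamentalist decline shows up as growth in the Liberal category.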

Remember, those who say they prefer no religion are among those labeled Liberal. What happens if we look at trends in fundamentalism among religious Americans only? To do this, we’ll have to add a selection filter that excludes Rs who said their religious preference (RELIG) was None, like this:

GSS14 FUND crosstab with selection filter

The result is:

GSS14 FUND table excluding non-religious

GSS14 FUND stacked bar chart excluding non-religious

So not only are Americans less religious than they used to be, but also religious Americans are less fundamentalist than they used to be. Interesting.
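If you want to see why the filter matters, here is a toy version of the same comparison in plain Python. It assumes RELIG code 4 means None, as in the recode expressions earlier, and the rows are invented for illustration. Dropping the non-religious respondents removes people who were automatically labeled Liberal, so the Liberal share shrinks and the other shares grow.

```python
from collections import Counter

RELIG_NONE = 4  # code for "None", as in the recode expressions earlier

def fund_shares(rows, exclude_nonreligious=False):
    """Percent of rows in each FUND category, optionally dropping RELIG == None."""
    if exclude_nonreligious:
        rows = [row for row in rows if row["RELIG"] != RELIG_NONE]
    counts = Counter(row["FUND"] for row in rows)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Toy rows, invented for illustration (not real GSS records).
toy_rows = [
    {"RELIG": 1, "FUND": "Fundamentalist"},
    {"RELIG": 1, "FUND": "Moderate"},
    {"RELIG": 2, "FUND": "Moderate"},
    {"RELIG": 1, "FUND": "Liberal"},
    {"RELIG": 4, "FUND": "Liberal"},
]

everyone = fund_shares(toy_rows)
religious_only = fund_shares(toy_rows, exclude_nonreligious=True)
print(everyone)
print(religious_only)
```

That is why the filtered table is the right one to look at if the question is about trends among religious Americans specifically.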


All we’ve done here is generate some time series and eyeball some tables and stacked bar charts. Obviously we can do a lot more with the GSS than this, but hopefully this is enough to give you a sense of how the GSS works and what online tools are out there for analyzing the data. I plan to write more GSS tutorials later.

  1. “Non-institutional” means: not in the military, not in jail or prison, and not in a nursing home. Another limitation of the data is that only English-speakers were interviewed until 2006, when Spanish-speakers were added to the target population. For further details on the data collection, see the GSS Codebook.
  2. In the future, check here to see whether Berkeley has added newer data files.
  3. Of course, you could also just check Wikipedia in this case, but my purpose is to teach you how to use the GSS.
