Part of the series: How to research stuff.
The GSS (General Social Survey) is a huge collection of data on the demographics and attitudes of non-institutional adults (18+) living in the US.[1] The data were collected by NORC via face-to-face, 90-minute interviews in randomly selected households, every year (almost) from 1972–1994, and every other year since then.
You can download the data and analyze it in R or SPSS or whatever, but you can also analyze it very easily via two web interfaces: the UC Berkeley SDA site and the GSS Data Explorer.[2]
Introducing the two web interfaces
Suppose you come across the claim that Americans have become less religious over time. Is that true?[3] This is precisely the kind of thing the GSS surveys people about, so let’s look up the data on the Berkeley SDA site:
First, we want to find a variable that captures religiosity, or religious attendance, or belief in God, or something like that. Click the Search button near the top, and we see this:
The search field works basically like you’d expect, but click Search Techniques Help if you want to check the allowed operators and syntax. Let’s search for
God OR religion OR religiosity OR "religious attendance":
After scrolling through several pages of results, some of the most promising variables for this purpose appear to be:
GOD: How confident is R (the respondent) that God exists?
RELSPRT: How religious/spiritual is R?
THEISM: How concerned with human affairs does R think God is?
IMPCHURH: How important are religion and church to R?
These are my own short paraphrases of the questions asked of each R. Click VIEW next to a variable name to see the variable’s actual details:
You’ll notice that some of the details here are missing. For example, some of the allowed answers are truncated, and there’s no information on which years this question was asked. For that you’ll want to download the latest GSS Codebook (47 MB PDF). Search it for VAR: RELSPRT and you’ll find the variable details:
Now we can see the non-truncated allowed answers. We can also see that, unfortunately, RELSPRT was only collected during 2008.
This is one place where the GSS Data Explorer comes in handy. It’s a slower site in my experience, and it shows less information on each variable than the Berkeley SDA site does, but search for God | religion | religiosity | "religious attendance" (the | is used for OR on this site) and you’ll notice a handy feature:
It shows you right away the available years of data for each variable!
If we scroll through the results, we see that GOD was first collected in 1988, RELSPRT doesn’t even appear (the Berkeley SDA variable search is more thorough), THEISM was first collected in 1991, and IMPCHURH was only collected once.
Luckily, two highly relevant variables stand out as being available for all years:
RELIG: What is R’s religious preference?
FUND: How fundamentalist is R?
Our first analysis: religious preference
We’ll do the analysis of RELIG at the Berkeley SDA site because it’s quicker. But first, let’s make sure we understand our variable. Search the Codebook PDF for VAR: RELIG and you’ll see:
Okay, good: there’s nothing weird to watch out for here. Respondents were asked a follow-up question if they responded with Other, but we don’t need to concern ourselves with such details.
Now open the Analysis page on the Berkeley SDA site and set Row to RELIG and Column to YEAR. Run the Table and you’ll get a page with, among other things, a huge table:
The bold number in each cell is the Column percent. Since our Column variable is YEAR and our Row variable is RELIG, each cell shows the percentage of that year’s respondents who gave that row’s response to the question about their religious preference.
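To make the idea of a column percent concrete, here is a minimal Python sketch. The records are toy values invented for illustration (they are not GSS data), the function name `column_percents` is my own, and the calculation ignores any survey weights the SDA site may apply.

```python
from collections import Counter

# Toy (year, religious preference) records, invented purely for illustration.
# These are NOT GSS data.
records = [
    (1972, "Protestant"), (1972, "Protestant"), (1972, "Catholic"), (1972, "None"),
    (2014, "Protestant"), (2014, "Catholic"), (2014, "None"), (2014, "None"),
]

def column_percents(rows):
    """Return {year: {answer: percent of that year's respondents}}."""
    per_year_totals = Counter(year for year, _ in rows)
    cell_counts = Counter(rows)
    table = {}
    for (year, answer), count in cell_counts.items():
        # Divide each cell by its column (year) total, not by the grand total.
        table.setdefault(year, {})[answer] = 100.0 * count / per_year_totals[year]
    return table

table = column_percents(records)
for year in sorted(table):
    print(year, table[year])
```

Each year's percentages add up to 100, which is why a single column can be read as "how that year's respondents split across the answers."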
Glancing over the table quickly, we can see that:
- The vast majority of Rs (respondents) are Protestant, Catholic, or None. These three answers represented 95% of all answers in 1972, 95.1% of responses in 1993, and 89.3% of responses in 2014.
- Protestant held steady around 63% from 1972–1993, and since then has dropped steadily, ending at 43.2% in 2014.
- Catholic held steady around 25% throughout 1972–2014.
- None has increased more or less continuously during this period, rising from 5.1% in 1972 to 20.7% in 2014.
Unfortunately, the default visualization, a stacked bar chart, is hard to read with so many categories:
So let’s simplify things. We’ll collapse the Christian categories (codes 1-2 and 10-11 in the expression below) into a new category called Christian-Combined, we’ll keep None as it is, and we’ll collapse everything else into a new category called Other-Combined. Back at the Analysis page, change Row to this:
RELIG(r: 1-2, 10-11 "Christian-Combined"; 4 "None"; 3, 5-9, 12-13 "Other-Combined")
(For syntax help, see here.)
Run the Table, and voilà:
Okay, so, there’s a clear downward trend in American religiosity since about 1993, but Americans are still very religious, and in particular they are very Christian.
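If you download the data instead of using the web interface, the same collapsing can be sketched in Python. This is only an illustration of the grouping in the Row expression above: the helper names `expand` and `recode_relig` are my own, and the code assumes RELIG is stored as integer codes.

```python
def expand(spec):
    """Expand a spec like "1-2, 10-11" into a set of integer codes."""
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            codes.update(range(int(low), int(high) + 1))
        else:
            codes.add(int(part))
    return codes

# Code groups copied from the Row expression used in this post.
GROUPS = {
    "Christian-Combined": expand("1-2, 10-11"),
    "None": expand("4"),
    "Other-Combined": expand("3, 5-9, 12-13"),
}

def recode_relig(code):
    """Return the collapsed label for a RELIG code, or None if it is unmapped."""
    for label, codes in GROUPS.items():
        if code in codes:
            return label
    return None

for code in (1, 4, 9, 11):
    print(code, recode_relig(code))
```

Returning None for unmapped codes (rather than guessing a group) makes it easy to spot missing-data codes that the recode does not cover.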
To make things even simpler, we could also combine Christian-Combined and Other-Combined into a new category called Religious, and change None to Non-Religious. To do this, change Row to:
RELIG(r: 1-3, 5-13 "Religious"; 4 "Non-Religious")
and run it again. Now we get:
Our second analysis: degree of fundamentalism
But there is another sense of “religiosity over time” we can explore, using that FUND variable we found earlier. Let’s check what the GSS Codebook says, by searching it for VAR: FUND:
Ah, good thing we checked! The REMARKS section says we should look at GSS Methodological Reports 43 and 56. You can find them on the NORC website here, by clicking Next until you reach MR043 and MR056.
MR043 explains that FUND isn’t measured by asking R a question, the way RELIG is. Instead, FUND is automatically calculated based on the rules from Appendix 2 of MR043. For example:
- If R chooses Catholic for their religious preference, they are deemed Moderate in their degree of fundamentalism (FUND).
- If R chooses None for their religious preference, they are deemed Liberal, which just means “not fundamentalist.”
- If R chooses Protestant, they are asked for their denomination (DENOM). If R then chooses Evangelical Congregational, they are deemed Fundamentalist. But if R chooses certain other denominations for DENOM, they are deemed Moderate. Or if they choose Episcopal, they are deemed Liberal.
And so on. Most of MR043 explains how those assignments in Appendix 2 were made.
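To show what “automatically calculated” means, here is a rough Python sketch of a rule-based derivation. It covers only the handful of examples mentioned above, not the full Appendix 2 of MR043. The function name `derive_fund` and the use of text labels instead of numeric codes are my own choices, and treating Episcopal as Liberal is an assumption on my part.

```python
def derive_fund(relig, denom=None):
    """Return a FUND label for the few cases mentioned in this post.

    Labels are plain strings here; the real GSS files use numeric codes.
    Returns None for any case the post does not spell out.
    """
    if relig == "Catholic":
        return "Moderate"
    if relig == "None":
        return "Liberal"
    if relig == "Protestant":
        # Only two denominations are listed; everything else is left undecided.
        protestant_rules = {
            "Evangelical Congregational": "Fundamentalist",
            "Episcopal": "Liberal",  # assumption, see the note above
        }
        return protestant_rules.get(denom)
    return None

print(derive_fund("Catholic"))
print(derive_fund("Protestant", "Evangelical Congregational"))
```

The point is simply that FUND depends on RELIG and DENOM, so any quirk in how those two were coded carries over into FUND.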
MR056 notes that FUND is one of 115 variables in the GSS data set affected by measurement variation over time. In Table 1 of MR056 we can see that FUND is affected by measurement variation type ID1, which Table 2 identifies as a variation in coding category subdivisions. Table 1 also says to see comment 13, which repeats some points on measurement variation described in more detail in MR043, in the section called “Classification Prior to 1984.” Basically, prior to 1984 “the major Protestant denominations were not delineated into their major sub-divisions (Baptists, Lutherans, Methodist, Presbyterians).”
At a glance, it looks like the appendix of MR056 may contain information on how to recode FUND so that a time series analysis isn’t corrupted by the new 1984 subdivisions, but that’s probably more effort than this eyeball analysis is worth. To be safe, we’ll just leave out the pre-1985 data. Set Row to FUND, keep Column as YEAR, set the selection filter to YEAR(1985-2014), and run the table:
Looking at the table and the stacked bar chart, it seems that since 1985:
- Fundamentalist Americans have declined from 34.6% to 23.2% of the population.
- Moderate Americans have increased slightly from 39.8% to 43.5% of the population.
- Liberal Americans have increased slightly from 25.6% to 33.4% of the population.
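When eyeballing shifts like these, it helps to separate the change in percentage points from the relative change. Here is a small Python sketch using the three pairs of figures listed above; the function name `changes` is my own.

```python
# (share at the start of the period, share in 2014), in percent,
# copied from the three bullet points above.
shares = {
    "Fundamentalist": (34.6, 23.2),
    "Moderate": (39.8, 43.5),
    "Liberal": (25.6, 33.4),
}

def changes(start, end):
    """Return (percentage-point change, relative change in percent)."""
    point_change = round(end - start, 1)
    relative_change = round(100 * (end - start) / start, 1)
    return point_change, relative_change

for label, (start, end) in shares.items():
    points, relative = changes(start, end)
    print(f"{label}: {points:+.1f} points ({relative:+.1f}% relative)")
```

A drop of 11.4 points for the Fundamentalist group, for example, is a much larger relative decline, because that group started at only about a third of respondents.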
Remember, those who say they prefer no religion are among those labeled Liberal. What happens if we look at trends in fundamentalism among religious Americans only? To do this, we’ll have to add a selection filter that excludes Rs who said their religious preference (RELIG) was None, like this:
The result is:
So not only are Americans less religious than they used to be, but also religious Americans are less fundamentalist than they used to be. Interesting.
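For readers working with a downloaded copy of the data, the two selection filters used in this section (years 1985 to 2014, and RELIG not equal to None) amount to a simple row filter. Here is a minimal Python sketch. The respondent records are toy values invented for illustration, `passes_filter` is a hypothetical helper, and code 4 for None is taken from the Row expressions earlier in this post.

```python
# Toy respondent records, invented for illustration only (not GSS data).
respondents = [
    {"YEAR": 1980, "RELIG": 1, "FUND": "Fundamentalist"},
    {"YEAR": 1990, "RELIG": 4, "FUND": "Liberal"},
    {"YEAR": 1990, "RELIG": 2, "FUND": "Moderate"},
    {"YEAR": 2014, "RELIG": 1, "FUND": "Liberal"},
]

RELIG_NONE = 4  # code for "None" in the Row expressions used in this post

def passes_filter(row, first_year=1985, last_year=2014):
    """Keep rows inside the year range whose religious preference is not None."""
    in_years = first_year <= row["YEAR"] <= last_year
    return in_years and row["RELIG"] != RELIG_NONE

selected = [row for row in respondents if passes_filter(row)]
for row in selected:
    print(row)
```

Any percentages computed after this step describe religious respondents only, so they are not directly comparable to the unfiltered table.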
All we’ve done here is generate some time series and eyeball some tables and stacked bar charts. Obviously we can do a lot more with the GSS than this, but hopefully this is enough to give you a sense of how the GSS works and what online tools are out there for analyzing the data. I plan to write more GSS tutorials later.
1. “Non-institutional” means: not in the military, not in jail or prison, and not in a nursing home. Another limitation of the data is that only English-speakers were interviewed until 2006, when Spanish-speakers were added to the target population. For further details on the data collection, see the GSS Codebook.
2. In the future, check here to see whether Berkeley has added newer data files.
3. Of course, you could also just check Wikipedia in this case, but my purpose is to teach you how to use the GSS.