Part of the series: How to research stuff.
Today I join Razib Khan’s quest to get bloggers to use the General Social Survey (GSS) more often.
The GSS is a huge collection of data on the demographics and attitudes of non-institutional adults (18+) living in the US.[1] The data were collected by NORC via face-to-face, 90-minute interviews in randomly selected households, every year (almost) from 1972–1994, and every other year since then.
You can download the data and analyze it in R or SPSS or whatever, but you can also analyze it through two easy-to-use web interfaces: the UC Berkeley SDA site and the GSS Data Explorer.[2]
Introducing the two web interfaces
Suppose you come across the claim that Americans have become less religious over time. Is that true?[3] This is precisely the kind of thing the GSS surveys people about, so let’s look up the data on the Berkeley SDA site:
First, we want to find a variable that captures religiosity, or religious attendance, or belief in God, or something like that. Click the Search button near the top, and we see this:
The search field works basically like you’d expect, but click Search Techniques Help if you want to check the allowed operators and syntax. Let’s search for God OR religion OR religiosity OR "religious attendance":
After scrolling through several pages of results, some of the most promising variables for this purpose appear to be:
- GOD: How confident is R (the respondent) that God exists?
- RELSPRT: How religious/spiritual is R?
- THEISM: How concerned with human affairs does R think God is?
- IMPCHURH: How important are religion and church to R?
These are my own short paraphrases of the questions asked of each R. Click VIEW next to a variable name to see the variable’s actual details:
You’ll notice that some of the details here are missing. For example, some of the allowed answers are truncated, and there’s no information on which years this question was asked. For that you’ll want to download the latest GSS Codebook (47 MB PDF). Search it for VAR: RELSPRT and you’ll find the variable details:
Now we can see the non-truncated allowed answers. We can also see that, unfortunately, RELSPRT was only collected during 2008.
This is one place where the GSS Data Explorer comes in handy. It’s a slower site in my experience, and it shows less information on each variable than the Berkeley SDA site does, but search for God | religion | religiosity | "religious attendance" (the | is used for OR on this site) and you’ll notice a handy feature:
It shows you right away the available years of data for each variable!
If we scroll through the results, we see that GOD was first collected in 1988, RELSPRT doesn’t even appear (the Berkeley SDA variable search is more thorough), THEISM was first collected in 1991, and IMPCHURH was only collected once.
Luckily, two highly relevant variables stand out as being available for all years:
- RELIG: What is R’s religious preference?
- FUND: How fundamentalist is R?
Our first analysis: religious preference
We’ll do the analysis of RELIG at the Berkeley SDA site because it’s quicker. But first, let’s make sure we understand our variable. Search the Codebook PDF for VAR: RELIG and you’ll see:
Okay, good: there’s nothing weird to watch out for here. Respondents were asked a follow-up question if they responded with Protestant or Other, but we don’t need to concern ourselves with such details.
Click Analysis on the Berkeley SDA site and set Row to RELIG and Column to YEAR:
Click Run the Table and you’ll get a page with, among other things, a huge table:
The bold number in each cell is the Column percent. Since our Column variable is YEAR and our Row variable is RELIG, the percentage in each cell is the share of that year’s respondents who gave that row’s answer to the question about their religious preference.
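To make the Column percent idea concrete, here is a minimal Python sketch of the same arithmetic. The function name `column_percents` and the toy counts are my own inventions for illustration, not GSS data, and the sketch ignores any survey weighting that SDA may apply.

```python
from collections import Counter, defaultdict

def column_percents(rows):
    """Given (row_value, column_value) pairs, return
    {column_value: {row_value: percent}}, where the percents
    within each column add up to 100."""
    counts = defaultdict(Counter)
    for row_value, col_value in rows:
        counts[col_value][row_value] += 1

    table = {}
    for col_value, counter in counts.items():
        total = sum(counter.values())
        table[col_value] = {r: 100.0 * n / total for r, n in counter.items()}
    return table

# Made-up toy responses (NOT real GSS figures): (religious preference, year)
toy = (
    [("Protestant", 1972)] * 6 + [("Catholic", 1972)] * 3 + [("None", 1972)] * 1
    + [("Protestant", 2014)] * 4 + [("Catholic", 2014)] * 2 + [("None", 2014)] * 2
)

table = column_percents(toy)
for year in sorted(table):
    print(year, table[year])
```

Each percentage is computed within a single year, so comparing a row across columns shows how that answer’s share changed over time.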
Glancing over the table quickly, we can see that:
- The vast majority of Rs (respondents) are Protestant, Catholic, or None. These three answers represented 95% of all answers in 1972, 95.1% of responses in 1993, and 89.3% of responses in 2014.
- Protestant held steady around 63% from 1972–1993, and since then has dropped steadily, ending at 43.2% in 2014.
- Catholic held steady around 25% throughout 1972–2014.
- None has increased more or less continuously during this period, rising from 5.1% in 1972 to 20.7% in 2014.
Unfortunately, the default visualization, a stacked bar chart, is hard to read with so many categories:
So let’s simplify things. We’ll collapse Protestant, Catholic, Orthodox-Christian, and Christian into a new category called Christian-Combined, we’ll keep None as it is, and we’ll collapse everything else into a new category called Other-Combined.
Back at the Analysis page, change Row to this:
RELIG(r: 1-2, 10-11 "Christian-Combined"; 4 "None"; 3, 5-9, 12-13 "Other-Combined")
(For syntax help, see here.)
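If you are working with a downloaded copy of the data instead, the same collapse can be sketched in Python. This is only an illustration of the recode above: the function name `recode_relig` is hypothetical, the code ranges are copied from the SDA expression, and returning `None` for anything outside 1–13 is my assumption about how to treat missing-data codes.

```python
def recode_relig(code):
    """Map a numeric RELIG code to the combined category used in the
    SDA recode expression above."""
    if code in (1, 2, 10, 11):
        return "Christian-Combined"
    if code == 4:
        return "None"
    if code in (3, 5, 6, 7, 8, 9, 12, 13):
        return "Other-Combined"
    # Assumption: any other code (e.g. missing data) is left unassigned.
    return None

# Quick check with a few codes
for code in (1, 4, 7, 10):
    print(code, recode_relig(code))
```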
Click Run the Table, and voilà:
Okay, so, there’s a clear downward trend in American religiosity since about 1993, but Americans are still very religious, and in particular they are very Christian.
To make things even simpler, we could also combine Christian-Combined and Other-Combined into a new category called Religious, and change None to Non-Religious. To do this, change Row to
RELIG(r: 1-3, 5-13 "Religious"; 4 "Non-Religious")
and run it again. Now we get:
Our second analysis: degree of fundamentalism
But there is another sense of “religiosity over time” we can explore, using that FUND variable we found earlier. Let’s check what the GSS Codebook says, by searching it for VAR: FUND:
Ah, good thing we checked! The REMARKS section says we should look at GSS Methodological Reports 43 and 56. You can find them on the NORC website here, by clicking Next until you reach MR043 and MR056.
MR043 explains that FUND isn’t measured by asking R a question, as with RELIG. Instead, FUND is automatically calculated based on the rules from Appendix 2 of MR043. For example:
- If R chooses Catholic for their religious preference, they are deemed Moderate in their degree of fundamentalism (FUND).
- If R chooses None for their religious preference, they are deemed Liberal, which just means “not fundamentalist.”
- If R chooses Protestant, they are asked for their denomination (DENOM). If R then chooses Evangelical Congregational, they are deemed Fundamentalist. But if R chooses Methodist for DENOM, they are deemed Moderate. Or if they choose Episcopal, they are deemed Liberal.
And so on. Most of MR043 explains how those assignments in Appendix 2 were made.
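The rule-based nature of FUND can be sketched as a simple lookup. The tables below contain only the handful of examples listed above, not the full Appendix 2 of MR043, and the names `FUND_BY_RELIG`, `FUND_BY_DENOM`, and `derive_fund` are hypothetical.

```python
# Partial lookup tables: ONLY the examples mentioned in the text,
# not the complete assignment rules from MR043 Appendix 2.
FUND_BY_RELIG = {
    "Catholic": "Moderate",
    "None": "Liberal",
}
FUND_BY_DENOM = {
    "Evangelical Congregational": "Fundamentalist",
    "Methodist": "Moderate",
    "Episcopal": "Liberal",
}

def derive_fund(relig, denom=None):
    """Return the FUND label implied by RELIG (and DENOM for Protestants),
    or None if the combination is not covered by this partial sketch."""
    if relig == "Protestant":
        return FUND_BY_DENOM.get(denom)
    return FUND_BY_RELIG.get(relig)

print(derive_fund("Catholic"))
print(derive_fund("Protestant", "Methodist"))
```

The point is that FUND is a function of other answers, so any change in how those answers are coded (as discussed next) changes FUND too.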
MR056 notes that FUND is one of 115 variables in the GSS data set affected by measurement variation over time. In Table 1 of MR056 we can see that FUND is affected by measurement variation type ID1, which Table 2 identifies as a variation in coding category subdivisions. Table 1 also says to see comment 13, which repeats some points on measurement variation described in more detail in MR043, in the section called “Classification Prior to 1984.” Basically, prior to 1984 “the major Protestant denominations were not delineated into their major sub-divisions (Baptists, Lutherans, Methodist, Presbyterians).”
At a glance, it looks like the appendix of MR056 may contain information on how to recode FUND so that a time series analysis isn’t corrupted by the new 1984 subdivisions, but that’s probably more effort than this eyeball analysis is worth. To be safe, we’ll just leave out the pre-1985 data. Set Row to FUND and Column to YEAR(1985-2014) and run the table:
Looking at the table and the stacked bar chart, it seems that since 1985:
- Fundamentalist Americans have declined from 34.6% to 23.2% of the population.
- Moderate Americans have increased slightly from 39.8% to 43.5% of the population.
- Liberal Americans have increased from 25.6% to 33.4% of the population.
Remember, those who say they prefer no religion are among those labeled Liberal. What happens if we look at trends in fundamentalism among religious Americans only? To do this, we’ll have to add a selection filter that excludes Rs who said their religious preference (RELIG) was None, like this:
The result is:
So not only are Americans less religious than they used to be, but also religious Americans are less fundamentalist than they used to be. Interesting.
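For anyone repeating this offline, the effect of that selection filter can be sketched in a few lines of Python. The records below are made up for illustration (not GSS data), the helper names `exclude_none` and `share` are hypothetical, and code 4 for None is taken from the recode expression used earlier.

```python
def exclude_none(records, none_code=4):
    """Keep only respondents whose RELIG code is not the 'None' code."""
    return [r for r in records if r["relig"] != none_code]

def share(records, label):
    """Percent of records whose FUND label equals `label`."""
    return 100.0 * sum(r["fund"] == label for r in records) / len(records)

# Made-up toy respondents (NOT real GSS figures)
toy = [
    {"relig": 1, "fund": "Fundamentalist"},
    {"relig": 1, "fund": "Liberal"},
    {"relig": 2, "fund": "Moderate"},
    {"relig": 4, "fund": "Liberal"},
    {"relig": 4, "fund": "Liberal"},
]

religious = exclude_none(toy)
print("Liberal share, everyone:", share(toy, "Liberal"))
print("Liberal share, religious only:", share(religious, "Liberal"))
```

Because everyone with no religious preference is labeled Liberal, dropping them shrinks the Liberal share, which is why the filtered table is the fairer test of whether religious Americans themselves have become less fundamentalist.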
Conclusion
All we’ve done here is generate some time series and eyeball some tables and stacked bar charts. Obviously we can do a lot more with the GSS than this, but hopefully this is enough to give you a sense of how the GSS works and what online tools are out there for analyzing the data. I plan to write more GSS tutorials later.
Footnotes:
1. “Non-institutional” means: not in the military, not in jail or prison, and not in a nursing home. Another limitation of the data is that only English-speakers were interviewed until 2006, when Spanish-speakers were added to the target population. For further details on the data collection, see the GSS Codebook.
2. In the future, check here to see whether Berkeley has added newer data files.
3. Of course, you could also just check Wikipedia in this case, but my purpose is to teach you how to use the GSS.