This language map provides insight into language diversity in the United States, covering all 50 states, Washington, D.C., and Puerto Rico.
There are two different geographic areas used: states and PUMAs (Public Use Microdata Areas). PUMAs are contained within states, are built on census tracts and counties, and have a population of roughly 100,000 people. PUMAs are redefined every 10 years with the U.S. Census.
The dataset for this project is made up of Public Use Microdata Sample (PUMS) data from the American Community Survey (ACS) 1-year PUMS files made available by the Census Bureau.
For more information or to contribute to the project or report an issue, please see the GitHub repo.
What does this map show?
This map shows the number of people in the United States who speak a language other than English at home. For more information, see Why We Ask Questions About... Language Spoken at Home.
Why a language map?
This map aims to educate the general public about language diversity (the different languages spoken) in the United States. Educators could also use it to teach students about language diversity, and policy makers could use it to better understand where speakers of certain languages reside.
Where did the data come from?
This project uses public data collected and made available by the US Census Bureau, specifically from Public Use Microdata Sample (PUMS) data.
What are the geographic subdivisions shown within states?
They are called PUMAs (Public Use Microdata Areas). PUMAs are contained within states, are built on census tracts and counties, and contain about 100,000 people.
Why PUMAs? Why not counties?
PUMAs are the smallest geographic area for which this ACS data is made available, and thus provide the most granularity (more than states or counties). Furthermore, PUMAs by definition contain about 100,000 people, which, unlike counties, gives them the nice property of reflecting population density: the smaller the PUMA, the denser the population in that area. With PUMAs, we can tell at a glance how many speakers there are in a given area.
What do the percentages mean?
Each percentage is the share of the population within that area (a state or a PUMA) that speaks the selected language at home.
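As a rough illustration of how such a percentage could be computed from weighted survey records, here is a minimal sketch. The record layout (area, language, person weight) and the function name speaker_percentages are hypothetical and are not taken from the project's code; the sketch also assumes that every record in an area counts toward the denominator, which may differ from what the map actually does.

```python
from collections import defaultdict

def speaker_percentages(records, language):
    """Return {area: percent of the weighted population speaking `language`}.

    `records` is an iterable of (area, language_or_None, weight) tuples.
    This layout is an assumption made for illustration only.
    """
    total = defaultdict(float)
    speakers = defaultdict(float)
    for area, lang, weight in records:
        total[area] += weight          # everyone counts toward the denominator
        if lang == language:
            speakers[area] += weight   # only matching speakers in the numerator
    return {
        area: 100.0 * speakers[area] / total[area]
        for area in total
        if total[area] > 0
    }

# Made-up sample data: None means English only at home.
records = [
    ("A", "Spanish", 60),
    ("A", None, 140),
    ("B", "Spanish", 10),
    ("B", "Shona", 5),
    ("B", None, 985),
]
result = speaker_percentages(records, "Spanish")
```

In this made-up sample, area A has 60 weighted Spanish speakers out of 200 people and area B has 10 out of 1,000, so the two areas get very different percentages even though both contain Spanish speakers.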
Why are the ranges of percentages of speakers so tiny?
The map uses a logarithmic scale, which means that each "bucket" covers percentages ten times larger than the bucket below it. This is because the number of speakers varies enormously between languages. Spanish, the most widely spoken non-English language in the United States, is spoken at home by the vast majority of people in some areas (see Miami-Dade County in south Florida). By contrast, the Shona language is estimated to be spoken at home by a total of only 9,510 people. The advantage of a logarithmic scale is that this difference remains discernible on the map, not just for these two extremes but for all of the less widely spoken languages in between. The disadvantage is that you lose granularity for widely spoken languages like Spanish, since the same color is used to represent everything from 10% to 100%.
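The bucketing described above can be sketched as follows. The specific lower bounds and the name bucket_index are assumptions chosen for illustration; the only detail taken from the text is that each bucket is ten times the previous one and that the top bucket spans 10% to 100%.

```python
from bisect import bisect_right

# Hypothetical lower bounds of each bucket, in percent.
# Each bound is 10x the previous one; the last bucket covers 10% to 100%.
BUCKET_LOWER_BOUNDS = [0.001, 0.01, 0.1, 1.0, 10.0]

def bucket_index(percent):
    """Return the index of the logarithmic bucket containing `percent`.

    Returns -1 when the value is below the smallest lower bound.
    """
    # bisect_right counts how many lower bounds are <= percent.
    return bisect_right(BUCKET_LOWER_BOUNDS, percent) - 1

# Widely spoken languages collapse into the same top bucket...
top = [bucket_index(p) for p in (10.0, 65.0, 100.0)]
# ...while small percentages are still told apart.
small = [bucket_index(p) for p in (0.002, 0.05, 0.5)]
```

Here 10%, 65%, and 100% all land in the same bucket, which is the loss of granularity mentioned above, while 0.002%, 0.05%, and 0.5% each land in a different bucket.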
How is the language data collected?
As per the Census Bureau:
The American Community Survey (ACS) collects data on whether or not people five years old or older speak a language other than English at home. If a respondent indicates speaking a language other than English, the ACS asks what language the person speaks and how well the person speaks English.
See also: Frequently Asked Questions (FAQs) About Language Use
Why is data not available for all languages for all years?
As of 2016, the ACS changed the available language options to conform to a standard list of languages, ISO 639-3. This means that certain years cannot be selected for some languages, and certain languages are not comparable between years before 2016 and years from 2016 onward.
Where can I see the code?
Please see the GitHub repo.