This language map provides insight into multilingualism and language use in the United States.
There are two different geographic areas used: states and PUMAs (Public Use Microdata Areas). PUMAs are contained within states, are built on census tracts and counties, and contain at least 100,000 people.
The dataset for this project is the 2012-2016 American Community Survey (ACS) 5-year Public Use Microdata Sample (PUMS) made available by the Census Bureau.
For more information, to contribute to the project, or to report an issue, please see the GitHub repo.
What does this map show?
This map shows the number of speakers of different languages (other than English) spoken at home in the United States.
Why a language map?
This map aims to inform the general public about multilingualism and language use in the United States. It can also be used by educators to teach students about linguistic diversity or even by policy makers to better understand where speakers of certain languages reside.
Where did the data come from?
This project uses public data collected and made available by the US Census Bureau, specifically from the 2012-2016 American Community Survey (ACS) 5-year Public Use Microdata Sample (PUMS).
What are the geographic subdivisions shown within states?
They are called PUMAs (Public Use Microdata Areas). PUMAs are contained within states, are built on census tracts and counties, and contain at least 100,000 people.
Why PUMAs? Why not counties?
PUMAs are the smallest geographic areas for which this ACS data is made available, and thus provide the most granularity (more than states or counties). Furthermore, because every PUMA by definition contains at least 100,000 people, PUMA size reflects population density in a way that county size does not: the smaller the PUMA, the denser the population in that area. With PUMAs, we can tell at a glance how many speakers there are in a given area.
What do the percentages mean?
Each percentage is the share of the population within that area that speaks that language at home, given any filters that are applied.
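As a rough illustration of that calculation, here is a minimal Python sketch. It is not the project's actual code: the function name speaker_percentages and the (area, language, weight) record layout are hypothetical, and it assumes each survey record carries a weight for the number of people it represents. Any filters would be applied to the records before they are passed in.

```python
from collections import defaultdict

def speaker_percentages(records, language):
    """Return {area: percent of the weighted population speaking `language`}.

    `records` is an iterable of (area, language, weight) tuples. This layout
    is a hypothetical simplification, not the real PUMS file format.
    """
    speakers = defaultdict(float)    # weighted speakers of `language` per area
    population = defaultdict(float)  # weighted total population per area
    for area, spoken, weight in records:
        population[area] += weight
        if spoken == language:
            speakers[area] += weight
    return {
        area: 100.0 * speakers[area] / total
        for area, total in population.items()
        if total > 0
    }

# Made-up example data: two areas, weights standing in for numbers of people.
sample_records = [
    ("Area A", "Spanish", 60), ("Area A", "English", 40),
    ("Area B", "Spanish", 5), ("Area B", "English", 95),
]
percentages = speaker_percentages(sample_records, "Spanish")
```

With this made-up data, 60 of the 100 weighted people in Area A and 5 of the 100 in Area B are counted as Spanish speakers.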
Why are the ranges of percentages of speakers so tiny?
The map uses a logarithmic scale, which means that every "bucket" covers a range of percentages ten times larger than the bucket below it. This is because the number of speakers varies enormously between languages. Spanish, the most widely spoken non-English language in the United States, is spoken at home by the vast majority of people in some areas (see Miami-Dade County in south Florida). By contrast, the Shona language is estimated to be spoken at home by a total of only 9,510 people. The advantage of a logarithmic scale is that this difference remains discernible on the map, not just for these two extremes but for all of the less widely spoken languages in between. The disadvantage is that you lose granularity for widely spoken languages like Spanish, since the same color represents everything from 10% to 100%.
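The bucketing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the map's actual implementation: the function names are hypothetical, and the lowest bucket (0.001% to 0.01%) is an assumed cutoff. The only bucket taken from the description above is the top one, 10% to 100%.

```python
import math

TOP_EXPONENT = 1       # top bucket covers 10% to 100%, as described above
LOWEST_EXPONENT = -3   # assumed cutoff: lowest bucket covers 0.001% to 0.01%

def bucket_exponent(percent):
    """Return the power-of-ten bucket for a percentage, or None for zero."""
    if percent <= 0:
        return None  # no speakers, so no bucket
    exponent = math.floor(math.log10(percent))
    # Clamp so that 100% falls in the top bucket and very small
    # values fall in the lowest one.
    return max(LOWEST_EXPONENT, min(exponent, TOP_EXPONENT))

def bucket_range(exponent):
    """Return the (lower, upper) percentage bounds of a bucket."""
    return (10.0 ** exponent, 10.0 ** (exponent + 1))
```

For example, bucket_exponent(50) and bucket_exponent(100) both return the top bucket, which is why an area where half the population speaks Spanish gets the same color as one where nearly everyone does.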
Where can I see the code?
Please see the GitHub repo.