A common inquiry in demographic studies involves comparing various zones within metropolitan statistical areas (MSAs) regarding racial diversity and segregation. For example, how do racial diversity and segregation change between suburban areas and a principal city?
Here, we demonstrate how to conduct such an analysis using the NRGD2020 dataset and the R package raceland. Our case study covers the Atlanta metropolitan statistical area (MSA) in 2020.
The code below can be replicated using data from the Atlanta dataset.
We also describe which NRGD2020 layers and preprocessing steps are required to perform the calculations.
NRGD2020 layers: (can be downloaded from http://www.socscape.edu.pl/index.php?id=nrgd)
vector boundaries of principal city, MSA and suburban areas
Data preparation consists of cropping the RL grid layers (RL grid racial ID and RL grid population density) to the extent of the MSA, the principal city, and the suburban areas. Because the RL grids are large, the data should be prepared in GIS software rather than in R.
In our example, we use the typology of suburban areas introduced by Lichter et al. (2023). For each MSA, they divided the MSA into a principal city and three suburban zones. Suburban zones are classified based on counties.
An example of such data for the Atlanta MSA can be downloaded from http://www.socscape.edu.pl/index.php?id=nrgd. The zip archive contains the RL grid racial ID and RL grid population density layers for the Atlanta MSA and for the 4 zones within the MSA.
The code below shows how to calculate racial diversity and segregation metrics based on RL grid racial ID and RL grid population density using the raceland package. We use the Atlanta MSA as an example.
#read package
library(raceland)
library(terra)
#set working directory to the folder containing data using function setwd("")
##setwd("")
#read RL grid racial ID to R using terra package
rl = rast("Atlanta_dataset/RL_grids/rl_msa.tif")
#read RL grid population density to R using terra package
rd = rast("Atlanta_dataset/RL_grids/rd_msa.tif")
#calculate diversity and segregation metrics using raceland package
metr = calculate_metrics(rl, rd, fun = "mean", threshold = 1)
#return metrics in a user-friendly format
res = c(ENTROPY = metr$ent, HILL = 2^metr$ent, MI = metr$mutinf, NMI = metr$mutinf/metr$ent)
round(res, 4)
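The conversions in the last step are plain arithmetic on the entropy and mutual information values. As a minimal sketch of what they mean, the base-R snippet below computes the same four quantities from a small, made-up co-occurrence matrix of two groups. The matrix, the helper `entropy_bits`, and all variable names are hypothetical and not part of raceland. The snippet assumes entropy is measured in bits (as the `2^` conversion suggests) and only illustrates the formulas; it does not reproduce how raceland computes its metrics internally.

```r
# Toy co-occurrence counts for two hypothetical groups (made-up numbers):
# rows = group of a focal cell, columns = group of a neighbouring cell.
cooc = matrix(c(4, 1,
                1, 4), nrow = 2, byrow = TRUE)

p_xy = cooc / sum(cooc)   # joint probabilities
p_x  = rowSums(p_xy)      # marginal distribution of the focal cell
p_y  = colSums(p_xy)      # marginal distribution of the neighbour

# Shannon entropy in bits; zero-probability terms are dropped
entropy_bits = function(p) {
  p = p[p > 0]
  -sum(p * log2(p))
}

ent = entropy_bits(p_y)

# Mutual information in bits between focal and neighbouring categories
nz = p_xy > 0
mutinf = sum(p_xy[nz] * log2(p_xy[nz] / outer(p_x, p_y)[nz]))

# Same derived quantities as in the tutorial code
toy = c(ENTROPY = ent, HILL = 2^ent, MI = mutinf, NMI = mutinf / ent)
print(round(toy, 4))
```

With two equally sized groups the entropy is 1 bit, so the Hill number is 2, which can be read as two effective groups. Dividing the mutual information by the entropy rescales it relative to the diversity of the area.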
The code below calculates racial segregation and diversity metrics for the metropolitan statistical area, principal city, and 3 suburban zones. The resultant metrics are returned as a table.
library(raceland)
library(terra)
#set working directory to the folder containing data using function setwd("")
##setwd("")
#list path to data files
data = list.files("Atlanta_dataset/RL_grids", full.names = TRUE)
#extract zone name (msa, city, fringe, outlying, inner) from file name
zones = unique(sapply(strsplit(basename(data), "[_. ]+"), function(x) x[2]))

metrics = data.frame()
for (zone in zones) {
  #read RL grids
  rl = rast(file.path("Atlanta_dataset/RL_grids", paste("rl_", zone, ".tif", sep="")))
  rd = rast(file.path("Atlanta_dataset/RL_grids", paste("rd_", zone, ".tif", sep="")))

  #calculate metrics
  metr = calculate_metrics(rl, rd, fun = "mean", threshold = 1)
  #return metrics in a user-friendly format
  res = c(ENTROPY = metr$ent, HILL = 2^metr$ent, MI = metr$mutinf, NMI = metr$mutinf/metr$ent)
  #create output table with metrics
  metrics = rbind(metrics, round(res, 3))
}
rownames(metrics) = zones
colnames(metrics) = c("E", "NH", "MI", "NMI")
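The zone names in the loop come entirely from the file names, so it may help to see that step in isolation. The sketch below runs the same `strsplit` expression on a hypothetical file listing that follows the `rl_<zone>.tif` / `rd_<zone>.tif` pattern used above. The listing is an assumption for illustration (only two zones are shown) and no files are read.

```r
# Hypothetical file listing following the rl_<zone>.tif / rd_<zone>.tif pattern
files = c("Atlanta_dataset/RL_grids/rd_city.tif",
          "Atlanta_dataset/RL_grids/rd_msa.tif",
          "Atlanta_dataset/RL_grids/rl_city.tif",
          "Atlanta_dataset/RL_grids/rl_msa.tif")

# Split each base name on "_", "." or space: "rl_city.tif" -> "rl", "city", "tif"
parts = strsplit(basename(files), "[_. ]+")

# The second piece is the zone name; unique() removes the rl/rd duplicates
zones = unique(sapply(parts, function(x) x[2]))
print(zones)

# Rebuild the paired paths for each zone, as the loop does
rl_paths = file.path("Atlanta_dataset/RL_grids", paste0("rl_", zones, ".tif"))
rd_paths = file.path("Atlanta_dataset/RL_grids", paste0("rd_", zones, ".tif"))

# Every rebuilt path should match an entry of the listing
stopifnot(all(c(rl_paths, rd_paths) %in% files))
```

Because `list.files()` returns names in alphabetical order, the zones, and therefore the rows of the output table, also come out alphabetically.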
The table lists segregation and diversity metrics for the MSA and zones within MSA.
library(kableExtra)
#print table
kable_classic(kbl(metrics), full_width = F)
Zone | E | NH | MI | NMI |
---|---|---|---|---|
city | 1.694 | 3.235 | 0.258 | 0.152 |
fringe | 1.090 | 2.128 | 0.123 | 0.113 |
inner | 1.873 | 3.662 | 0.316 | 0.169 |
msa | 1.880 | 3.681 | 0.193 | 0.103 |
outlying | 1.888 | 3.702 | 0.134 | 0.071 |
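
As a quick sanity check on the table, the two derived columns can be recomputed from the other two. The sketch below retypes the values from the table into a data frame and confirms that NH matches 2^E and NMI matches MI/E up to rounding. It then orders the zones by NMI, on the assumption that NMI is read as the segregation measure. The check and the variable names are an illustrative addition, not part of the original workflow.

```r
# Values copied from the table above
metrics = data.frame(
  E   = c(1.694, 1.090, 1.873, 1.880, 1.888),
  NH  = c(3.235, 2.128, 3.662, 3.681, 3.702),
  MI  = c(0.258, 0.123, 0.316, 0.193, 0.134),
  NMI = c(0.152, 0.113, 0.169, 0.103, 0.071),
  row.names = c("city", "fringe", "inner", "msa", "outlying")
)

# NH should equal 2^E and NMI should equal MI / E, up to rounding
nh_gap  = abs(2^metrics$E - metrics$NH)
nmi_gap = abs(metrics$MI / metrics$E - metrics$NMI)
print(round(cbind(nh_gap, nmi_gap), 4))

# Order zones from highest to lowest NMI
ranked = metrics[order(metrics$NMI, decreasing = TRUE), ]
print(rownames(ranked))
```

The small gaps come from rounding each metric to three decimals before it is stored in the table.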