I am trying to learn (or more specifically, remember and learn to implement) spatial data analysis in R. I am having some elementary difficulties with manipulating geographic data, which get in the way of any learning of methods.
To cut to the chase, the result when I try to produce a heat map of wealth I get something distinctly weird: no aesthetic connection between wealth and fill, and many countries missing. I'd expect some to be missing due to a partially complete dataset, but not as many as this. Secondly, all the countries that have a fill matching a value seem to be of the same value–see below.
How did I get here? I have a dataset df
with country data on wealth (from Credit Suisse) and some data on gdp growth, which I have dput()
below:
The I downloaded and unzipped shapefiles for World administrative data:
download.file(
url = "https://public.opendatasoft.com/api/explore/v2.1/catalog/datasets/world-administrative-boundaries/exports/shp?lang=en&timezone=Europe/London",
destfile = "world_administrative_boundaries.zip")
unzip(
zipfile = "world_administrative_boundaries.zip",
exdir = "world_administrative_boundaries"
)
This worked fine. As an aside I would appreciate recommendations of sources of spatial data at country and (for the United Kingdom) regional and lower levels, as well as recommendations for R packages for interfacing the data. But anyway...
First I loaded some packages:
library(tidyverse)
library(sf)
Then I read the shape files into R, and merged the dataset with my wealth data.
world_adm0_sf <-st_read("world_administrative_boundaries/world-administrative-boundaries.shp")
world_adm0_sf <- inner_join(world_adm0_sf, df, by = "iso3")
That should give me a partial dataset – but not nearly as partial as what seems to be plotted. And then I use ggplot2:
ggplot(data = world_adm0_sf, aes(fill=wealth))
geom_sf()
scale_fill_steps()
theme_light()
ggsave(filename = "Wealth.png",
device = "png",
width = 10,
height = 4)
That produces the plot above.
This is all preliminary. My real goal is to create an adjacency matrix, so I can run some spatial models on the data. I think I can do that as follows:
library(spdep)
w_world_adm0 <- poly2nb(world_adm0_sf) %>% nb2mat(style = "W")
But this gives the following error: Error in poly2nb(world_adm0_sf) : Empty geometries found
. I can only imagine that this is related to the problems with the plotting.
Any help greatly appreciated.
structure(list(iso3 = c("USA", "CHN", "JPN", "DEU", "GBR", "FRA",
"IND", "CAN", "ITA", "AUS", "ESP", "NLD", "CHE", "MEX", "BEL",
"IDN", "BRA", "SWE", "SAU", "DNK", "AUT", "SGP", "NZL", "ISR",
"POL", "NOR", "THA", "PRT", "BGD", "ARE", "PHL", "VNM", "ZAF",
"IRL", "GRC", "FIN", "PAK", "CHL", "UKR", "ROU", "COL", "MYS",
"HUN", "KWT", "KAZ", "PER", "QAT", "LKA", "MAR", "KEN", "LBN",
"LUX", "ARG", "DZA", "BGR", "ECU", "ETH", "HRV", "BLR", "JOR",
"SVN", "SRB", "TUN", "CRI", "SLV", "OMN", "URY", "PAN", "LTU",
"BHR", "LVA", "ISL", "AZE", "CYP", "BOL", "BIH", "EST", "ALB",
"NPL", "KHM", "MUS", "COD", "MLT", "NIC", "TTO", "ARM", "GEO",
"CMR", "JAM", "LBY", "SEN", "MNE", "NAM", "MDG", "BWA", "ZMB",
"TJK", "MLI", "GAB", "MWI", "GNQ", "MOZ", "BRB", "MNG", "LBR",
"FJI", "TCD", "GUY", "ERI", "COG", "SYC", "SLE", "BLZ", "COM",
"DJI", "CAF", "LSO", "GNB"), wealth = c(126300.403036252, 73939.8891311991,
27260.570352125, 17389.799264483, 15675.9889849248, 16282.4392222225,
12700.3877756376, 10585.7395730159, 12176.0329952741, 9268.27874790906,
8481.28808374314, 5436.51408079177, 4672.6030405074, 3774.9451374071,
3543.77178519827, 3086.57900830581, 3034.93997199139, 2741.53111378823,
1905.26613759989, 1782.18132955766, 1855.14333493139, 1636.83125031141,
1288.20550646464, 1307.54537310394, 1464.60908917529, 1345.59016547028,
1392.90190541805, 1337.19890038495, 898.958801638123, 926.368205722374,
933.191002686083, 874.858045943263, 903.040059653359, 882.20086484466,
962.807508534614, 836.536239906252, 663.532991342168, 737.397506705501,
517.376031972012, 584.466282600888, 586.072886162324, 565.720706897012,
450.495646279881, 413.524330775955, 466.674062703931, 427.957436973251,
347.37129513471, 368.206981773813, 338.261747852825, 328.180531919524,
307.493974123627, 305.899138540934, 260.992240660559, 254.245993196906,
215.860597024656, 207.22935523629, 221.394916106677, 205.419322153595,
164.189635248546, 184.189116509732, 180.001925261347, 161.244568189131,
160.306734507052, 165.337454268781, 145.171877463561, 144.726277337691,
150.407054934631, 129.783182270863, 131.132956518345, 114.623545266656,
114.486922170596, 98.9661166555199, 98.3003532984958, 101.44441788665,
87.1461534054738, 74.9746836821397, 77.795657106148, 66.1431475890217,
71.2009574127775, 55.4559618547982, 59.5703711670758, 48.0465708689348,
57.4375916937781, 51.4806937778165, 46.9537143621449, 44.8053089422735,
43.2099298782149, 40.5284167581835, 41.1425896096207, 70.5563794908187,
39.5376722680099, 31.4416134859147, 25.4319082241779, 28.7033124617245,
23.5162145038624, 21.8465824967143, 21.5778619134227, 20.8494437096404,
17.2032811219951, 18.5250206627168, 14.6881425292255, 13.2498106184224,
15.5574142939735, 12.7783060223608, 11.2938871435042, 9.82612644104218,
8.61312574821742, 6.08951482556896, 5.04939300180385, 4.01486292583383,
3.2236344186979, 2.88763002016324, 2.54563458904526, 2.47536481004851,
1.96914002009599, 1.93613437718668, 1.90500798026385, 1.64313029137334
), gdp_growth = c(5.94548476122172, 8.1097925807738, 1.65707111732294,
2.62698726672524, 7.52491037367686, 6.81658913649873, 8.68122873083119,
4.54088730942124, 6.73727712298924, 2.23621243944461, 5.5195949732682,
4.86321914933848, 4.22171657654073, 4.72056529768081, 6.13386596356445,
3.69124011191289, 4.61942162067312, 5.07508472544021, 3.24092976848051,
4.85570758570233, 4.55685091308877, 7.61396261502647, 3.71401011995567,
8.61199034846989, 6.84771336241919, 3.88093603636767, 1.53383558441951,
5.482391708458, 6.93867508910159, 3.91629608037678, 5.70315596840621,
2.56155114232494, 4.9130966818804, 13.5882471079549, 8.43442550167381,
2.97322935640895, 6.48708677384843, 11.6682247016219, 3.40000000000001,
5.10012819668945, 10.6770130738261, 3.09215964982188, 7.11731531479809,
NA, 4.29999999999981, 13.3495090777977, 1.59075695426887, 3.32676381781447,
7.9296679419552, 7.51735538703886, -7.0012393603983, 5.10245544656804,
10.3982494646904, 3.50000000049758, 7.63460971569079, 4.23525124172572,
5.63730300819225, 13.0722049215979, 2.29987941172813, 2.2281384129623,
8.2110625918128, 7.54991253537386, 4.32277635659793, 7.57956529907922,
10.2784761168053, 3.09227668737446, 4.36700931644991, 15.3358739290709,
5.98103211278502, 2.22535124229128, 4.06807386332741, 4.40304478596893,
5.60003727280338, 6.63892696813528, 6.1058042252093, 7.54559112979621,
8.01346292400842, 8.51550112324915, 4.24694014145481, 3.02638936277746,
3.70361112396466, 6.20015401075946, 10.2977635999018, 10.3443407049189,
-1.02654016223218, 5.69999999999997, 10.4655372923886, 3.64991688648905,
4.60000000000001, 31.3724222267248, 6.06449596752137, 12.4343589851741,
2.65605081999696, 4.40234000007614, 11.3700218626674, 4.59873403878439,
9.2001904308498, 3.05261356942125, 1.462286510102, 2.75141640762955,
-0.945382916517076, 2.36398841807251, -0.189272866293138, 1.63676957584143,
4.98671283481617, -5.11160739321285, -1.1999905246095, 20.0600109474921,
NA, -2.20000260638274, 7.8618824199665, 4.10470008884118, 15.2261306532663,
2.11219557695527, 4.81247738295502, 0.900000006878756, 1.34762315508203,
3.80000242656607)), row.names = c(NA, -118L), class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
Part of the problem is that the shapefile of administrative boundaries you are using is incomplete. You can see this if you plot it on its own before the inner join:
library(tidyverse)
library(sf)
world_adm0_sf <- st_read(paste0("world_administrative_boundaries/",
"world-administrative-boundaries.shp"))
ggplot(data = world_adm0_sf)
geom_sf()
theme_light()
Instead, we can use the handy rnaturalearth package to get a shapefile of all the world's countries including their ISO codes:
library(rnaturalearth)
ne_sf <- ne_countries(returnclass = "sf")
ggplot(data = ne_sf)
geom_sf()
theme_light()
We can do our join and plot the data over the top of this map so that countries for which no data is available are still visible. We should pick a color scale which is widely divergent to increase the contrast between the wealth of different countries:
world_adm0_sf <- inner_join(ne_sf, df, by = c(iso_a3 = "iso3"))
ggplot(data = world_adm0_sf)
geom_sf(data = ne_sf)
geom_sf(aes(fill = wealth))
scale_fill_distiller(palette = "Spectral")
theme_light()