I have two spatial data frames (sf objects), df_2016 and df_2020. I want to join them on a non-spatial ID column, which should be mostly consistent between the two. I used this code:
df_complete <- merge(x = as.data.frame(df_2016), y = as.data.frame(df_2020), by = "ID", all = TRUE)
There are 1,372,613 observations in df_2016 and 1,423,781 in df_2020. There are 1,440,175 observations in df_complete.
Of the originals, 16,720 observations in df_2016 are not in df_2020, and 71,620 observations in df_2020 are not in df_2016. I want to keep the geometry from df_2020 wherever it exists, and fill in the geometry from df_2016 for the few rows where it is missing. So I used this:
df_complete$geometry <- ifelse(is.na(df_complete$geometry.y), df_complete$geometry.x, df_complete$geometry.y)
Now I want to drop the geometry.x and geometry.y columns from df_complete, but the following call fails with an error:
df_complete= subset(df_complete, select = -c(df_complete$geometry.y, df_complete$geometry.x) )
Error in Ops.sfc(c(df_complete$geometry.y, df_complete$geometry.x)) :
argument "e2" is missing, with no default
Additionally, df_complete is now a plain data.frame rather than an sf object, and I'd really like it to keep its spatial properties if possible. Any advice on how to resolve this would be greatly appreciated!
EDIT: code adapted from the answer by Skaqqs below:
df_2016_flat <- st_drop_geometry(df_2016)
df_complete <- merge(
  x = df_2020,
  y = df_2016_flat,
  by = "ID", all.x = TRUE)
# Get IDs from 2016 that aren't in 2020
df_2016_not_in_2020 <- setdiff(df_2016$ID, df_2020$ID)
df_complete2 <- rbind(
  df_complete,
  df_2016[df_2016$ID %in% df_2016_not_in_2020, ])
rm(df_2016_flat, df_2016_not_in_2020)
Answer:
Because you haven't shared any sample data, I can't test this, but here is my general approach as I understand your question:
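# Merge by ID as a left join: all.x = TRUE keeps every row of df_2020, matched or not
# Keep geometry from df_2020, so drop the 2016 geometry before merging
# (leaving it in would produce geometry.x / geometry.y columns)
df_complete <- merge(
  x = df_2020,
  y = st_drop_geometry(df_2016),
  by = "ID", all.x = TRUE)
# Get IDs from 2016 that aren't in 2020
df_2016_not_in_2020 <- setdiff(df_2016$ID, df_2020$ID)
# Append the 2016-only rows with their 2016 geometry
# (rbind() requires both objects to have the same columns)
df_complete2 <- rbind(
  df_complete,
  df_2016[df_2016$ID %in% df_2016_not_in_2020, ])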
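To make the approach above concrete without the real data, here is a minimal sketch on toy point layers. The attribute columns value_2016 and value_2020 and the helper object only_2016 are made up for illustration, and it assumes the sf package is installed. Because rbind() on sf objects needs the same columns on both sides, the 2016-only rows get the 2020 attribute filled with NA before they are appended.

```r
library(sf)

# Hypothetical toy layers; value_2016 and value_2020 are made-up attribute columns.
df_2016 <- st_sf(
  ID = c(1, 2, 3),
  value_2016 = c(10, 20, 30),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))
)
df_2020 <- st_sf(
  ID = c(2, 3, 4),
  value_2020 = c(200, 300, 400),
  geometry = st_sfc(st_point(c(1, 1)), st_point(c(2, 2)), st_point(c(3, 3)))
)

# Left join the 2016 attributes onto the 2020 layer.
# x is an sf object and y is a plain data frame, so the result stays sf.
df_complete <- merge(df_2020, st_drop_geometry(df_2016), by = "ID", all.x = TRUE)

# Rows that exist only in 2016, carrying their 2016 geometry.
only_2016 <- df_2016[!(df_2016$ID %in% df_2020$ID), ]

# Give the 2016-only rows the same columns as df_complete before binding.
only_2016$value_2020 <- NA_real_
only_2016 <- only_2016[, names(df_complete)]

df_complete2 <- rbind(df_complete, only_2016)
```

With real data, every attribute column that exists only in df_2020 would need the same NA treatment before the rbind() step, and any non-ID column names shared by both layers will come out of merge() with .x and .y suffixes.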
I'd be happy to update my answer if you would like more specific advice and are able to share data!
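As for the subset() error in the question: inside select, subset() expects bare column names, which it translates into column positions. Writing df_complete$geometry.y there hands it the geometry column itself, so the unary minus is applied to an sfc object and fails inside Ops.sfc. The sketch below uses a plain toy data frame (made-up values, base R only) to show two ways of dropping columns by name.

```r
# Toy data frame standing in for the merged result (values are made up).
df <- data.frame(
  ID = 1:3,
  geometry.x = c("a", "b", "c"),
  geometry.y = c("d", NA, "f")
)

# subset() resolves bare names in `select` to column positions, so negate those.
df_a <- subset(df, select = -c(geometry.x, geometry.y))

# Alternative: keep every column whose name is not in the drop list.
drop_cols <- c("geometry.x", "geometry.y")
df_b <- df[, !(names(df) %in% drop_cols), drop = FALSE]
```

If the merge is done with the 2016 geometry already dropped, as in the approach above, these extra geometry columns never appear and this step is not needed.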