Double geometries: joining spatial data frames on a non-sf column and getting rid of the extra geometry

I have two spatial data frames (sf objects): df_2016 and df_2020. I want to join them on a non-spatial ID column, which should be mostly consistent across the two. I used this code:

df_complete <- merge(x=as.data.frame(df_2016 ), y=as.data.frame(df_2020), by="ID", all=TRUE)

There are 1,372,613 observations in df_2016 and 1,423,781 in df_2020. There are 1,440,175 observations in df_complete.

Of the originals, there are 16,720 observations in df_2016 that are not in df_2020, and 71,620 observations that are in df_2020 that are not in df_2016. I want to keep the geometry for df_2020 as long as it is there, and then fill in the geometry from df_2016 for the few that are missing. So I used this:

df_complete$geometry <- ifelse(is.na(df_complete$geometry.y), df_complete$geometry.x, df_complete$geometry.y)

Now I want to drop the df_complete$geometry.x and df_complete$geometry.y columns, but get this error:

df_complete= subset(df_complete, select = -c(df_complete$geometry.y, df_complete$geometry.x) )
Error in Ops.sfc(c(df_complete$geometry.y, df_complete$geometry.x)) : 
  argument "e2" is missing, with no default

Additionally, the class of df_complete is now just a dataframe, and I'd really like it to keep its spatial properties if possible. Any advice on how to resolve this would be greatly appreciated!

EDIT: solution adapted from Skaqqs's answer below:

library(sf)

# Drop the 2016 geometry so the merge carries only the 2020 geometry column
df_2016_flat <- st_drop_geometry(df_2016)

df_complete <- merge(
  x = df_2020,
  y = df_2016_flat,
  by = "ID", all.x = TRUE)

# Get IDs from 2016 that aren't in 2020
df_2016_not_in_2020 <- setdiff(df_2016$ID, df_2020$ID)

# Append the 2016-only rows, which bring their own 2016 geometry
df_complete2 <- rbind(
  df_complete,
  df_2016[df_2016$ID %in% df_2016_not_in_2020, ])

rm(df_2016_flat, df_2016_not_in_2020)
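
A toy check of this approach, under assumptions of my own: the data, the `value_2016` and `value_2020` columns, and the CRS are all invented for illustration. One thing the sketch makes visible is that `rbind()` needs both inputs to have the same columns, so the 2016-only rows get an `NA` placeholder for the 2020-only column before stacking. If the two tables share nothing but `ID` and geometry, that step should not be needed.

```r
library(sf)

# Invented toy data: IDs 1-3 exist in 2016, IDs 2-4 exist in 2020
pts_2016 <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)), crs = 4326)
pts_2020 <- st_sfc(st_point(c(1, 1.5)), st_point(c(2, 2.5)), st_point(c(3, 3)), crs = 4326)
df_2016 <- st_sf(ID = 1:3, value_2016 = c(10, 20, 30), geometry = pts_2016)
df_2020 <- st_sf(ID = 2:4, value_2020 = c(21, 31, 41), geometry = pts_2020)

# Keep every 2020 row and its 2020 geometry; attach 2016 attributes only
merged <- merge(df_2020, st_drop_geometry(df_2016), by = "ID", all.x = TRUE)

# Rows that exist only in 2016 keep their 2016 geometry
only_2016 <- df_2016[!df_2016$ID %in% df_2020$ID, ]

# Give them the column they lack so both inputs to rbind() have the same columns
only_2016$value_2020 <- NA_real_

df_complete <- rbind(merged, only_2016)

class(df_complete)  # still includes "sf"
nrow(df_complete)   # one row per ID across both years
```

The result has a single geometry column, so there is nothing left to coalesce or drop afterwards.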

CodePudding user response:

You haven't shared any sample data, so I can't test this, but here is the general approach as I understand your question:

# Merge by ID, keeping every row of df_2020 (all.x = TRUE)
# Drop the 2016 geometry first so only the df_2020 geometry is kept
df_complete <- merge(
  x = df_2020,
  y = sf::st_drop_geometry(df_2016),
  by = "ID", all.x = TRUE)

# Get IDs from 2016 that aren't in 2020
df_2016_not_in_2020 <- setdiff(df_2016$ID, df_2020$ID)

# Append the 2016-only rows, which bring their own 2016 geometry
df_complete2 <- rbind(
  df_complete,
  df_2016[df_2016$ID %in% df_2016_not_in_2020, ])
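
On the `subset()` error in the question: inside `select`, columns are referred to by bare name. Writing `df_complete$geometry.y` passes the geometry values themselves, so the leading `-` gets applied to an `sfc` object, which is what `Ops.sfc` is complaining about. Separately, `ifelse()` strips attributes, so its result is not an `sfc` column any more. Below is a minimal sketch of dropping leftover columns by name and turning a plain data frame back into an `sf` object. The two-row table is a made-up stand-in for `df_complete`, and it assumes the `geometry` column already holds valid geometries.

```r
library(sf)

# Made-up stand-in for the merged table in the question
g <- st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
df_complete <- data.frame(ID = 1:2)
df_complete$geometry   <- g  # the geometry to keep
df_complete$geometry.x <- g  # leftover from the 2016 side
df_complete$geometry.y <- g  # leftover from the 2020 side

# Refer to the columns by bare name inside select
df_complete <- subset(df_complete, select = -c(geometry.x, geometry.y))

# Turn the plain data frame back into an sf object
df_complete <- st_as_sf(df_complete, sf_column_name = "geometry")

class(df_complete)
```

Assigning `NULL` to each unwanted column (`df_complete$geometry.x <- NULL`) is an equivalent way to drop them.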

I'd be happy to update my answer if you would like more specific advice and are able to share data!