Home > OS >  How to replace NAs by left_join, leave non NAs untouched
How to replace NAs by left_join, leave non NAs untouched

Time:10-11

I have a dataframe of locations with latitude and longitude in columns.

df_places <-
  tibble(
    place_country = c("Melbourne, Australia", "Sydney, Australia", "Brisbane, Australia"),
    state = c("VIC", "NSW", "QLD"),
    temperature = c("Cold", "Cool", "Hot"),
    lon = c(41.23, NA, 43.55),
    lat = c(NA, -88.10, -80.48))

Some locations are missing their lat or lon, so I need to look them up with tmaptools::geocode_OSM().

df_geocodes <-
  df_places %>% 
  filter(is.na(lat) | is.na(lon)) %>% 
  select(place_country) %>% 
  bind_cols( 
    tmaptools::geocode_OSM(
      q = .$place_country, 
      as.data.frame = T)[,c("lon", "lat")]) 

Which yields

# A tibble: 2 × 3
  place_country          lon   lat
  <chr>                <dbl> <dbl>
1 Melbourne, Australia  145. -37.8
2 Sydney, Australia     151. -33.9

This takes a while to run, so I want to only run this for places that lack their lat/lon and then join the lat/lon value back into the original dataframe, replacing the NA with that location's missing lat/lon.

CodePudding user response:

I would do it this way:

ll <- c('lat', 'lon')
nas <- !complete.cases(df_places[, ll])
df_places[nas, ll] <- tmaptools::geocode_OSM(df_places$place_country[nas], 
                                             as.data.frame=T)[, ll]

CodePudding user response:

@moodymudskipper posted an excellent answer in the comments, use dplyr::rows_update()

Solution

# Select any locations missing a lat or lon coordinate
df_geocodes <-
  df_locations %>% 
  filter(is.na(lon) | is.na(lat)) %>%
  select(-starts_with(c("lat", "lon"))) 

# If there are any missing, then geocode them
if (nrow(df_geocodes) > 0){
  df_geocodes <-
    df_geocodes %>% 
    bind_cols( 
      tmaptools::geocode_OSM(
        q = .$place_country, 
        as.data.frame = T)[,c("lon", "lat")])}

# Join the two
df_locations <- 
  df_locations %>% 
  dplyr::rows_update(
    x = ., 
    y = df_geocodes, 
    by = "place_country")
  • Related