Home > Blockchain >  R: How to replace values in one column using a replacement table
R: How to replace values in one column using a replacement table

Time:06-24

My dataframe looks something like df, and I would like to replace the values in X1 using a conversion table replacement_table. If values in X1 match the values in replacement_table$coordinates, then that value would be replaced with the corresponding value in replacement_table$location

replacement_table <- data.frame (location = c("locationA", "locationB", "locationC"),
                                 coordinates = c("42.8 -70.89", "42.85 -72.21", "31.28 -72.14"))


df <- data.frame(X1 = c("42.8 -70.89", "42.8 -70.89", "31.28 -72.14", "31.28 -72.14", "42.85 -72.21"),
                 X2 = c("23.2 -81.32", "42.8 -70.89", "31.28 -72.14", "23.2 -81.32", "42.8 -70.89"), 
                 X3 = c("19371", "0", "0", "19278", "28912"))

The output would look something like this:

df <- data.frame(X1 = c("locationA", "locationA", "locationC", "locationC", "locationB"),
                 X2 = c("23.2 -81.32", "42.8 -70.89", "31.28 -72.14", "23.2 -81.32", "42.8 -70.89"), 
                 X3 = c("19371", "0", "0", "19278", "28912"))

CodePudding user response:

You should join both dataframes to make it work. First you join using the left_join function after changing the name of the column X1 to coordinates:

library(tidyverse)

df = df %>%
  dplyr::rename('coordinates' = 'X1') %>%
  dplyr::left_join(replacement_table, by = 'coordinates')

The new df will now have data frame location and coordinates. Now we can apply a "case_when" (works as a if else) function to transform the coordinates in location if the location exists:

df = df %>%
  dplyr::mutate(coordinates = case_when(!is.na(location) ~ location))

To transform back the column name to X1 you rename again the column coordinates and remove the column location.

df = df %>%
  dplyr::rename('X1' = 'coordinates') %>%
  dplyr::select(-location)

CodePudding user response:

What about using stringr::str_replace_all()?

You need to use a replacement vector, rather than a table, but the difference seems to minor. Also, you need to wrap the replacement vector in stringr::str_fixed() to prevent issues with special characters such as " ".

df <- data.frame(X1 = c("42.8 -70.89", "42.8 -70.89", "31.28 -72.14", "31.28 -72.14", "42.85 -72.21"),
                 X2 = c("23.2 -81.32", "42.8 -70.89", "31.28 -72.14", "23.2 -81.32", "42.8 -70.89"), 
                 X3 = c("19371", "0", "0", "19278", "28912"))

replacement_vector <-  c("locationA", "locationB", "locationC")
names(replacement_vector) <- c("42.8 -70.89", "42.85 -72.21", "31.28 -72.14")

df$X1 <- stringr::str_replace_all(string = df$X1, stringr::fixed(replacement_vector))

df
#>          X1           X2    X3
#> 1 locationA  23.2 -81.32 19371
#> 2 locationA  42.8 -70.89     0
#> 3 locationC 31.28 -72.14     0
#> 4 locationC  23.2 -81.32 19278
#> 5 locationB  42.8 -70.89 28912

Created on 2022-06-23 by the reprex package (v2.0.1)

  • Related