I have following dataframes in R, How can I substitute the test_data$origin_country and test_data$destin_country with the numerical values at country_codes$ID?
test_data <- data.frame(
origin_country = c('US', 'US', 'DE', 'CN'),
destin_country = c('DE', 'DE', 'UK', 'IT'),
year = c(2020, 2020, 2019, 2019),
item = c('wheat', 'wheat', 'wheat', 'rice'),
value = c(2000, 2000, 3000, 2500))
country_codes <- data.frame(
countries = c('CN', 'DE', 'IT', 'UK', 'US'),
ID = c(1, 2, 3, 4, 5))
I have seen very similar questions, but none tackling this case. My desired result is:
output <- data.frame(
origin_country = c('5', '5', '2', '1'),
destin_country = c('2', '2', '4', '3'),
year = c(2020, 2020, 2019, 2019),
item = c('wheat', 'wheat', 'wheat', 'rice'),
value = c(2000, 2000, 3000, 2500))
Truly appreciate your insights!
CodePudding user response:
It is simple
library(dplyr)
test_data %>%
mutate(origin_country = country_codes$ID[match(origin_country, country_codes$countries)],
destin_country = country_codes$ID[match(destin_country, country_codes$countries)])
#> origin_country destin_country year item value
#> 1 5 2 2020 wheat 2000
#> 2 5 2 2020 wheat 2000
#> 3 2 4 2019 wheat 3000
#> 4 1 3 2019 rice 2500
Created on 2022-08-16 by the reprex package (v2.0.1)
CodePudding user response:
Maybe even simpler if you take country_codes as named character vector, that you may pass to str_replace like so
library(tidyverse)
test_data <- data.frame(
origin_country = c('US', 'US', 'DE', 'CN'),
destin_country = c('DE', 'DE', 'UK', 'IT'),
year = c(2020, 2020, 2019, 2019),
item = c('wheat', 'wheat', 'wheat', 'rice'),
value = c(2000, 2000, 3000, 2500))
country_codes <- data.frame(
countries = c('CN', 'DE', 'IT', 'UK', 'US'),
ID = c(1, 2, 3, 4, 5))
# convert to named character vector
country_codes <- country_codes %>%
mutate_at('ID', as.character) %>%
deframe()
test_data %>%
mutate_at(c('origin_country', 'destin_country'), ~ str_replace_all(.x, country_codes))
#> origin_country destin_country year item value
#> 1 5 2 2020 wheat 2000
#> 2 5 2 2020 wheat 2000
#> 3 2 4 2019 wheat 3000
#> 4 1 3 2019 rice 2500
Created on 2022-08-16 by the reprex package (v2.0.1)
CodePudding user response:
As the ID is 1:5
the output of match
could directly be used to give the numbers.
test_data[1:2] <- lapply(test_data[1:2], match, country_codes[,1])
test_data
# origin_country destin_country year item value
#1 5 2 2020 wheat 2000
#2 5 2 2020 wheat 2000
#3 2 4 2019 wheat 3000
#4 1 3 2019 rice 2500
In case the numbers are not like in the given example and needed to be taken form column ID you can use.
test_data[1:2] <- country_codes$ID[sapply(test_data[1:2], match, country_codes[,1])]
Or a very simple way using a named vector.
s <- setNames(country_codes$ID, country_codes$countries)
test_data$origin_country <- s[test_data$origin_country]
test_data$destin_country <- s[test_data$destin_country]