Here's the code where I am trying to create a variable by detecting words in a string and matching them against a vector. I use the dplyr package and its function mutate in combination with case_when. The problem is that I am adding each of the values manually, as you can see. How can I automate this with a loop (or a similar function) that matches the two?
city <- LETTERS                        # 26 cities
district <- letters[1:20]              # 20 districts
streets <- paste0(district, district)
streets <- streets[-c(5:26)]           # 4 streets: "aa", "bb", "cc", "dd"
df <- data.frame(x = c(1:5),
                 address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc"))
library(dplyr)
library(stringi)
df2 <- df %>%
  mutate(districts = case_when(
    stri_detect_fixed(address, "b") ~ "b",   # matches row 1
    stri_detect_fixed(address, "a") ~ "a",   # matches rows 3 and 5
    stri_detect_fixed(address, "cc") ~ "cc"  # rows 1 and 5 contain "cc" too, but an earlier condition already matched
    # rows 2 and 4 match none of the conditions and become NA
  ))
The code scans through address for the values in the district vector. I would love to do the same for the city and street variables. So I used a modified version of code from another Stack Overflow question, but it produces an error.
for (j in town_village2) {
  trn_house3[, 93] <- case_when(
    stri_detect_fixed(trn_house3[1:6469, 4], j) ~ j)
}
I seek to produce this result:
  x   address city district street
  1 A, b, cc,    A        b     cc
  2     B, dd    B       NA     dd
  3     a, dd   NA        a     dd
  4         C    C       NA     NA
  5  D, a, cc    D        a     cc
Answer:
This will separate the elements into vectors:
library(tidyverse)
df <- data.frame(
  x = c(1:5),
  address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc")
)
df3 <-
  df %>%
  separate_rows(address, sep = "[, ]+") %>%  # split on runs of commas and spaces
  filter(nchar(address) > 0) %>%             # drop the empty piece left by a trailing comma
  nest(data = address) %>%                   # naming the column avoids the "must be named" warning
  transmute(x, districts = data %>% map(~ .x[[1]]))
df3
#> # A tibble: 5 × 2
#>       x districts
#>   <int> <list>
#> 1     1 <chr [3]>
#> 2     2 <chr [2]>
#> 3     3 <chr [2]>
#> 4     4 <chr [1]>
#> 5     5 <chr [3]>
df3$districts[[1]]
#> [1] "A" "b" "cc"
Created on 2022-04-14 by the reprex package (v2.0.0)
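To get from the separated elements to the city / district / street columns the question asks for, the elements can be joined against a long lookup table and then spread back out. The following is only a minimal sketch: it assumes the lookup vectors from the question with districts running from "a" to "t", that every value belongs to exactly one vector, and that each address holds at most one element of each type. The names lookup, element and result are made up for illustration.

```r
library(tidyverse)

# Lookup vectors (assumed: districts a..t, so that streets are aa, bb, cc, dd)
city     <- LETTERS
district <- letters[1:20]
streets  <- paste0(district, district)[1:4]

df <- data.frame(
  x = 1:5,
  address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc")
)

# Long lookup table: one row per (type, value) pair.
# enframe() + unnest() copes with vectors of unequal length.
lookup <- list(city = city, district = district, street = streets) %>%
  enframe(name = "type", value = "value") %>%
  unnest(value)

result <- df %>%
  mutate(element = str_split(address, "[, ]+")) %>%    # split each address into elements
  unnest(element) %>%
  filter(element != "") %>%                            # drop empties from trailing commas
  left_join(lookup, by = c("element" = "value")) %>%   # label each element with its type
  filter(!is.na(type)) %>%                             # ignore elements found in no vector
  pivot_wider(id_cols = c(x, address),
              names_from = type, values_from = element)

result
```

Note that an address with no recognised element at all would drop out of result in this sketch; a final left join back onto df would restore such rows if that matters.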
Answer:
A data.table approach:
library(data.table)
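# create a lookup table with all elements; the vectors have different lengths,
# so stack them into long format instead of binding them as columns
lookup <- as.data.table(stack(list(city = city, streets = streets, district = district)))
setnames(lookup, c("value", "variable"))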
# set df to data.table format
setDT(df)
final <- df[, .(address = unlist(tstrsplit(address, ",[ ]*", perl = TRUE))), by = .(x)]
# now add elements
final[lookup, type := i.variable, on = .(address = value)]
# and dcast to wide
dcast(final, x ~ type, value.var = "address")
#    x city streets district
# 1: 1    A      cc        b
# 2: 2    B      dd     <NA>
# 3: 3 <NA>      dd        a
# 4: 4    C    <NA>     <NA>
# 5: 5    D      cc        a
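If you would rather stay close to the one-column-at-a-time idea from the question, a small helper can replace the hand-written case_when branches. This is a base R sketch, not a drop-in for the original code: first_match is a hypothetical helper name, the districts are assumed to run from "a" to "t", and only the first matching element per address is kept. It compares whole elements rather than substrings, which sidesteps a pitfall of stri_detect_fixed, where "c" would also be detected inside "cc".

```r
# Lookup vectors (assumed: districts a..t, so that streets are aa, bb, cc, dd)
city     <- LETTERS
district <- letters[1:20]
streets  <- paste0(district, district)[1:4]

df <- data.frame(
  x = 1:5,
  address = c("A, b, cc,", "B, dd", "a, dd", "C", "D, a, cc")
)

# Hypothetical helper: for each address, return the first element that
# appears in `candidates`, or NA if none does.
first_match <- function(address, candidates) {
  tokens <- strsplit(address, "[, ]+")   # split on runs of commas and spaces
  vapply(tokens, function(tok) {
    hit <- tok[tok %in% candidates]      # exact, case-sensitive comparison
    if (length(hit) > 0) hit[1] else NA_character_
  }, character(1))
}

# One call per lookup vector, driven by a named list instead of manual branches
lookups <- list(city = city, district = district, street = streets)
for (nm in names(lookups)) {
  df[[nm]] <- first_match(df$address, lookups[[nm]])
}

df
```

Adding another category then only means adding another entry to lookups.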