I am an absolute novice to R. What I would like to achieve is to have an identifier added to each dataframe row based on whether a string value in the same row contains that identifier.
Assume dataframe:
df <- data.frame(Code = c("DE8230", "18FR16", "2UK34", "45BE87C", "1894DE56", "AB12FR", "ES12456"),
                 Type = c("A", "B", "C", "C", "E", "A", "C"),
                 Value = c(12, 14, 8, 20, 21, 16, 5))
      Code Type Value
1   DE8230    A    12
2   18FR16    B    14
3    2UK34    C     8
4  45BE87C    C    20
5 1894DE56    E    21
6   AB12FR    A    16
7  ES12456    C     5
I want to add a country column based on whether an identifier (e.g. DE, FR, UK, BE, ES) is present in the column 'Code', and then list that country.
What I tried:
library(dplyr)
identifiers <- c("DE", "FR", "UK")  # identifiers of choice
df <- mutate(df, country = 0)
for (i in seq_along(identifiers)) {
  df <- mutate(df,
    country = ifelse(grepl(identifiers[i], Code), identifiers[i], country)
  )
}
Which yields:
      Code Type Value country
1   DE8230    A    12      DE
2   18FR16    B    14      FR
3    2UK34    C     8      UK
4 1894DE56    E    21      DE
5   AB12FR    A    16      FR
Although this works, I think there must be a much more elegant solution that omits the for loop and uses a single dplyr statement. However, I have not been able to figure it out.
N.b.: It is important that the mentioned identifiers are listed in a separate vector or list and not part of the mutate statement. This is just a hypothetical example, datasets and number of identifiers are much larger.
CodePudding user response:
We may use str_extract: paste the identifiers into a single string with | as the separator (done here with str_c), then extract the matching substring from 'Code'. Rows with no match get NA and are dropped with drop_na.
library(dplyr)
library(stringr)
library(tidyr)  # drop_na() comes from tidyr
df %>%
  mutate(country = str_extract(Code, str_c(identifiers, collapse = "|"))) %>%
  drop_na(country)
-output
      Code Type Value country
1   DE8230    A    12      DE
2   18FR16    B    14      FR
3    2UK34    C     8      UK
4 1894DE56    E    21      DE
5   AB12FR    A    16      FR
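For comparison, the same idea can be sketched in base R without any packages. This is only an illustration, assuming the df and identifiers from the question; the names pattern, m and result are made up for the example. Like str_extract, regexpr returns only the first match per string, so a code containing two identifiers would be labelled with whichever one appears first.

```r
# Example data and identifiers, copied from the question
df <- data.frame(
  Code  = c("DE8230", "18FR16", "2UK34", "45BE87C", "1894DE56", "AB12FR", "ES12456"),
  Type  = c("A", "B", "C", "C", "E", "A", "C"),
  Value = c(12, 14, 8, 20, 21, 16, 5)
)
identifiers <- c("DE", "FR", "UK")

# Collapse the identifiers into one regex alternation: "DE|FR|UK"
pattern <- paste(identifiers, collapse = "|")

# regexpr() gives the position of the first match per element, or -1 if none
m <- regexpr(pattern, df$Code)

# Start with NA everywhere, then fill in the matched substring where one exists
df$country <- NA_character_
df$country[m > 0] <- regmatches(df$Code, m)

# Keep only rows where an identifier was found (mirrors drop_na(country))
result <- df[!is.na(df$country), ]
print(result)
```

If the unmatched rows should be kept, skip the last filtering step and work with df directly; its country column then holds NA for codes such as 45BE87C and ES12456.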