I have an empty column designated for categorising entries in my data frame. Categories are not exclusive, i.e. one entry can have multiple categories.
         animals categories
1         monkey
2 humpback whale
3    river trout
4        seagull
The categories column should be filled in based on each animal's properties. I know the properties from the vectors below. The elements of the vectors aren't necessarily an exact match for the entries in the animals column.
mammals <- c("whale", "monkey", "dog")
swimming <- c("whale", "trout", "dolphin")
How do I get the following result, ideally without looping?
         animals      categories
1         monkey          mammal
2 humpback whale mammal,swimming
3    river trout        swimming
4        seagull
CodePudding user response:
This can be done with fuzzyjoin after creating a key/value dataset. lst (from dplyr) returns a named list, which is converted to a two-column dataset with enframe. We then unnest the list column, group by 'animals', paste the 'categories' into a single string, and join the result to the original dataset with regex_left_join.
library(fuzzyjoin)
library(dplyr)
library(tidyr)
library(tibble)
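# Key/value table: one row per keyword, categories collapsed into one string
keydat <- lst(mammals, swimming) %>%
  enframe(name = 'categories', value = 'animals') %>%
  unnest(animals) %>%
  group_by(animals) %>%
  summarise(categories = toString(categories))
# Join on regex match: each keyword in keydat is matched against df1$animals
regex_left_join(df1, keydat, by = 'animals', ignore_case = TRUE) %>%
  transmute(animals = animals.x, categories)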
# A tibble: 4 × 2
  animals        categories
  <chr>          <chr>
1 monkey         mammals
2 humpback whale mammals, swimming
3 river trout    swimming
4 seagull        <NA>
data
df1 <- tibble(animals = c('monkey', 'humpback whale', 'river trout', 'seagull'))
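To make the join step above more concrete, here is a minimal base R sketch of the same idea, under stated assumptions: the key table is written out by hand to mirror what the pipeline should produce, and `first_match` is a hypothetical helper, not part of fuzzyjoin. Unlike a real regex join, which as far as I understand returns one row per matching pair, this sketch keeps only the first matching keyword per animal.

```r
# Data from the answer above, rebuilt with base R only
df1 <- data.frame(
  animals = c("monkey", "humpback whale", "river trout", "seagull"),
  stringsAsFactors = FALSE
)

# Hand-written copy of the key/value table (assumed to mirror keydat)
keydat <- data.frame(
  animals    = c("dog", "dolphin", "monkey", "trout", "whale"),
  categories = c("mammals", "swimming", "mammals", "swimming", "mammals, swimming"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the first keyword found in `name`, or NA
first_match <- function(name, patterns) {
  hit <- which(vapply(patterns, grepl, logical(1), x = name, ignore.case = TRUE))
  if (length(hit) == 0) NA_integer_ else hit[[1]]
}

# One keyword index per animal; unmatched animals get NA, as in a left join
idx <- vapply(df1$animals, first_match, integer(1),
              patterns = keydat$animals, USE.NAMES = FALSE)
df1$categories <- keydat$categories[idx]
```

Indexing with NA returns NA, so animals with no matching keyword (here, seagull) keep their row and get a missing category.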
CodePudding user response:
A base R option using stack, aggregate, and grepl:
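# Lookup table: one row per keyword, with its categories collapsed into a string
lut <- aggregate(
  . ~ values,
  type.convert(
    stack(list(mammals = mammals, swimming = swimming)),
    as.is = TRUE
  ),
  toString
)
# p[i, j] is TRUE when keyword j occurs in animal i
p <- sapply(
  lut$values,
  grepl,
  x = df$animals
)
# Column index of the matching keyword per row (assumes at most one match), NA if none
df$categories <- lut$ind[replace(rowSums(p * col(p)), rowSums(p) == 0, NA)]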
which gives
> df
         animals        categories
1         monkey           mammals
2 humpback whale mammals, swimming
3    river trout          swimming
4        seagull              <NA>
Data
df <- data.frame(animals = c("monkey", "humpback whale", "river trout", "seagull"))
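One caveat with the indexing above: rowSums(p * col(p)) adds up column indices, so it only points at the right row of lut when each animal matches exactly one keyword. The following is a minimal sketch of a variant that tolerates several matches per row, assuming the keywords can safely be used as regular expressions. The names `props` and `hits` are introduced here for illustration only.

```r
mammals  <- c("whale", "monkey", "dog")
swimming <- c("whale", "trout", "dolphin")
df <- data.frame(
  animals = c("monkey", "humpback whale", "river trout", "seagull"),
  stringsAsFactors = FALSE
)

# Named list of keyword vectors; the names become the category labels
props <- list(mammals = mammals, swimming = swimming)

# One logical column per category: TRUE when any of its keywords
# occurs in the animal name (keywords are joined into one regex)
hits <- do.call(cbind, lapply(props, function(keys) {
  grepl(paste(keys, collapse = "|"), df$animals, ignore.case = TRUE)
}))

# Collapse the names of the matching categories for each row
df$categories <- apply(hits, 1, function(row) toString(colnames(hits)[row]))

# Rows with no matching category get NA instead of an empty string
df$categories[df$categories == ""] <- NA
```

apply is still an implicit loop over rows, but the pattern matching itself is vectorised per category, and adding another property only means adding another element to `props`.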