HAVE1 <- data.frame(NAME = c("Amelia", "Amelia", "Amelia", "Amelia", "Ava", "Ava", "Ava", "Charlotte", "Charlotte", "Charlotte", "Charlotte", "Emma", "Emma", "Olivia", "Olivia"), CLASS1 = c("Bee", "Chicken", "Shrimp", "Turkey", "Horse", "Pig", "Sheep", "Deer", "Dove", "Ducks", "Fish", "Crab", "Rabbit", "Cow", "Goat"))
HAVE2 <- data.frame(
NAME = c("Amelia", "Amelia", "Amelia", "Ava", "Ava", "Ava", "Charlotte", "Charlotte", "Charlotte", "Charlotte", "Emma", "Emma", "Olivia", "Olivia"),
CLASS2 = c("Chicken", "Some Shrimp", "TurkeyPU", "Horse-Horsey", "Pig-Farn", "Count-Sheep", "Deer Head", "Dove Count", "Ducks Count", "Fishes", "Crab- Count", "Rabbit-Count", "Cow-Type", "Goat-Feed")
)
WANT <- data.frame(
NAME = c("Amelia", "Amelia", "Amelia", "Ava", "Ava", "Ava", "Charlotte", "Charlotte", "Charlotte", "Charlotte", "Emma", "Emma", "Olivia", "Olivia"),
CLASS1 = c("Chicken", "Shrimp", "Turkey", "Horse", "Pig", "Sheep", "Deer", "Dove", "Ducks", "Fish", "Crab", "Rabbit", "Cow", "Goat"),
CLASS2 = c("Chicken", "Some Shrimp", "TurkeyPU", "Horse-Horsey", "Pig-Farn", "Count-Sheep", "Deer Head", "Dove Count", "Ducks Count", "Fishes", "Crab- Count", "Rabbit-Count", "Cow-Type", "Goat-Feed")
)
I have 'HAVE1' and 'HAVE2' and want to make 'WANT' which merge 'HAVE1' and 'HAVE2' using EXACT MATCH on 'NAME' and subMATCH on 'CLASS' as you can see the word in CLASS1 is also in CLASS2 but CLASS2 has more characters and some are the same
CodePudding user response:
- We can use
library(tidyverse)
HAVE1 |> full_join(HAVE2) |>
filter(str_detect(CLASS2 , CLASS1))
- Output
Joining, by = "NAME"
NAME CLASS1 CLASS2
1 Amelia Chicken Chicken
2 Amelia Shrimp Some Shrimp
3 Amelia Turkey TurkeyPU
4 Ava Horse Horse-Horsey
5 Ava Pig Pig-Farn
6 Ava Sheep Count-Sheep
7 Charlotte Deer Deer Head
8 Charlotte Dove Dove Count
9 Charlotte Ducks Ducks Count
10 Charlotte Fish Fishes
11 Emma Crab Crab- Count
12 Emma Rabbit Rabbit-Count
13 Olivia Cow Cow-Type
14 Olivia Goat Goat-Feed
CodePudding user response:
Here is a base R option using merge
subset
grepl
subset(
merge(HAVE1, HAVE2, all = TRUE, sort = FALSE),
Vectorize(grepl)(CLASS1, CLASS2)
)
which gives
NAME CLASS1 CLASS2
4 Amelia Chicken Chicken
8 Amelia Shrimp Some Shrimp
12 Amelia Turkey TurkeyPU
13 Ava Horse Horse-Horsey
17 Ava Pig Pig-Farn
21 Ava Sheep Count-Sheep
22 Charlotte Deer Deer Head
27 Charlotte Dove Dove Count
32 Charlotte Ducks Ducks Count
37 Charlotte Fish Fishes
38 Emma Crab Crab- Count
41 Emma Rabbit Rabbit-Count
42 Olivia Cow Cow-Type
45 Olivia Goat Goat-Feed
breakdown the code
- First, we generate a full-join table of
HAVE1
andHAVE2
> merge(HAVE1, HAVE2, all = TRUE, sort = FALSE)
NAME CLASS1 CLASS2
1 Amelia Bee Chicken
2 Amelia Bee Some Shrimp
3 Amelia Bee TurkeyPU
4 Amelia Chicken Chicken
5 Amelia Chicken Some Shrimp
6 Amelia Chicken TurkeyPU
7 Amelia Shrimp Chicken
8 Amelia Shrimp Some Shrimp
9 Amelia Shrimp TurkeyPU
10 Amelia Turkey Chicken
11 Amelia Turkey Some Shrimp
12 Amelia Turkey TurkeyPU
13 Ava Horse Horse-Horsey
14 Ava Horse Pig-Farn
15 Ava Horse Count-Sheep
16 Ava Pig Horse-Horsey
17 Ava Pig Pig-Farn
18 Ava Pig Count-Sheep
19 Ava Sheep Horse-Horsey
20 Ava Sheep Pig-Farn
21 Ava Sheep Count-Sheep
22 Charlotte Deer Deer Head
23 Charlotte Deer Dove Count
24 Charlotte Deer Ducks Count
25 Charlotte Deer Fishes
26 Charlotte Dove Deer Head
27 Charlotte Dove Dove Count
28 Charlotte Dove Ducks Count
29 Charlotte Dove Fishes
30 Charlotte Ducks Deer Head
31 Charlotte Ducks Dove Count
32 Charlotte Ducks Ducks Count
33 Charlotte Ducks Fishes
34 Charlotte Fish Deer Head
35 Charlotte Fish Dove Count
36 Charlotte Fish Ducks Count
37 Charlotte Fish Fishes
38 Emma Crab Crab- Count
39 Emma Crab Rabbit-Count
40 Emma Rabbit Crab- Count
41 Emma Rabbit Rabbit-Count
42 Olivia Cow Cow-Type
43 Olivia Cow Goat-Feed
44 Olivia Goat Cow-Type
45 Olivia Goat Goat-Feed
- Then, we use
grepl
in a rowwise manner (implemented byVectorized
) to check if patterns inCLASS1
can be found inCLASS2
in the above merged table.
CodePudding user response:
We could use fuzzyjoin
library(fuzzyjoin)
regex_inner_join(HAVE2, HAVE1, by = c("NAME", "CLASS2" = "CLASS1")) %>%
select(NAME = NAME.x, CLASS1, CLASS2)
NAME CLASS1 CLASS2
1 Amelia Chicken Chicken
2 Amelia Shrimp Some Shrimp
3 Amelia Turkey TurkeyPU
4 Ava Horse Horse-Horsey
5 Ava Pig Pig-Farn
6 Ava Sheep Count-Sheep
7 Charlotte Deer Deer Head
8 Charlotte Dove Dove Count
9 Charlotte Ducks Ducks Count
10 Charlotte Fish Fishes
11 Emma Crab Crab- Count
12 Emma Rabbit Rabbit-Count
13 Olivia Cow Cow-Type
14 Olivia Goat Goat-Feed