I have a challenge. I would want to group brands of products by amount purchased. The problem is that a lot of brands in my dataset have "double" names with padding's "Bio" and "Non Bio" like this:
Amount purchased Brand
2 Bakoma Bio
1 Bakoma Non Bio
2 Dorritos Bio
3 Dorritos Non Bio
I would want to create a function that will group brands by its name no matter if they have an adding Bio or not bio.
Example output:
Amount purchased Brand
3 Bakoma
5 Dorritos
Thank you for help :)
CodePudding user response:
A possible solution:
library(tidyverse)
df <- data.frame(
qtd = c(2L, 1L, 2L, 3L),
product = c("Bakoma Bio","Bakoma Non Bio",
"Dorritos Bio","Dorritos Non Bio")
)
df %>%
group_by(product = str_remove(product, "\\s (Bio|Non Bio)")) %>%
summarise(qtd = sum(qtd))
#> # A tibble: 2 × 2
#> product qtd
#> <chr> <int>
#> 1 Bakoma 3
#> 2 Dorritos 5
CodePudding user response:
A base R
option
aggregate(qtd ~ product, transform(df, product = trimws(product,
whitespace = "\\s (Bio|Non Bio).*")), sum)
product qtd
1 Bakoma 3
2 Dorritos 5
CodePudding user response:
Here is an alternative approach based on the pattern in your data is constant (e.g brand is first word): data from @PaulS (many thanks):
library(dplyr)
library(stringr)
df %>%
group_by(brand = word(product, 1)) %>%
summarise(Sum = sum(qtd))
brand Sum
<chr> <int>
1 Bakoma 3
2 Dorritos 5