Home > Blockchain >  How to combine similar rows of data into one in R?
How to combine similar rows of data into one in R?

Time:08-04

I have a dataset as shown in the first table below. I would like to create a new dataframe that combines different versions of the same item (i.e. "googleV1" and "googleV2" are combined) and adds (quantity, revenue fields) or averages (ratings field) the numbers associated so that it reads as just a single line item. The second table below illustrates my desired output. How would I go about combining and adding/averaging values as such? Thank you in advance!

I would like the solution to apply for when there may be multiple columns that work like the "group" column. I would also like to refer to the columns "quantity", "revenue", and "rating" by column number.

item group quantity revenue rating
googleV1 blue 4525 $523513 94%
googleV2 blue 2452 $134134 82%
123V5 red 3563 $134134 82%
123V6 red 34534 $2345 34%
123V7 yellow 4574 $34535 64%
item group quantity revenue rating
google blue 6977 $657647 88%
123 red 38097 $136479 58%
123 yellow 4574 $34535 64%

CodePudding user response:

We may remove the substring in the first column and then do a group by summarise approach

library(dplyr)
library(stringr)
df1 %>% 
  group_by(item = str_remove(item, "V\\d $")) %>% 
  summarise(quantity = sum(quantity), 
   revenue = str_c("$", sum(as.numeric(str_remove(revenue, fixed("$"))))), 
    rating = str_c(mean(readr::parse_number(rating)), "%"), .groups = 'drop')

-output

# A tibble: 2 × 4
  item   quantity revenue rating
  <chr>     <int> <chr>   <chr> 
1 123       42671 $171014 60%   
2 google     6977 $657647 88%   

data

df1 <- structure(list(item = c("googleV1", "googleV2", "123V5", "123V6", 
"123V7"), quantity = c(4525L, 2452L, 3563L, 34534L, 4574L),
 revenue = c("$523513", 
"$134134", "$134134", "$2345", "$34535"), rating = c("94%", "82%", 
"82%", "34%", "64%")), class = "data.frame", row.names = c(NA, 
-5L))
  • Related