I have been working on this, but nothing seems to work. I have a dataset of approximately 10k rows.
After cleaning the data, I want to count the products sold (there are more than 30 product types, each appearing many times) to see which ones sell the most and flag the top 10. However, I would also like to include the price of one unit next to the count (`n`) column. For example, if Apple was sold 1111 times, I want $1 next to that count.
Product_name | Sold | Price |
---|---|---|
Apple | 1 | 1.00 |
Orange | 1 | 2.00 |
Apple | 1 | 1.00 |
Orange | 1 | 2.00 |
Apple | 1 | 1.00 |
Orange | 1 | 2.00 |
Using `df %>% count(Product_name)` gives this:
Product_name | n |
---|---|
Apple | 1111 |
Orange | 2222 |
but I want this:
Product_name | n | Price |
---|---|---|
Apple | 1111 | 1.00 |
Orange | 2222 | 2.00 |
My real data looks similar to this example, with probably 30 different values of `Product_name`. I would really appreciate the help.
Thanks,
Answer:
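Perhaps this helps:

```r
library(dplyr)

df %>%
  group_by(Product_name) %>%
  summarise(n = n(), Price = sum(Price))
```

Note that `sum(Price)` adds up the price over all rows of a product, so it gives the total over all sales of that product (e.g. 1111 for 1111 Apples at $1 each), not the price of one unit.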
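A minimal reproducible sketch, assuming `df` contains only the six sample rows from the question (the real data would be loaded instead). It runs the summary above and shows that `Price` comes back as a per-product total rather than a unit price. `Unit_price` is a hypothetical extra column added only for comparison; it is listed before `Price` because `summarise()` evaluates its arguments in order, so anything after `Price = sum(Price)` would see the summed value.

```r
library(dplyr)

# The six sample rows from the question
df <- data.frame(
  Product_name = rep(c("Apple", "Orange"), times = 3),
  Sold = 1,
  Price = rep(c(1.00, 2.00), times = 3)
)

res <- df %>%
  group_by(Product_name) %>%
  summarise(
    n = n(),
    Unit_price = first(Price),  # price of one unit, taken before Price is overwritten
    Price = sum(Price),         # what the answer computes: the total over all rows
    .groups = "drop"
  )

print(res)
```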
Or, if you want the unit price next to the count, we may need `Price` as a grouping variable as well:

```r
df %>%
  group_by(Product_name, Price) %>%
  summarise(n = n(), .groups = "drop")
```
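A small sketch of what the grouped version returns, assuming a hypothetical data frame `df2` in which one Apple row was sold at a different price (1.50). Because `Price` is part of the grouping, Apple is split into two rows, which can also serve as a quick check for products whose price is not constant.

```r
library(dplyr)

# Hypothetical data: one Apple row has a different price
df2 <- data.frame(
  Product_name = c("Apple", "Orange", "Apple", "Orange", "Apple", "Orange"),
  Price = c(1.00, 2.00, 1.00, 2.00, 1.50, 2.00)
)

res2 <- df2 %>%
  group_by(Product_name, Price) %>%
  summarise(n = n(), .groups = "drop")

# Apple appears twice: Price 1.00 with n = 2 and Price 1.50 with n = 1
print(res2)
```

`df2 %>% count(Product_name, Price)` is a shorter way to get the same table, since `count()` accepts several grouping variables.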
Answer:
If the price does not vary within a particular product, you could just use `Price = first(Price)` within the `summarise()` call.
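Combining this with the top-10 flag from the question, a minimal sketch: it assumes each product has a single unit price, uses `first(Price)` as suggested, sorts by count, and adds a hypothetical logical column `top10`. The data set (12 products named `P01` to `P12`) is made up for illustration, and the `n_distinct(Price)` check is an optional safeguard rather than part of the original answers.

```r
library(dplyr)

# Hypothetical data: product Pk is sold k times at a unit price of 0.5 * k
products <- sprintf("P%02d", 1:12)
df <- data.frame(
  Product_name = rep(products, times = 1:12),
  Price = rep(seq(0.5, 6, by = 0.5), times = 1:12)
)

# Optional safeguard: every product should have exactly one unit price
price_check <- df %>%
  group_by(Product_name) %>%
  summarise(k = n_distinct(Price), .groups = "drop")
stopifnot(all(price_check$k == 1))

sold <- df %>%
  group_by(Product_name) %>%
  summarise(n = n(), Price = first(Price), .groups = "drop") %>%
  arrange(desc(n)) %>%                # best sellers first
  mutate(top10 = row_number() <= 10)  # flag the ten most sold products

print(sold)
```

`row_number()` breaks ties by row position, so with tied counts at 10th place only some of the tied products get flagged; `min_rank(desc(n)) <= 10` would flag all of them instead.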