I have a dataframe - df1. I want to get it to df2 as shown below using R:
**df1**
Cust_id Cust_name Cust_order
1 Andrew coffee
2 Dillain burger
3 Alma coffee
4 Wesney chips
5 Kiko chips
NA NA fries
NA NA milkshake
NA NA sandwich
NA NA eggs
**df2**
Cust_order freq
coffee 2
burger 1
chips 2
fries 0
milkshake 0
sandwich 0
eggs 0
I have used the aggregate count function to achieve this but it does not give me the result that I want. I want the orders with the NA values to give "0". Any help is appreciated. I am very new to R and I have tried it in the following ways:
df2 <- aggregate(df1$Cust_order, by = list(df1$Cust_order), FUN = length)
CodePudding user response:
You can use the formula
-notation for aggregate
to group by Cust_order
and calculate a statistic on Cust_id
. In this case, you want to count the non-NA
values of Cust_id
, which you can do with function(x) sum(!is.na(x))
. We have to explicitly tell it to keep the NA
values using the na.action
argument.
aggregate(Cust_id ~ Cust_order, df1, FUN = function(x) sum(!is.na(x)), na.action = na.pass)
which gives
Cust_order Cust_id
1 burger 1
2 chips 2
3 coffee 2
4 eggs 0
5 fries 0
6 milkshake 0
7 sandwich 0
CodePudding user response:
library(data.table)
setDT(mydata)[, sum(!is.na(Cust_name)), by = .(Cust_order)]
CodePudding user response:
Another option is to sum
on the Cust_id
column. In this case we are also summing the non-NA records, but without the need to set na.action
.
Wrap the aggregate
function with setNames
to set correct column names.
setNames(
aggregate(df1$Cust_id, by = list(df1$Cust_order), FUN = \(x) sum(!is.na(x))),
c("Cust_order", "freq")
)
Cust_order freq
1 burger 1
2 chips 2
3 coffee 2
4 eggs 0
5 fries 0
6 milkshake 0
7 sandwich 0