Home > OS >  How to use aggregate with count but also consider some of the NA values in R?
How to use aggregate with count but also consider some of the NA values in R?

Time:08-24

I have a dataframe - df1. I want to get it to df2 as shown below using R:

**df1**
Cust_id Cust_name       Cust_order
1       Andrew          coffee
2       Dillain         burger
3       Alma            coffee
4       Wesney          chips
5       Kiko            chips
NA      NA              fries
NA      NA              milkshake
NA      NA              sandwich
NA      NA              eggs

**df2**
Cust_order  freq
coffee      2
burger      1
chips       2
fries       0
milkshake   0
sandwich    0
eggs        0

I have used the aggregate count function to achieve this but it does not give me the result that I want. I want the orders with the NA values to give "0". Any help is appreciated. I am very new to R and I have tried it in the following ways:

df2 <- aggregate(df1$Cust_order, by = list(df1$Cust_order), FUN = length)

CodePudding user response:

You can use the formula-notation for aggregate to group by Cust_order and calculate a statistic on Cust_id. In this case, you want to count the non-NA values of Cust_id, which you can do with function(x) sum(!is.na(x)). We have to explicitly tell it to keep the NA values using the na.action argument.

aggregate(Cust_id ~ Cust_order, df1, FUN = function(x) sum(!is.na(x)), na.action = na.pass)

which gives

  Cust_order Cust_id
1     burger       1
2      chips       2
3     coffee       2
4       eggs       0
5      fries       0
6  milkshake       0
7   sandwich       0

CodePudding user response:

library(data.table)
setDT(mydata)[, sum(!is.na(Cust_name)), by = .(Cust_order)]

CodePudding user response:

Another option is to sum on the Cust_id column. In this case we are also summing the non-NA records, but without the need to set na.action.

Wrap the aggregate function with setNames to set correct column names.

setNames(
  aggregate(df1$Cust_id, by = list(df1$Cust_order), FUN = \(x) sum(!is.na(x))), 
  c("Cust_order", "freq")
)

  Cust_order freq
1     burger    1
2      chips    2
3     coffee    2
4       eggs    0
5      fries    0
6  milkshake    0
7   sandwich    0
  • Related