Home > Software design >  Count number of occurences based on corresponding collumn
Count number of occurences based on corresponding collumn

Time:02-22

Category Value
First 100
Second 150
First N/A
Second 200
First 75
Second N/A

I would like to count the number of times each value occurs when it has a value. To elaborate, above, both the 'First' and 'Second' column would be equal to 2. I have tried sum(df$Category == 'First', na.rm=TRUE) but am very lost. The csv file I have output is a dataframe I have made from a larger data set. I would like to calculate both First and Second or just one of them, they will be equal because it is 3 different categories that are weeks where the 'N/A' values are weeks where data has not been logged yet.

CodePudding user response:

Assuming "df" is your dataframe having columns as "category" and "value"

df %>% 
  filter(value !='N/A', !is.na(value)) %>%  #filter your data
  count(category)

To give you the following data.

  category n
1    First 2
2   Second 2

CodePudding user response:

Here are a couple of base R options (as there are already good tidyverse solutions in the comments):

table(df$Category[!is.na(as.numeric(df$Value))])

 #First Second 
 #    2      2 

Or we could use aggregate:

aggregate(as.numeric(Value) ~ Category, df, NROW, na.action = na.omit)

#  Category as.numeric(Value)
#1    First                 2
#2   Second                 2

Tidyverse

First, we can convert the N/A values to NA, then use count on the non-NA values.

library(tidyverse)

df %>% 
  mutate(across(everything(), na_if, "N/A")) %>% 
  count(Category, wt = !is.na(Value))

#  Category n
#1    First 2
#2   Second 2

Note: If you are bringing in your data with read.table or read.csv, then there is na.strings option (i.e., na.strings='N/A'), which would change the N/A to NA.

Data.table

library(data.table)

dt <- as.data.table(df)
dt[, sum(Value != "N/A"), by = Category]

Data

df <- structure(list(Category = c("First", "Second", "First", "Second", 
"First", "Second"), Value = c("100", "150", "N/A", "200", "75", 
"N/A")), class = "data.frame", row.names = c(NA, -6L))
  • Related