Count occurrences of a string in a column of a data frame, grouped by date

Time:02-02

I have a df consisting of two columns:

df <- data.frame(Date = c("01-01-2016","02-01-2022","05-01-2022", "21-12-2022","03-09-2021", "21-12-2017"),
                 Value = c(14.2, 23.2, "bc", "bc", 78.2, "bc" ))

I want to count the occurrences of the string "bc" in the Value column, grouped by the year of Date, so I tried the following:

library(dplyr)
df2 <- df %>% group_by(Date) %>% summarise(length(grep("bc", Value)))

but this gives me the total number of occurrences of "bc" in the entire df, which is 3.

What I want is this:

Expected output:

  Date bc_total
  2022        2
  2017        1
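The attempt above groups by the full Date string, and all six dates in df are distinct, so every group holds exactly one row. A minimal sketch of the same grep() idea groups by the year instead, assuming every Date is a fixed-width dd-mm-yyyy string so that characters 7 to 10 are the year:

```r
library(dplyr)

df <- data.frame(Date = c("01-01-2016","02-01-2022","05-01-2022", "21-12-2022","03-09-2021", "21-12-2017"),
                 Value = c(14.2, 23.2, "bc", "bc", 78.2, "bc"))

df2 <- df %>%
  # Assumes fixed-width dd-mm-yyyy strings: characters 7-10 hold the year
  group_by(Date = substr(Date, 7, 10)) %>%
  # grep() returns the indices of matching rows; length() counts them per year
  summarise(bc_total = length(grep("bc", Value))) %>%
  # Drop years that contain no "bc" at all
  filter(bc_total > 0)

print(df2)
```

Note that grep("bc", ...) also matches substrings such as "abc"; comparing with Value == "bc" counts exact matches only.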

Answer:

library(dplyr) # the .by argument of summarise() needs dplyr >= 1.1.0
library(lubridate)
df %>% 
  mutate(Date = year(dmy(Date))) %>% 
  summarise(bc_total = sum(Value == "bc"), .by = Date) %>% 
  filter(bc_total != 0)

#  Date bc_total
#1 2022        2
#2 2017        1
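The .by argument of summarise() was added in dplyr 1.1.0. For older dplyr versions, a sketch of the same pipeline uses an explicit group_by(); here base R's as.Date() and format() stand in for lubridate's dmy() and year():

```r
library(dplyr)

df <- data.frame(Date = c("01-01-2016","02-01-2022","05-01-2022", "21-12-2022","03-09-2021", "21-12-2017"),
                 Value = c(14.2, 23.2, "bc", "bc", 78.2, "bc"))

res <- df %>%
  # Parse dd-mm-yyyy and keep only the year, as an integer
  mutate(Date = as.integer(format(as.Date(Date, "%d-%m-%Y"), "%Y"))) %>%
  group_by(Date) %>%
  # Value == "bc" is TRUE/FALSE, so sum() counts the exact matches per year
  summarise(bc_total = sum(Value == "bc")) %>%
  filter(bc_total != 0)

print(res)
```

Unlike .by, which keeps groups in order of first appearance, group_by() sorts the groups, so 2017 is listed before 2022.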

Or

df %>% 
  mutate(Date = year(dmy(Date))) %>% 
  filter(Value == "bc") %>% 
  count(Date, name = "bc_total")
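Because the filter() runs before counting, years without any "bc" never appear, so no filter(bc_total != 0) step is needed. The dplyr documentation describes count(a) as roughly equivalent to group_by(a) followed by summarise(n = n()). A sketch comparing the two on the filtered rows, again using base R date parsing in place of lubridate:

```r
library(dplyr)

df <- data.frame(Date = c("01-01-2016","02-01-2022","05-01-2022", "21-12-2022","03-09-2021", "21-12-2017"),
                 Value = c(14.2, 23.2, "bc", "bc", 78.2, "bc"))

bc_rows <- df %>%
  # Parse dd-mm-yyyy and keep only the year, as an integer
  mutate(Date = as.integer(format(as.Date(Date, "%d-%m-%Y"), "%Y"))) %>%
  # Keep only exact "bc" rows
  filter(Value == "bc")

# count() with a custom name for the count column
via_count <- bc_rows %>% count(Date, name = "bc_total")
# The same result spelled out with group_by() and n()
via_group <- bc_rows %>% group_by(Date) %>% summarise(bc_total = n())

print(via_count)
```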

Answer:

You can use ifelse to flag each row whose Value is "bc", then sum the flags per year:

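library(dplyr)
# Flag each row: 1 if Value is exactly "bc", otherwise 0
df$bc_count <- ifelse(df$Value == "bc", 1, 0)
# Parse the dd-mm-yyyy strings, group by the year, and sum the flags per year
df2 <- df %>%
  group_by(Year = format(as.Date(Date, "%d-%m-%Y"), "%Y")) %>%
  summarize(bc_total = sum(bc_count))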

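Note: convert the Date strings with as.Date() using the matching format ("%d-%m-%Y") before extracting the year; the default formats expect the year first, so dd-mm-yyyy strings would be misread. Years with no "bc" (here 2016 and 2021) are kept with bc_total = 0; add filter(bc_total > 0) to drop them.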
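To illustrate the note, the sketch below parses the question's dd-mm-yyyy strings with an explicit format, extracts the year, and then applies the same flag-and-sum idea in base R with tapply() instead of dplyr. The two vectors are copied from the question's df:

```r
# Date and Value columns from the question
dates  <- c("01-01-2016", "02-01-2022", "05-01-2022", "21-12-2022", "03-09-2021", "21-12-2017")
values <- c("14.2", "23.2", "bc", "bc", "78.2", "bc")

# Explicit day-month-year format; the default formats expect the year first
years <- format(as.Date(dates, format = "%d-%m-%Y"), "%Y")

# Flag "bc" rows with 1 (else 0) and sum the flags within each year
bc_total <- tapply(ifelse(values == "bc", 1, 0), years, sum)
print(bc_total)
```

As with the dplyr version, years without any "bc" show up with a total of 0.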

Answer:

Code

library(dplyr)
library(lubridate)

df <- data.frame(Date = c("01-01-2016","02-01-2022","05-01-2022", "21-12-2022","03-09-2021", "21-12-2017"),
                 Value = c(14.2, 23.2, "bc", "bc", 78.2, "bc" ))
df %>% 
  mutate(Year = year(dmy(Date))) %>% 
  # Count the rows for every Year/Value pair; the "bc" rows hold the totals
  group_by(Year, Value) %>%
  summarise(Count = n()) %>% 
  as.data.frame()

Output

 Year Value Count
1 2016  14.2     1
2 2017    bc     1
3 2021  78.2     1
4 2022  23.2     1
5 2022    bc     2

hope this helps :)
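In this output the rows with Value "bc" hold the requested totals. A base-R sketch of the same Year-by-Value count uses table(); its "bc" column then gives bc_total per year, including 0 for years without any "bc". It reuses the question's vectors and, as an assumption, takes the year as the last four characters of each fixed-width dd-mm-yyyy string:

```r
# Date and Value columns from the question
dates  <- c("01-01-2016", "02-01-2022", "05-01-2022", "21-12-2022", "03-09-2021", "21-12-2017")
values <- c("14.2", "23.2", "bc", "bc", "78.2", "bc")

# Assumes fixed-width dd-mm-yyyy strings: characters 7-10 are the year
years <- substr(dates, 7, 10)

# Cross-tabulate: one row per year, one column per distinct Value
tab <- table(Year = years, Value = values)

# The "bc" column is the number of "bc" entries in each year
bc_total <- tab[, "bc"]
print(bc_total)
```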
