I have a dataframe 'df' that has categorical and POSIXct columns. The data look like:
Category | DateTime |
---|---|
A | 2022-08-29 00:00:00 |
A | 2022-08-29 00:00:00 |
A 1 | 2022-08-29 00:00:00 |
A 1 | 2022-08-29 00:00:00 |
A 1 | 2022-08-29 00:00:00 |
B | 2022-08-29 00:00:00 |
B | 2022-08-29 00:00:00 |
B | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
B 1 | 2022-08-29 00:00:00 |
A | 2022-08-29 02:00:00 |
A 1 | 2022-08-29 02:00:00 |
B | 2022-08-29 02:00:00 |
B | 2022-08-29 02:00:00 |
B | 2022-08-29 02:00:00 |
B 1 | 2022-08-29 02:00:00 |
B 1 | 2022-08-29 02:00:00 |
B 1 | 2022-08-29 02:00:00 |
I would like to create a new dataframe 'df2' that counts each unique value in column 'Category' that doesn't end with " 1" by column 'DateTime', so that the results look like:
Category | DateTime | CatCount |
---|---|---|
A | 2022-08-29 00:00:00 | 2 |
B | 2022-08-29 00:00:00 | 3 |
A | 2022-08-29 02:00:00 | 1 |
B | 2022-08-29 02:00:00 | 3 |
CodePudding user response:
We could filter
output the rows having 1, then do a count
library(dplyr)
library(stringr)
df1 %>%
filter(str_detect(Category, "\\s 1", negate = TRUE)) %>%
count(Category, DateTime, name = "CatCount")
-output
Category DateTime CatCount
1 A 2022-08-29 00:00:00 2
2 A 2022-08-29 02:00:00 1
3 B 2022-08-29 00:00:00 3
4 B 2022-08-29 02:00:00 3
data
df1 <- structure(list(Category = c("A", "A", "A 1", "A 1", "A 1", "B",
"B", "B", "B 1", "B 1", "B 1", "B 1", "B 1", "A", "A 1", "B",
"B", "B", "B 1", "B 1", "B 1"), DateTime = c("2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00")),
class = "data.frame", row.names = c(NA,
-21L))
CodePudding user response:
library(dplyr)
your_data %>%
filter(!endsWith(Category, "1")) %>%
count(Category, DateTime)
CodePudding user response:
Another base R solution with transform
aggregate
:
transform(df1[!grepl("1$", df1$Category),], count = 1) |>
aggregate(count ~ Category DateTime, data = _, length)
Category DateTime count
1 A 2022-08-29 00:00:00 2
2 B 2022-08-29 00:00:00 3
3 A 2022-08-29 02:00:00 1
4 B 2022-08-29 02:00:00 3
CodePudding user response:
Or a base
solution using table
:
df[nchar(df$Category) == 1,] |>
table() |>
as.data.frame(responseName = "CatCount")
We could of course subset in various ways df[!endsWith(df$Category, "1"),]
(as @ Gregor Thomas) or df[!grepl("\\s 1", df$Category),]
(as @ akrun).
Output:
Category DateTime CatCount
1 A 2022-08-29 00:00:00 2
2 B 2022-08-29 00:00:00 3
3 A 2022-08-29 02:00:00 1
4 B 2022-08-29 02:00:00 3
Data:
library(readr)
df <- read_delim("Category,DateTime
A,2022-08-29 00:00:00
A,2022-08-29 00:00:00
A 1,2022-08-29 00:00:00
A 1,2022-08-29 00:00:00
A 1,2022-08-29 00:00:00
B,2022-08-29 00:00:00
B,2022-08-29 00:00:00
B,2022-08-29 00:00:00
B 1,2022-08-29 00:00:00
B 1,2022-08-29 00:00:00
B 1,2022-08-29 00:00:00
B 1,2022-08-29 00:00:00
B 1,2022-08-29 00:00:00
A,2022-08-29 02:00:00
A 1,2022-08-29 02:00:00
B,2022-08-29 02:00:00
B,2022-08-29 02:00:00
B,2022-08-29 02:00:00
B 1,2022-08-29 02:00:00
B 1,2022-08-29 02:00:00
B 1,2022-08-29 02:00:00", delim = ",")
Update: Data added.
CodePudding user response:
Here is a data.table
option, where we can use grepl
(or could also use stringr
) to ignore any rows that contain a number in Category
, then count using .N
.
library(data.table)
setDT(dt)[!grepl("\\d", Category), .N, .(Category, DateTime)]
Output
Category DateTime N
1: A 2022-08-29 00:00:00 2
2: B 2022-08-29 00:00:00 3
3: A 2022-08-29 02:00:00 1
4: B 2022-08-29 02:00:00 3
Data
dt <- structure(list(Category = c("A", "A", "A 1", "A 1", "A 1", "B",
"B", "B", "B 1", "B 1", "B 1", "B 1", "B 1", "A", "A 1", "B",
"B", "B", "B 1", "B 1", "B 1"), DateTime = c("2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 00:00:00", "2022-08-29 00:00:00", "2022-08-29 00:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00", "2022-08-29 02:00:00",
"2022-08-29 02:00:00", "2022-08-29 02:00:00")), class = "data.frame", row.names = c(NA,
-21L))
CodePudding user response:
We may conveniently use the subset
argument of aggregate
.
aggregate(cbind(CatCount=rep(1, length(Category))) ~ Category DateTime, df1, length,
subset=!grepl('1', Category))
# Category DateTime CatCount
# 1 A 2022-08-29 00:00:00 2
# 2 B 2022-08-29 00:00:00 3
# 3 A 2022-08-29 02:00:00 1
# 4 B 2022-08-29 02:00:00 3
Data borrowed from @akrun.