In the following data frame, I want to collect members of B1, where their value in B2 is equal to or more than the value of "b" in B2. And then after this new information, count how many times each of the B1 members occurred.
dataframe:
ID B1 B2
z1 a 2.5
z1 b 1.7
z1 c 170
z1 c 9
z1 d 3
y2 a 0
y2 b 21
y2 c 15
y2 c 101
y2 d 30
y2 d 3
y2 d 15.5
x3 a 30.8
x3 a 54
x3 a 0
x3 b 30.8
x3 c 30.8
x3 d 7
so the result would be:
ID B1 B2
z1 a 2.5
z1 c 170
z1 c 9
z1 d 3
y2 c 101
y2 d 30
x3 a 30.8
x3 a 54
x3 c 30.8
and
ID B1 count
z1 a 1
z1 c 2
z1 d 1
y2 a 0
y2 c 1
y2 d 1
x3 a 2
x3 c 1
x3 d 0
CodePudding user response:
Using tidyverse:
library(tidyverse)
df %>%
group_by(ID) %>%
filter(B2 > B2[B1 == "b"]) %>%
group_by(ID, B1) %>%
count(name = "count") %>%
as.data.frame()
#> ID B1 count
#> 1 x3 a 1
#> 2 y2 c 1
#> 3 y2 d 1
#> 4 z1 a 1
#> 5 z1 c 2
#> 6 z1 d 1
Created on 2022-04-26 by the reprex package (v2.0.1)
CodePudding user response:
Grouped by 'ID', filter
where the 'B2' is greater than or equal to 'B2' where 'B1' is 'b' as well as create another condition where 'B1' is not equal to 'b'
library(dplyr)
out1 <- df1 %>%
group_by(ID) %>%
filter(any(B1 == "b") & B2 >= min(B2[B1 == "b"]), B1 != 'b')
-output
> out1
# A tibble: 9 × 3
# Groups: ID [3]
ID B1 B2
<chr> <chr> <dbl>
1 z1 a 2.5
2 z1 c 170
3 z1 c 9
4 z1 d 3
5 y2 c 101
6 y2 d 30
7 x3 a 30.8
8 x3 a 54
9 x3 c 30.8
The second output will be do a group by with summarise
to get the number of rows, and then fill the missing combinations with complete
library(tidyr)
out1 %>%
group_by(B1, .add = TRUE) %>%
summarise(count = n(), .groups = "drop_last") %>%
complete(B1 = unique(.$B1), fill = list(count = 0)) %>%
ungroup
# A tibble: 9 × 3
ID B1 count
<chr> <chr> <int>
1 x3 a 2
2 x3 c 1
3 x3 d 0
4 y2 a 0
5 y2 c 1
6 y2 d 1
7 z1 a 1
8 z1 c 2
9 z1 d 1
data
df1 <- structure(list(ID = c("z1", "z1", "z1", "z1", "z1", "y2", "y2",
"y2", "y2", "y2", "y2", "y2", "x3", "x3", "x3", "x3", "x3", "x3"
), B1 = c("a", "b", "c", "c", "d", "a", "b", "c", "c", "d", "d",
"d", "a", "a", "a", "b", "c", "d"), B2 = c(2.5, 1.7, 170, 9,
3, 0, 21, 15, 101, 30, 3, 15.5, 30.8, 54, 0, 30.8, 30.8, 7)),
class = "data.frame", row.names = c(NA,
-18L))