I have a dataframe as follows.
> df
condition duration start end
1 A 2 3 4
2 B 3 8 10
3 A 2 7 8
I want to transform the dataframe into a table that gives me a 0 or 1 for each position for each condition. The conditions can overlap at the start and end and occur multiple times; however, the multiple occurrences of a type of condition don't.
It should look like something as follows
> df2
count A B
1 1 0 0
2 2 0 0
3 3 1 0
4 4 1 0
5 5 0 0
6 6 0 0
7 7 1 0
8 8 1 1
9 9 0 1
10 10 0 1
CodePudding user response:
library(dplyr)
library(tidyr)
df %>%
mutate(count = mapply(`:`, start, end), .keep = "unused") %>%
unnest(count) %>%
pivot_wider(-duration, names_from = condition, values_from = condition, values_fn = length, values_fill = 0) %>%
complete(count = seq(count), fill = list(A = 0, B = 0)) %>%
arrange(count)
output
# A tibble: 10 × 3
count A B
<int> <int> <int>
1 1 0 0
2 2 0 0
3 3 1 0
4 4 1 0
5 5 0 0
6 6 0 0
7 7 1 0
8 8 1 1
9 9 0 1
10 10 0 1
CodePudding user response:
library(dplyr); library(tidyr)
df |>
uncount(duration, .id = "copy") |>
mutate(row = start copy - 1) |>
count(condition, row) |>
complete(condition, row = 1:max(row), fill = list(n = 0)) |>
pivot_wider(names_from = condition, values_from = n)
Result
# A tibble: 10 × 3
row A B
<dbl> <int> <int>
1 1 0 0
2 2 0 0
3 3 1 0
4 4 1 0
5 5 0 0
6 6 0 0
7 7 1 0
8 8 1 1
9 9 0 1
10 10 0 1