I have the below dataset that I am working with in R:
df <- data.frame(day=seq(1,3,1), tot.infected=c(1,2,4), tot.ind=5)
df
I would like to expand the tot.infected column into a binary variable of 1's and 0's, with one row per individual (tot.ind rows per day), as in the following dataframe:
df2 <- data.frame(year = c(rep(1,5), rep(2,5), rep(3,5)), infected = c(rep(1,1), rep(0,4), rep(1,2), rep(0,3), rep(1,4), rep(0,1)))
Is there a more elegant way to do this in R?
Thank you for your help!
I tried hard-coding a dataframe using rep(), but this is extremely time-consuming for large datasets and I was looking for a more elegant way to achieve this.
CodePudding user response:
base R
tmp <- do.call(Map, c(list(f = function(y, inf, ind) data.frame(year = y, infected = replace(integer(ind), seq(ind) <= inf, 1L))), unname(df)))
do.call(rbind, tmp)
# year infected
# 1 1 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 1 0
# 6 2 1
# 7 2 1
# 8 2 0
# 9 2 0
# 10 2 0
# 11 3 1
# 12 3 1
# 13 3 1
# 14 3 1
# 15 3 0
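To make the inner expression easier to follow, here is a minimal sketch of what it does for a single row (2 infected out of 5). The names zeros, is_infected and row_vec are only for illustration, and seq_len() stands in for seq(); the two give the same result here.

```r
# One row of the example data: 2 infected out of 5 individuals
ind <- 5L
inf <- 2L

zeros <- integer(ind)                # 0 0 0 0 0
is_infected <- seq_len(ind) <= inf   # TRUE TRUE FALSE FALSE FALSE

# Overwrite the first `inf` zeros with 1
row_vec <- replace(zeros, is_infected, 1L)
row_vec
# [1] 1 1 0 0 0
```

Map() then applies this to every row of df, and rbind stacks the pieces.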
dplyr
library(dplyr)
df %>%
  rowwise() %>%
  # Note: in dplyr >= 1.1.0, reframe() is the recommended replacement for a multi-row summarize()
  summarize(tibble(year = day, infected = replace(integer(tot.ind), seq(tot.ind) <= tot.infected, 1L)))
# # A tibble: 15 x 2
# year infected
# <dbl> <int>
# 1 1 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 1 0
# 6 2 1
# 7 2 1
# 8 2 0
# 9 2 0
# 10 2 0
# 11 3 1
# 12 3 1
# 13 3 1
# 14 3 1
# 15 3 0
CodePudding user response:
We can do it this way:
library(dplyr)
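df %>%
  group_by(day) %>%
  # Index rows 1..tot.ind of each day's single row, giving tot.ind rows per day (rows past the first are NA)
  summarise(cur_data()[seq(unique(tot.ind)), ]) %>%
  # Flag the first tot.infected rows of each day as 1, the rest as 0
  mutate(tot.infected = ifelse(row_number() <= first(tot.infected), 1, 0),
         .keep = "used")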
# day tot.infected
# <dbl> <dbl>
# 1 1 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 1 0
# 6 2 1
# 7 2 1
# 8 2 0
# 9 2 0
# 10 2 0
# 11 3 1
# 12 3 1
# 13 3 1
# 14 3 1
# 15 3 0
CodePudding user response:
Using rep.int and replace, basically.
with(df, data.frame(
  year = do.call(rep.int, unname(df[c(1, 3)])),
  # seq_len() rather than seq(), so that a day with 0 infected yields no 1's
  infected = unlist(Map(replace, Map(rep.int, 0, tot.ind), lapply(tot.infected, seq_len), 1))
))
# year infected
# 1 1 1
# 2 1 0
# 3 1 0
# 4 1 0
# 5 1 0
# 6 2 1
# 7 2 1
# 8 2 0
# 9 2 0
# 10 2 0
# 11 3 1
# 12 3 1
# 13 3 1
# 14 3 1
# 15 3 0
Data:
df <- structure(list(day = c(1, 2, 3), tot.infected = c(1, 2, 4), tot.ind = c(5,
5, 5)), class = "data.frame", row.names = c(NA, -3L))
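For comparison, here is a rough fully vectorised sketch in base R. It is not taken from any of the answers above, and the helper names pos, n_inf and expanded are made up for illustration. It uses the fact that the 1's always come first within each day, so an individual's position within the day can be compared directly with that day's infected count. The last line checks the result against the hard-coded df2 from the question.

```r
# Example data from the question
df <- data.frame(day = seq(1, 3, 1), tot.infected = c(1, 2, 4), tot.ind = 5)

# Position of each individual within its day: 1..tot.ind, once per day
pos <- sequence(df$tot.ind)

# Each day's infected count, repeated once per individual
n_inf <- rep(df$tot.infected, times = df$tot.ind)

expanded <- data.frame(
  year = rep(df$day, times = df$tot.ind),
  infected = as.integer(pos <= n_inf)  # 1 for the first tot.infected rows of each day
)

# Hard-coded target from the question, used only as a check
df2 <- data.frame(year = c(rep(1,5), rep(2,5), rep(3,5)),
                  infected = c(rep(1,1), rep(0,4), rep(1,2), rep(0,3), rep(1,4), rep(0,1)))
all(expanded == df2)
# [1] TRUE
```

This avoids building one small data frame per row, which may matter for large inputs, though I have not benchmarked it.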