Create a new dataframe with 1's and 0's from summarized data?

Time:10-26

I have the below dataset that I am working with in R:

df <- data.frame(day=seq(1,3,1), tot.infected=c(1,2,4), tot.ind=5)
df

I would like to expand the tot.infected column into a binary variable of 1's and 0's, with one row per individual per day, as in the following dataframe:

df2 <- data.frame(year = c(rep(1,5), rep(2,5), rep(3,5)), infected = c(rep(1,1), rep(0,4), rep(1,2), rep(0,3), rep(1,4), rep(0,1)))

Is there a more elegant way to do this in R?

Thank you for your help!

I tried hard-coding a dataframe using rep(), but this is extremely time-consuming for large datasets, so I am looking for a more elegant way to achieve this.

CodePudding user response:

base R

# Build one small data frame per row of df, then stack them
tmp <- do.call(Map, c(
  list(f = function(y, inf, ind)
    data.frame(year = y, infected = replace(integer(ind), seq(ind) <= inf, 1L))),
  unname(df)))
do.call(rbind, tmp)
#    year infected
# 1     1        1
# 2     1        0
# 3     1        0
# 4     1        0
# 5     1        0
# 6     2        1
# 7     2        1
# 8     2        0
# 9     2        0
# 10    2        0
# 11    3        1
# 12    3        1
# 13    3        1
# 14    3        1
# 15    3        0

dplyr

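library(dplyr)
# Note: dplyr >= 1.1.0 deprecates multi-row summarize() results; reframe() is the replacement there
df %>%
  rowwise() %>%
  summarize(tibble(year = day, infected = replace(integer(tot.ind), seq(tot.ind) <= tot.infected, 1L)))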
# # A tibble: 15 x 2
#     year infected
#    <dbl>    <int>
#  1     1        1
#  2     1        0
#  3     1        0
#  4     1        0
#  5     1        0
#  6     2        1
#  7     2        1
#  8     2        0
#  9     2        0
# 10     2        0
# 11     3        1
# 12     3        1
# 13     3        1
# 14     3        1
# 15     3        0

CodePudding user response:

We can do it this way:

library(dplyr)
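# Note: cur_data() and multi-row summarise() are deprecated in dplyr >= 1.1.0
df %>%
  group_by(day) %>%
  # Expand each day to tot.ind rows; the result stays grouped by day
  summarise(cur_data()[seq(unique(tot.ind)), ]) %>%
  # Flag the first tot.infected rows of each day as 1, the rest as 0
  mutate(tot.infected = ifelse(row_number() <= first(tot.infected), 1, 0), .keep = "used")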

#      day tot.infected
#    <dbl>        <dbl>
#  1     1            1
#  2     1            0
#  3     1            0
#  4     1            0
#  5     1            0
#  6     2            1
#  7     2            1
#  8     2            0
#  9     2            0
# 10     2            0
# 11     3            1
# 12     3            1
# 13     3            1
# 14     3            1
# 15     3            0

CodePudding user response:

Using rep.int and replace, basically.

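with(df, data.frame(
  # Repeat each day tot.ind times (columns 1 and 3 of df)
  year=do.call(rep.int, unname(df[c(1, 3)])),
  # Start from tot.ind zeros per day and set the first tot.infected to 1;
  # seq_len() also handles a day with 0 infected
  infected=unlist(Map(replace, Map(rep.int, 0, tot.ind), lapply(tot.infected, seq_len), 1))
))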
#    year infected
# 1     1        1
# 2     1        0
# 3     1        0
# 4     1        0
# 5     1        0
# 6     2        1
# 7     2        1
# 8     2        0
# 9     2        0
# 10    2        0
# 11    3        1
# 12    3        1
# 13    3        1
# 14    3        1
# 15    3        0

Data:

df <- structure(list(day = c(1, 2, 3), tot.infected = c(1, 2, 4), tot.ind = c(5, 
5, 5)), class = "data.frame", row.names = c(NA, -3L))
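
For comparison, the same expansion can also be sketched in a fully vectorized base R form, without building a list per row. This is only an illustration, not taken from any of the answers above: the helper name expand_infected is hypothetical, and it assumes tot.infected never exceeds tot.ind. It uses rep() to repeat each day and sequence() to count positions within each day.

```r
# Example data from the question
df <- data.frame(day = seq(1, 3, 1), tot.infected = c(1, 2, 4), tot.ind = 5)

# Hypothetical helper: expands each row of d to tot.ind rows of 0/1
expand_infected <- function(d) {
  n <- d$tot.ind
  data.frame(
    # Repeat each day once per individual
    year = rep(d$day, times = n),
    # sequence(n) counts 1..n within each day; the first tot.infected positions become 1
    infected = as.integer(sequence(n) <= rep(d$tot.infected, times = n))
  )
}

long <- expand_infected(df)

# Sanity check: summing per day should recover the original tot.infected counts
tapply(long$infected, long$year, sum)
```

Because everything is done with two rep() calls and one comparison, this should scale reasonably to large inputs, though it has not been benchmarked against the answers above.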