Change column value based on final condition- but groups by previous week's IDs-CodePudding

Trying to figure out how to code something simple.

I have a dataset that has observations for individuals (small invertebrates) in my experiment over time, including the week, individual's id #, and the observation data of interest (parasite counts). I also have a cumulative total over time for the parasite counts, grouped by the individual's ID, which is what I will actually want per week.

I would like to drop individuals that, by the end of the experiment, never had an observed sample that was positive for parasites, because they were not successfully infected. My plan was to have a binary indicator column that told me if an individual didn't have a positive sample by the end of the experiment, based on the final cumulative total per individual id (it's possible that an individual could give a positive sample one week but not the next, so a 0 cumulative total is more safe). Then I would simply subset the data by the positive binary column, removing individuals who were never positive.

A very simplified version of my dataframe would look something like:

time = c(rep(1,4),rep(2,4),rep(3,4),rep(4,4))
ids = rep(c(101:104),4)
observations = c(rep(c(25,25,0,0),4))
df = data.frame(cbind(time,ids,observations))

df2 = df %>%
  group_by(ids) %>%
  mutate(cumtot = cumsum(observations))
df2

    time   ids observations cumtot
   <dbl> <dbl>        <dbl>  <dbl>
 1     1   101           25     25
 2     1   102           25     25
 3     1   103            0      0
 4     1   104            0      0
 5     2   101           25     50
 6     2   102           25     50
 7     2   103            0      0
 8     2   104            0      0
 9     3   101           25     75
10     3   102           25     75
11     3   103            0      0
12     3   104            0      0
13     4   101           25    100
14     4   102           25    100
15     4   103            0      0
16     4   104            0      0

(I will eventually aggregate these data into means/SEMs by week and treatment group.)

What I have tried so far creates a binary "infected" column, but identifies individuals that had a cumulative sum of 0 in week 14 only. What I want is for the code to then apply this binary outcome to all the individual ids from every week (so that I drop that individual from each week's aggregate data). Not sure how to do that...

# Make a column that indicates if a snail has not shed by experiment end
df_dropped = df2 %>%
  group_by(ids) %>%
  mutate(infected = ifelse(time==max(time)&cumtot==0, 0,1))
df_dropped

    time   ids observations cumtot infected
   <dbl> <dbl>        <dbl>  <dbl>    <dbl>
 1     1   101           25     25        1
 2     1   102           25     25        1
 3     1   103            0      0        1
 4     1   104            0      0        1
 5     2   101           25     50        1
 6     2   102           25     50        1
 7     2   103            0      0        1
 8     2   104            0      0        1
 9     3   101           25     75        1
10     3   102           25     75        1
11     3   103            0      0        1
12     3   104            0      0        1
13     4   101           25    100        1
14     4   102           25    100        1
15     4   103            0      0        0
16     4   104            0      0        0

I want the output to be:

    time   ids observations cumtot infected
   <dbl> <dbl>        <dbl>  <dbl>    <dbl>
 1     1   101           25     25        1
 2     1   102           25     25        1
 3     1   103            0      0        0
 4     1   104            0      0        0
 5     2   101           25     50        1
 6     2   102           25     50        1
 7     2   103            0      0        0
 8     2   104            0      0        0
 9     3   101           25     75        1
10     3   102           25     75        1
11     3   103            0      0        0
12     3   104            0      0        0
13     4   101           25    100        1
14     4   102           25    100        1
15     4   103            0      0        0
16     4   104            0      0        0

Thanks.

CodePudding user response：

You can just use any():

library(tidyverse)

df_dropped <- df2 %>%
  group_by(ids) %>%
  mutate(infected = as.numeric(any(observations > 0)))

df_dropped
#> # A tibble: 16 x 5
#> # Groups:   ids [4]
#>     time   ids observations cumtot infected
#>    <dbl> <dbl>        <dbl>  <dbl>    <dbl>
#>  1     1   101           25     25        1
#>  2     1   102           25     25        1
#>  3     1   103            0      0        0
#>  4     1   104            0      0        0
#>  5     2   101           25     50        1
#>  6     2   102           25     50        1
#>  7     2   103            0      0        0
#>  8     2   104            0      0        0
#>  9     3   101           25     75        1
#> 10     3   102           25     75        1
#> 11     3   103            0      0        0
#> 12     3   104            0      0        0
#> 13     4   101           25    100        1
#> 14     4   102           25    100        1
#> 15     4   103            0      0        0
#> 16     4   104            0      0        0

^{Created on 2022-02-28 by the reprex package (v2.0.1)}