Home > OS >  Creating a new column with conditions in addition to the row value of the new column
Creating a new column with conditions in addition to the row value of the new column

Time:11-27

Any ideas on how to create a new column B using the values of column A, while using the value of the row above of the new created colum B?

The value of B should be corresponding to:

A0 = value of the row above.
A1 = 1.
A2 = value of the row above   1.

Current dataframe desired outcome

                
Dataframe           Desired outcome  
A                   A   B
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
2                   2   4
0                   0   4
2                   2   5
0                   0   5
2                   2   6
0                   0   6
1                   1   1
0                   0   1
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
1                   1   1
0                   0   1
2                   2   2
0                   0   2


Data Frame
A <- c(1,0,2,0,2,0,2,0,2,0,2,0,1,0,1,0,2,0,2,0,1,0,2,0)
Bdesiredoutcome <- c(1,1,2,2,3,3,4,4,5,5,6,6,1,1,1,1,2,2,3,3,1,1,2,2)
df = data.frame(A,Bdesiredoutcome)

I tried using dpylr, mutate(), case_when() and lag() but keep running into errors. Due to using the lag() function. When using lag(A) the desired outcome cannot be generated. Any idea's on how to solve this problem?

df <- df %>%
          mutate(B = case_when((A == 0) ~ lag(B), 
                               (A == 1) ~ 1,
                               (A == 2) ~ (lag(B) 1)
    ))

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "function"
In addition: Warning message:

CodePudding user response:

We can create a grouping column with cumsum and then create the 'B' column

library(dplyr)
df %>% 
   group_by(grp = cumsum(A == 1)) %>% 
   mutate(B = cumsum(A != 0)) %>% 
   ungroup %>%
   select(-grp) %>%
   as.data.frame

-output

   A Bdesired B
1  1        1 1
2  0        1 1
3  2        2 2
4  0        2 2
5  2        3 3
6  0        3 3
7  2        4 4
8  0        4 4
9  2        5 5
10 0        5 5
11 2        6 6
12 0        6 6
13 1        1 1
14 0        1 1
15 1        1 1
16 0        1 1
17 2        2 2
18 0        2 2
19 2        3 3
20 0        3 3
21 1        1 1
22 0        1 1
23 2        2 2
24 0        2 2

CodePudding user response:

On your original question I got the following:

library(tidyverse)
library(lubridate)

df$date <-dmy(df$date)


df <- df %>% 
  arrange(id, date) %>%
  group_by(id) %>%
  mutate(daysbetween = replace_na(date - lag(date),0),
         ind = 1,
         NewA= case_when (daysbetween < 7 ~ 0, daysbetween > 7 ~ 1),
         NewB= case_when (daysbetween < 85 ~ 0, daysbetween > 85 ~ 1),
         A =   case_when (1   cumsum(ind*NewA) <= 6 ~ 1   cumsum(ind*NewA),
                          1   cumsum(ind*NewA) > 6 ~ 1   cumsum(ind*NewA) - 6),
         B =  1   cumsum(ind*NewB))%>%
  select(id, date, A, B)

It only works if the reset for A is at 6. I used cumsum() as suggested above.

  • Related