Any ideas on how to create a new column B using the values of column A, while using the value of the row above of the new created colum B?
The value of B should be corresponding to:
A0 = value of the row above.
A1 = 1.
A2 = value of the row above 1.
Current dataframe desired outcome
Dataframe Desired outcome
A A B
1 1 1
0 0 1
2 2 2
0 0 2
2 2 3
0 0 3
2 2 4
0 0 4
2 2 5
0 0 5
2 2 6
0 0 6
1 1 1
0 0 1
1 1 1
0 0 1
2 2 2
0 0 2
2 2 3
0 0 3
1 1 1
0 0 1
2 2 2
0 0 2
Data Frame
A <- c(1,0,2,0,2,0,2,0,2,0,2,0,1,0,1,0,2,0,2,0,1,0,2,0)
Bdesiredoutcome <- c(1,1,2,2,3,3,4,4,5,5,6,6,1,1,1,1,2,2,3,3,1,1,2,2)
df = data.frame(A,Bdesiredoutcome)
I tried using dpylr, mutate(), case_when()
and lag()
but keep running into errors. Due to using the lag()
function. When using lag(A)
the desired outcome cannot be generated.
Any idea's on how to solve this problem?
df <- df %>%
mutate(B = case_when((A == 0) ~ lag(B),
(A == 1) ~ 1,
(A == 2) ~ (lag(B) 1)
))
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
In addition: Warning message:
CodePudding user response:
We can create a grouping column with cumsum
and then create the 'B' column
library(dplyr)
df %>%
group_by(grp = cumsum(A == 1)) %>%
mutate(B = cumsum(A != 0)) %>%
ungroup %>%
select(-grp) %>%
as.data.frame
-output
A Bdesired B
1 1 1 1
2 0 1 1
3 2 2 2
4 0 2 2
5 2 3 3
6 0 3 3
7 2 4 4
8 0 4 4
9 2 5 5
10 0 5 5
11 2 6 6
12 0 6 6
13 1 1 1
14 0 1 1
15 1 1 1
16 0 1 1
17 2 2 2
18 0 2 2
19 2 3 3
20 0 3 3
21 1 1 1
22 0 1 1
23 2 2 2
24 0 2 2
CodePudding user response:
On your original question I got the following:
library(tidyverse)
library(lubridate)
df$date <-dmy(df$date)
df <- df %>%
arrange(id, date) %>%
group_by(id) %>%
mutate(daysbetween = replace_na(date - lag(date),0),
ind = 1,
NewA= case_when (daysbetween < 7 ~ 0, daysbetween > 7 ~ 1),
NewB= case_when (daysbetween < 85 ~ 0, daysbetween > 85 ~ 1),
A = case_when (1 cumsum(ind*NewA) <= 6 ~ 1 cumsum(ind*NewA),
1 cumsum(ind*NewA) > 6 ~ 1 cumsum(ind*NewA) - 6),
B = 1 cumsum(ind*NewB))%>%
select(id, date, A, B)
It only works if the reset for A is at 6. I used cumsum()
as suggested above.