I have a data set that will ultimately be about ~30,000 observations. I have formatted a variable in such a way that the numerical values 1:4 are of interest, while the value 5 is a place holder and was not able to be collected by our testing instrument for one reason or another (not worried about why or missingness etc).
I am looking to turn any observation of 5, or series of observations of 5, into the next number in the observations. As can be seen in the example data set below, the first four observations have the number 5 while the next four observations are the number 4. In this situation I would like the first 4 observations to be changed from 5 to 4.
Note that after the 8th observation another series of 5's occur, follow by a series of 3s. In this case the 5s should be changed to 3s.
In the code block below I have provided an example of what the current data look like, delineated by the column "Current." I have also provided a column of the desired output, delineated by the column name "Desired." The obs variable was helpful to create just to show the row number of the changes in values for the case of this post.
df <- data.frame(Current = c(5,5,5,5,4,4,4,4,5,5,3,3,3,5,3,3,5,5,2,5,5,5,1),
Desired = c(4,4,4,4,4,4,4,4,3,3,3,3,3,3,3,3,2,2,2,1,1,1,1))
df$obs = seq(1,nrow(df), by = 1)
CodePudding user response:
You could use
library(tidyr)
library(dplyr)
df %>%
mutate(new_column = na_if(Current, 5)) %>%
fill(new_column, .direction = "up")
This returns
Current Desired new_column
1 5 4 4
2 5 4 4
3 5 4 4
4 5 4 4
5 4 4 4
6 4 4 4
7 4 4 4
8 4 4 4
9 5 3 3
10 5 3 3
11 3 3 3
12 3 3 3
13 3 3 3
14 5 3 3
15 3 3 3
16 3 3 3
17 5 2 2
18 5 2 2
19 2 2 2
20 5 1 1
21 5 1 1
22 5 1 1
23 1 1 1
- We use
dplyr
'sna_if
function to convert the5
into missing values. - Next we use
tidyr
'sfill
function to replace theNA
's by the following values.
CodePudding user response:
You can use the following solution. I made use of zoo::na.locf
function which takes the most non-NA value and replace all NA
s on the way down. However, to fit this to your data set I first replaced all values equal to 5
with NA
and then reverse the vector and after I replaced all the values with the desired values, I again reversed it back to its original order:
library(dplyr)
library(zoo)
library(zoo)
df %>%
mutate(Desired2 = ifelse(Current == 5, NA, Current),
Desired2 = rev(na.locf(rev(Desired2))))
Current Desired Desired2
1 5 4 4
2 5 4 4
3 5 4 4
4 5 4 4
5 4 4 4
6 4 4 4
7 4 4 4
8 4 4 4
9 5 3 3
10 5 3 3
11 3 3 3
12 3 3 3
13 3 3 3
14 5 3 3
15 3 3 3
16 3 3 3
17 5 2 2
18 5 2 2
19 2 2 2
20 5 1 1
21 5 1 1
22 5 1 1
23 1 1 1