Home > Mobile >  replace a given value within a column with the next different number in a row in R
replace a given value within a column with the next different number in a row in R

Time:12-23

I have a data set that will ultimately be about ~30,000 observations. I have formatted a variable in such a way that the numerical values 1:4 are of interest, while the value 5 is a place holder and was not able to be collected by our testing instrument for one reason or another (not worried about why or missingness etc).

I am looking to turn any observation of 5, or series of observations of 5, into the next number in the observations. As can be seen in the example data set below, the first four observations have the number 5 while the next four observations are the number 4. In this situation I would like the first 4 observations to be changed from 5 to 4.

Note that after the 8th observation another series of 5's occur, follow by a series of 3s. In this case the 5s should be changed to 3s.

In the code block below I have provided an example of what the current data look like, delineated by the column "Current." I have also provided a column of the desired output, delineated by the column name "Desired." The obs variable was helpful to create just to show the row number of the changes in values for the case of this post.

df <- data.frame(Current = c(5,5,5,5,4,4,4,4,5,5,3,3,3,5,3,3,5,5,2,5,5,5,1),
                 Desired = c(4,4,4,4,4,4,4,4,3,3,3,3,3,3,3,3,2,2,2,1,1,1,1))

df$obs = seq(1,nrow(df), by = 1)

CodePudding user response:

You could use

library(tidyr)
library(dplyr)

df %>% 
  mutate(new_column = na_if(Current, 5)) %>% 
  fill(new_column, .direction = "up")

This returns

   Current Desired new_column
1        5       4          4
2        5       4          4
3        5       4          4
4        5       4          4
5        4       4          4
6        4       4          4
7        4       4          4
8        4       4          4
9        5       3          3
10       5       3          3
11       3       3          3
12       3       3          3
13       3       3          3
14       5       3          3
15       3       3          3
16       3       3          3
17       5       2          2
18       5       2          2
19       2       2          2
20       5       1          1
21       5       1          1
22       5       1          1
23       1       1          1
  • We use dplyr's na_if function to convert the 5 into missing values.
  • Next we use tidyr's fill function to replace the NA's by the following values.

CodePudding user response:

You can use the following solution. I made use of zoo::na.locf function which takes the most non-NA value and replace all NAs on the way down. However, to fit this to your data set I first replaced all values equal to 5 with NA and then reverse the vector and after I replaced all the values with the desired values, I again reversed it back to its original order:

library(dplyr)
library(zoo)

library(zoo)

df %>%
  mutate(Desired2 = ifelse(Current == 5, NA, Current), 
         Desired2 = rev(na.locf(rev(Desired2))))

   Current Desired Desired2
1        5       4        4
2        5       4        4
3        5       4        4
4        5       4        4
5        4       4        4
6        4       4        4
7        4       4        4
8        4       4        4
9        5       3        3
10       5       3        3
11       3       3        3
12       3       3        3
13       3       3        3
14       5       3        3
15       3       3        3
16       3       3        3
17       5       2        2
18       5       2        2
19       2       2        2
20       5       1        1
21       5       1        1
22       5       1        1
23       1       1        1
  • Related