This could be very simple, but I could not figure it out.
df<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", NA, "Friend", "Toofriend"), Val1 = c(0L,
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
My data looks like this. I want to know how to replace an NA with a string when the value one row above it and the value one row below it are the same string.
I can check that there is an NA with:
sum(is.na(df$Friend))
If the value one row above and the value one row below are both Friend, I want to replace the NA with Friend,
so the output looks like this:
df_out<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", "Friend", "Friend", "Toofriend"), Val1 = c(0L,
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
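Now imagine I have 100 NAs or more, in no particular order: the value one row before may itself be NA, or the value one row after may be NA while the value two rows after is Friend (or any other string).
If I simply wanted to replace every NA with Friend, I could do this:
library(tidyr) # provides replace_na() and re-exports %>%
df$Friend <- df$Friend %>% replace_na('Friend')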
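For the general case just described (runs of consecutive NAs), comparing only the immediate neighbours is not enough. Below is a minimal base R sketch, assuming the intended rule is "fill an NA when the nearest non-NA value above it and the nearest non-NA value below it are the same string"; fill_between() is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: fill each NA, including runs of consecutive NAs,
# when the nearest non-NA value above and the nearest non-NA value below
# are the same string. Base R only.
fill_between <- function(x) {
  ok  <- !is.na(x)
  idx <- seq_along(x)
  n   <- length(x)

  # Row index of the last non-NA value at or above each row (0 = none)
  prev_idx <- cummax(ifelse(ok, idx, 0L))
  # Row index of the first non-NA value at or below each row (n + 1 = none)
  next_idx <- rev(cummin(rev(ifelse(ok, idx, n + 1L))))

  # Padding with NA turns the "none" indices into NA values
  prev_val <- c(NA, x)[prev_idx + 1L]
  next_val <- c(x, NA)[next_idx]

  # Only fill NAs whose two non-NA neighbours both exist and agree
  same <- !is.na(prev_val) & !is.na(next_val) & prev_val == next_val
  to_fill <- !ok & same
  x[to_fill] <- prev_val[to_fill]
  x
}

fill_between(c("Friend", NA, NA, "Friend", NA, "Toofriend"))
```

With the example data, df$Friend <- fill_between(df$Friend) gives the same Friend column as df_out. The cummax()/cummin() trick locates the nearest non-NA neighbour for every row in one vectorised pass; comparing tidyr::fill() with .direction = "down" against .direction = "up" would be another way to get the same two neighbour columns.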
Answer:
library(dplyr)
df |>
mutate(
upper = lag(Friend),
lower = lead(Friend),
replacement = ifelse(upper == lower, upper, NA),
Friend = coalesce(Friend, replacement)
)
#> Besti Friend Val1 Val2 Val3 Val4 Val5 upper lower replacement
#> 1 Friend Friend 0 0 0 0 0 <NA> <NA> <NA>
#> 2 myfriend Friend 0 0 1 0 0 Friend Friend Friend
#> 3 yourbest Friend 0 0 0 0 0 <NA> Toofriend <NA>
#> 4 allbest Toofriend 0 0 0 0 0 Friend <NA> <NA>
dplyr::lag() and dplyr::lead() shift the vector Friend down and up by one position, so each row sees the value directly above it (upper) and directly below it (lower).
We can then test whether those two values are the same and, if they are, use that value as the replacement. dplyr::coalesce() then replaces the NAs in Friend with the replacement value in the same position.
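To make the shifting concrete, here is the same logic applied to the bare Friend vector as a base R sketch: c(NA, head(x, -1)) stands in for lag(x), c(tail(x, -1), NA) stands in for lead(x) (equivalent for a shift of one), and an ifelse() on is.na() stands in for coalesce(). The explicit !is.na() checks just make the missing-neighbour case visible.

```r
x <- c("Friend", NA, "Friend", "Toofriend")

# Shift down by one, like lag(x): row i now holds the value from row i - 1
upper <- c(NA, head(x, -1))
# Shift up by one, like lead(x): row i now holds the value from row i + 1
lower <- c(tail(x, -1), NA)

# Use the neighbour value only where both neighbours exist and agree
replacement <- ifelse(!is.na(upper) & !is.na(lower) & upper == lower, upper, NA)

# Base R stand-in for coalesce(x, replacement): keep x unless it is NA
result <- ifelse(is.na(x), replacement, x)
result
```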
This can be simplified to:
df |>
mutate(
replacement = ifelse(lag(Friend) == lead(Friend), lag(Friend), NA),
Friend = coalesce(Friend, replacement)
)
#> Besti Friend Val1 Val2 Val3 Val4 Val5 replacement
#> 1 Friend Friend 0 0 0 0 0 <NA>
#> 2 myfriend Friend 0 0 1 0 0 Friend
#> 3 yourbest Friend 0 0 0 0 0 <NA>
#> 4 allbest Toofriend 0 0 0 0 0 <NA>
Answer:
Here's another approach. To the data frame I added the values of Friend
that come before and after each observation:
library(dplyr)
df$after <- lead(df$Friend)
df$before <- lag(df$Friend)
df
Output:
Besti Friend Val1 Val2 Val3 Val4 Val5 after before
1 Friend Friend 0 0 0 0 0 <NA> <NA>
2 myfriend <NA> 0 0 1 0 0 Friend Friend
3 yourbest Friend 0 0 0 0 0 Toofriend <NA>
4 allbest Toofriend 0 0 0 0 0 <NA> Friend
Now we can derive a new version of the Friend variable with ifelse():
df$Friend <- ifelse(
is.na(df$Friend) &
df$after == "Friend" &
df$before == "Friend", "Friend", df$Friend
)
df[, -c(8,9)] # drop the helper columns after and before
Output:
Besti Friend Val1 Val2 Val3 Val4 Val5
1 Friend Friend 0 0 0 0 0
2 myfriend Friend 0 0 1 0 0
3 yourbest Friend 0 0 0 0 0
4 allbest Toofriend 0 0 0 0 0
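The ifelse() above hard-codes the string "Friend", while the question asks about Friend "or whatever string". Below is a sketch of a variant that fills the NA with whatever value both neighbours share. It rebuilds the example data with data.frame() and uses base R shifts in place of lead()/lag(), so it runs without dplyr; the vector names after, before and to_fill are illustrative.

```r
df <- data.frame(
  Besti  = c("Friend", "myfriend", "yourbest", "allbest"),
  Friend = c("Friend", NA, "Friend", "Toofriend"),
  Val1 = 0L, Val2 = 0L, Val3 = c(0L, 1L, 0L, 0L), Val4 = 0L, Val5 = 0L,
  stringsAsFactors = FALSE
)

# Neighbouring values, computed with base R instead of lead()/lag()
after  <- c(df$Friend[-1], NA)
before <- c(NA, df$Friend[-nrow(df)])

# Fill an NA with its neighbours' value whenever both neighbours exist and
# agree, whatever that value is (not only "Friend")
to_fill <- is.na(df$Friend) & !is.na(after) & !is.na(before) & after == before
df$Friend[to_fill] <- before[to_fill]
df
```

Because the neighbour values live in separate vectors rather than new columns, there are no helper columns to drop afterwards.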