How to replace specific NA values in a column with a certain string

Time:03-08

This may be very simple, but I could not figure it out.

df<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", NA, "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

My data looks like this. I want to know how to replace an NA with a string when the value one row above and the value one row below are the same string.

I can confirm that there is an NA:

sum(is.na(df$Friend))

If the value one row above is Friend and the value one row below is Friend, I want to replace the NA with Friend,

so that the output looks like this:

df_out<-structure(list(Besti = c("Friend", "myfriend", "yourbest", "allbest"
), Friend = c("Friend", "Friend", "Friend", "Toofriend"), Val1 = c(0L, 
0L, 0L, 0L), Val2 = c(0L, 0L, 0L, 0L), Val3 = c(0L, 1L, 0L, 0L
), Val4 = c(0L, 0L, 0L, 0L), Val5 = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))

Now imagine I have 100 NAs or more, in no particular order. The row directly before or after an NA may itself be NA, while the value two rows away is Friend or some other string.

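If I just wanted to replace every NA with Friend, regardless of its neighbours, I could do this:

library(tidyr)  # replace_na(); tidyr also re-exports %>%
df$Friend <- df$Friend %>% replace_na('Friend')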

CodePudding user response:

library(dplyr)
df |>
  mutate(
    upper = lag(Friend),
    lower = lead(Friend),
    replacement = ifelse(upper == lower, upper, NA),
    Friend = coalesce(Friend, replacement)
  )
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5  upper     lower replacement
#> 1   Friend    Friend    0    0    0    0    0   <NA>      <NA>        <NA>
#> 2 myfriend    Friend    0    0    1    0    0 Friend    Friend      Friend
#> 3 yourbest    Friend    0    0    0    0    0   <NA> Toofriend        <NA>
#> 4  allbest Toofriend    0    0    0    0    0 Friend      <NA>        <NA>

dplyr::lag() and dplyr::lead() shift the Friend vector down and up by one position, so each row sees the value above it (upper) and the value below it (lower). Where the two are equal, that value becomes the replacement; otherwise the replacement is NA. dplyr::coalesce() then fills each NA in Friend with the replacement value at the same position. This can be simplified to:

df |>
  mutate(
    replacement = ifelse(lag(Friend) == lead(Friend), lag(Friend), NA),
    Friend = coalesce(Friend, replacement)
  )
#>      Besti    Friend Val1 Val2 Val3 Val4 Val5 replacement
#> 1   Friend    Friend    0    0    0    0    0        <NA>
#> 2 myfriend    Friend    0    0    1    0    0      Friend
#> 3 yourbest    Friend    0    0    0    0    0        <NA>
#> 4  allbest Toofriend    0    0    0    0    0        <NA>
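
To see the three functions in isolation, here is a minimal sketch on a bare vector. The vector x is a hypothetical stand-in for the Friend column, and the names before, after and result are only illustrative.

```r
library(dplyr)

# Hypothetical toy vector mirroring the Friend column from the question
x <- c("Friend", NA, "Friend", "Toofriend")

before <- lag(x)   # value one position earlier; the first element becomes NA
after  <- lead(x)  # value one position later; the last element becomes NA

# Equal neighbours give TRUE; any comparison involving NA gives NA,
# which ifelse() passes through as NA
replacement <- ifelse(before == after, before, NA)

# coalesce() keeps x where it is not NA and falls back to replacement otherwise
result <- coalesce(x, replacement)
print(result)
```

Only the second element has matching neighbours, so it is the only NA that gets filled; an NA with differing or missing neighbours would stay NA.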

CodePudding user response:

Here's another approach. First, add the values of Friend that come after and before each observation to the data frame:

library(dplyr)

df$after <- lead(df$Friend)
df$before <- lag(df$Friend)

df

Output:

     Besti    Friend Val1 Val2 Val3 Val4 Val5     after before
1   Friend    Friend    0    0    0    0    0      <NA>   <NA>
2 myfriend      <NA>    0    0    1    0    0    Friend Friend
3 yourbest    Friend    0    0    0    0    0 Toofriend   <NA>
4  allbest Toofriend    0    0    0    0    0      <NA> Friend

Now we can derive a new version of the Friend variable with ifelse():

df$Friend <- ifelse(
  is.na(df$Friend) & 
  df$after == "Friend" & 
  df$before == "Friend", "Friend", df$Friend
)

df[, -c(8, 9)]  # drop the helper columns after and before

Output:

     Besti    Friend Val1 Val2 Val3 Val4 Val5
1   Friend    Friend    0    0    0    0    0
2 myfriend    Friend    0    0    1    0    0
3 yourbest    Friend    0    0    0    0    0
4  allbest Toofriend    0    0    0    0    0
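
The question also mentions runs of several NAs in a row, which the one-row lag()/lead() comparison does not cover. One possible sketch, assuming the goal is to fill an NA whenever the nearest non-NA value above and the nearest non-NA value below agree (that reading is an assumption), uses tidyr::fill() in both directions. The data frame x and the helper columns down and up are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: a run of two NAs between matching values,
# and one NA between values that differ
x <- data.frame(Friend = c("Friend", NA, NA, "Friend", NA, "Toofriend"))

filled <- x |>
  mutate(down = Friend, up = Friend) |>
  fill(down, .direction = "down") |>  # nearest non-NA value above
  fill(up, .direction = "up") |>      # nearest non-NA value below
  mutate(
    # Fill only when both sides exist and agree; otherwise leave Friend as is
    Friend = ifelse(
      is.na(Friend) & !is.na(down) & !is.na(up) & down == up,
      down, Friend
    )
  ) |>
  select(-down, -up)

print(filled)
```

Here rows 2 and 3 are filled with Friend, while row 5 stays NA because its neighbours (Friend and Toofriend) differ.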