issue in dplyr: unwanted NA's in new column

Time:01-05

I have the following data frame:

Xnumber   Number
X17339    EWY
X17339    LW2Y
X17401    EWC
X17401    LWY
X17466    EWC
X17466    LWY 
X17466    EWY
X17466    LWC

I want to create a new column, Number2, using the following code:

library(dplyr)
df3<-df3 %>% group_by(Xnumber) %>% mutate(Number2=if_else(lead(Number)=="LWC","Unknown",Number))

This is what the resulting data frame should look like:

    Xnumber   Number   Number2
    X17339    EWY      EWY
    X17339    LW2Y     LW2Y
    X17401    EWC      EWC
    X17401    LWY      LWY
    X17466    EWC      EWC
    X17466    LWY      LWY
    X17466    EWY      Unknown
    X17466    LWC      LWC

But instead, I also get NAs in my new column, like this:

    Xnumber   Number   Number2
    X17339    EWY      EWY
    X17339    LW2Y     NA
    X17401    EWC      EWC
    X17401    LWY      NA
    X17466    EWC      EWC
    X17466    LWY      LWY
    X17466    EWY      Unknown
    X17466    LWC      NA

I'm not sure why this is happening. Any thoughts?

CodePudding user response:

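Use the default argument of lead(), so that the last row of each group is compared against an empty string instead of NA: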

library(dplyr)
 
df3<-df3 %>% 
  group_by(Xnumber) %>% 
  mutate(Number2=if_else(lead(Number, default = "") == "LWC","Unknown",Number))
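
A self-contained sketch of the same fix, assuming the data from the question is rebuilt by hand with data.frame() (the question does not show how df3 was created, so that part is an assumption):

```r
library(dplyr)

# Rebuild the data frame from the question (this construction is assumed).
df3 <- data.frame(
  Xnumber = c("X17339", "X17339", "X17401", "X17401",
              "X17466", "X17466", "X17466", "X17466"),
  Number  = c("EWY", "LW2Y", "EWC", "LWY", "EWC", "LWY", "EWY", "LWC")
)

# With default = "", lead() returns "" instead of NA on the last row of
# each group, so the comparison is always TRUE or FALSE, never NA.
result <- df3 %>%
  group_by(Xnumber) %>%
  mutate(Number2 = if_else(lead(Number, default = "") == "LWC", "Unknown", Number)) %>%
  ungroup()

print(result)
```

The empty string works here only because "" never equals "LWC"; any placeholder that cannot match the value being tested would do.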

CodePudding user response:

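Since you grouped your data, lead() returns NA for the last row of each group (there is no further in-group value ahead). The comparison NA == "LWC" is itself NA, and if_else() returns NA wherever its condition is NA. If you want to replace these with, say, the most recent non-NA value, {tidyr}'s fill() comes in handy. Example: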

data.frame(x = c(1:3, NA, 5)) |>
   tidyr::fill(x, .direction = 'down')
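
For the data in the question, filling down would copy the previous row's Number2 instead of the row's own Number, so it may not give the desired result there. A possible alternative, sketched below on a hypothetical standalone vector holding one group's values, is to fall back to the original value with dplyr's coalesce():

```r
library(dplyr)

# One group's values (hypothetical standalone vector, for illustration only).
number <- c("EWC", "LWY", "EWY", "LWC")

# lead() shifts the values up by one; the last position has nothing
# ahead of it, so it becomes NA.
ahead <- lead(number)

# The NA in the condition propagates through if_else() ...
with_na <- if_else(ahead == "LWC", "Unknown", number)

# ... and coalesce() replaces it with the original value at that position.
number2 <- coalesce(with_na, number)
print(number2)
```

Inside a grouped mutate() the same idea would read coalesce(if_else(lead(Number) == "LWC", "Unknown", Number), Number).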