Assigning new field values based on ifelse logic with lag/lead function in R

Time:07-09

Have seen several posts on this, but can't seem to get it to work for my specific use case.

I'm trying to assign a new field value based on ifelse logic. My input dataset looks like:

Input dataset

If the value for X is missing, I am trying to replace it with the previous value of X, only when the value of unique_id is the same as the previous value of unique_id. I would like the output dataset to look like this:

Output dataset

The code I've written (I'm a total beginner) doesn't throw an error, but the data doesn't change:

within(data3, data3$Output <- ifelse(data3$unique_id == lag(data3$unique_id) & is.na(data3$Output), data3$Output == lag(data3$Output), data3$Output == data3$Output))

In a previous step I convert the missing-data markers ("-") in the input dataset to proper NA values, which should allow me to use the is.na function.

Any help will be much appreciated!

CodePudding user response:

You could group by unique_id, then use fill() to carry the last non-missing value down within each group, replacing the NAs. See the reproducible example below.

(If an NA could appear before or after the value within a group, add .direction = "downup" to the fill() call.)

library(tidyverse)

# Sample data
df <- tribble(
  ~unique_id, ~x, ~mom,
  "m", 73500, 4,
  "m", NA, 0,
  "z", 4000, 5,
  "z", NA, 0,
)

df |> 
  group_by(unique_id) |> 
  fill(x) |> 
  ungroup()

#> # A tibble: 4 × 3
#>   unique_id     x   mom
#>   <chr>     <dbl> <dbl>
#> 1 m         73500     4
#> 2 m         73500     0
#> 3 z          4000     5
#> 4 z          4000     0
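
As a small sketch of the .direction = "downup" variant mentioned above: the data frame df_gap below is made up for illustration (it is not the asker's data) and puts the missing value first within group "m", so a plain downward fill would leave it as NA.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the NA in group "m" comes BEFORE the known value
df_gap <- tribble(
  ~unique_id, ~x,
  "m", NA,
  "m", 73500,
  "z", 4000,
  "z", NA
)

filled <- df_gap |>
  group_by(unique_id) |>
  fill(x, .direction = "downup") |>  # fill down first, then up for leading NAs
  ungroup()

filled
```

With "downup", each group is filled downward first and any NAs still left at the top of the group are then filled upward, so values never cross from one unique_id to another.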

Created on 2022-07-09 by the reprex package (v2.0.1)
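
For comparison, here is a hedged sketch of how the ifelse/lag idea from the question could be made to work. A few things likely stop the original line from changing anything: the result of within() is never assigned back to data3, the yes/no arguments use == (a comparison) where the replacement values should go, and if dplyr is not attached, lag() resolves to stats::lag(), which does not shift a plain vector. The column names below (unique_id, X) and the sample values are assumptions, since the real dataset was only shown as an image.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data3
data3 <- data.frame(
  unique_id = c("m", "m", "z", "z"),
  X = c(73500, NA, 4000, NA)
)

data3 <- data3 |>
  mutate(
    # TRUE when this row has the same id as the previous row (NA on row 1)
    same_id = unique_id == lag(unique_id),
    # take the previous X only when the id matches and X is missing
    X = ifelse(!is.na(same_id) & same_id & is.na(X), lag(X), X)
  ) |>
  select(-same_id)

data3
```

Note that lag() only looks one row back, so two consecutive NAs within the same unique_id would leave the second one unfilled; the grouped fill() approach handles that case.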

CodePudding user response:

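data.table option that replaces X with the first non-NA value in each group. Note that this overwrites every row of the group, so it assumes each unique_id has only one distinct non-missing X: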

df <- data.frame(unique_id = c("m", "m"),
                 X = c(73500, NA),
                 MoM = c("4%", "0%"))

library(data.table)
setDT(df)
df[, X := X[!is.na(X)][1L], by = unique_id]
df
#>    unique_id     X MoM
#> 1:         m 73500  4%
#> 2:         m 73500  0%

Created on 2022-07-09 by the reprex package (v2.0.1)
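
If a unique_id can have several different non-missing values, a variant that only touches the NAs may be closer to what the question asks for. The sketch below uses data.table's nafill() with type = "locf" (last observation carried forward) per group; the table dt and its values are made up for illustration, and nafill() needs a numeric column.

```r
library(data.table)

# Hypothetical data: group "m" has two different known values
dt <- data.table(
  unique_id = c("m", "m", "m", "z", "z"),
  X = c(73500, NA, 80000, 4000, NA)
)

# Carry the previous non-missing X forward within each unique_id;
# existing non-NA values (such as 80000) are left unchanged
dt[, X := nafill(X, type = "locf"), by = unique_id]

dt
```

Here the 80000 in group "m" is preserved, whereas assigning the first non-NA value per group would overwrite it with 73500.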
