Home > Net >  How to fill NAs based on a grouped variable?
How to fill NAs based on a grouped variable?

Time:12-30

My test data:

tableid  id    name    weight   weightdate
1        1    david    100      01/01/2020
2        1    david    100      01/01/2020
3        1    david    NA       NA
4        2    anne     NA       NA
5        3    peter    150      02/10/2020
6        3    peter    150      02/10/2020

I have some records of a same person (in this case david), which have both valid (100 and 01/01/2020) and NA data in weight and weightdate. Im looking to normalize the NA by replacing them with valid data from that person. For the case of anne who doesnt have any valid data, i would leave it as is. I was thinking about using fill.

My desired output:

tableid  id    name    weight   weightdate
1        1    david    100      01/01/2020
2        1    david    100      01/01/2020
3        1    david    100      01/01/2020
4        2    anne     NA       NA
5        3    peter    150      02/10/2020
6        3    peter    150      02/10/2020

CodePudding user response:

You can use fill() function from tidyr package:

library(tidyr)

df1 %>% group_by(name) %>% fill(weight, weightdate) %>% ungroup
# A tibble: 6 x 5
  tableid    id name  weight weightdate
    <int> <int> <chr>  <int> <chr>     
1       1     1 david    100 01/01/2020
2       2     1 david    100 01/01/2020
3       3     1 david    100 01/01/2020
4       4     2 anne      NA NA        
5       5     3 peter    150 02/10/2020
6       6     3 peter    150 02/10/2020

Data

df1 <- structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L), 
    name = c("david", "david", "david", "anne", "peter", "peter"
    ), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020", 
    "01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

This is an alternative. Andre Wildberg already provided my favorite:

library(dplyr)
df %>% 
  group_by(name) %>% 
  mutate(across(everything(), ~ifelse(is.na(.), lag(.),.)))
  tableid    id name  weight weightdate
    <int> <int> <chr>  <int> <chr>     
1       1     1 david    100 01/01/2020
2       2     1 david    100 01/01/2020
3       3     1 david    100 01/01/2020
4       4     2 anne      NA NA        
5       5     3 peter    150 02/10/2020
6       6     3 peter    150 02/10/2020

data:

structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L), 
name = c("david", "david", "david", "anne", "peter", "peter"
), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020", 
"01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

A data.table option

> setDT(df)[, lapply(.SD, function(x) unique(na.omit(x))), name]
    name tableid id weight weightdate
1: david       1  1    100 01/01/2020
2: david       2  1    100 01/01/2020
3: david       3  1    100 01/01/2020
4:  anne       4  2     NA       <NA>
5: peter       5  3    150 02/10/2020
6: peter       6  3    150 02/10/2020
  •  Tags:  
  • r
  • Related