My test data:
tableid id name weight weightdate
1 1 david 100 01/01/2020
2 1 david 100 01/01/2020
3 1 david NA NA
4 2 anne NA NA
5 3 peter 150 02/10/2020
6 3 peter 150 02/10/2020
I have some records of a same person (in this case david), which have both valid (100 and 01/01/2020) and NA data in weight and weightdate. Im looking to normalize the NA by replacing them with valid data from that person.
For the case of anne who doesnt have any valid data, i would leave it as is. I was thinking about using fill
.
My desired output:
tableid id name weight weightdate
1 1 david 100 01/01/2020
2 1 david 100 01/01/2020
3 1 david 100 01/01/2020
4 2 anne NA NA
5 3 peter 150 02/10/2020
6 3 peter 150 02/10/2020
CodePudding user response:
You can use fill()
function from tidyr
package:
library(tidyr)
df1 %>% group_by(name) %>% fill(weight, weightdate) %>% ungroup
# A tibble: 6 x 5
tableid id name weight weightdate
<int> <int> <chr> <int> <chr>
1 1 1 david 100 01/01/2020
2 2 1 david 100 01/01/2020
3 3 1 david 100 01/01/2020
4 4 2 anne NA NA
5 5 3 peter 150 02/10/2020
6 6 3 peter 150 02/10/2020
Data
df1 <- structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L),
name = c("david", "david", "david", "anne", "peter", "peter"
), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020",
"01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
This is an alternative. Andre Wildberg already provided my favorite:
library(dplyr)
df %>%
group_by(name) %>%
mutate(across(everything(), ~ifelse(is.na(.), lag(.),.)))
tableid id name weight weightdate
<int> <int> <chr> <int> <chr>
1 1 1 david 100 01/01/2020
2 2 1 david 100 01/01/2020
3 3 1 david 100 01/01/2020
4 4 2 anne NA NA
5 5 3 peter 150 02/10/2020
6 6 3 peter 150 02/10/2020
data:
structure(list(tableid = 1:6, id = c(1L, 1L, 1L, 2L, 3L, 3L),
name = c("david", "david", "david", "anne", "peter", "peter"
), weight = c(100L, 100L, NA, NA, 150L, 150L), weightdate = c("01/01/2020",
"01/01/2020", NA, NA, "02/10/2020", "02/10/2020")), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
A data.table
option
> setDT(df)[, lapply(.SD, function(x) unique(na.omit(x))), name]
name tableid id weight weightdate
1: david 1 1 100 01/01/2020
2: david 2 1 100 01/01/2020
3: david 3 1 100 01/01/2020
4: anne 4 2 NA <NA>
5: peter 5 3 150 02/10/2020
6: peter 6 3 150 02/10/2020