I have the following data:-
library(data.table)
data <- data <- data.table(address = c("AA", "BB", "AA", "CC", "DD", "EE", "DD"),
revenue = c(NA, 121, 22, 33, 44, 33, NA),
castord = c(21, 22, NA, 3, NA, 223, 33),
versaze = c(NA, 22, 124, 33, NA, 44, 43))
data
# address revenue castord versaze
# 1: AA NA 21 NA
# 2: BB 121 22 22
# 3: AA 22 NA 124
# 4: CC 33 3 33
# 5: DD 44 NA NA
# 6: EE 33 223 44
# 7: DD NA 33 43
Now this data has AA
and DD
address
es duplicated. And their first occurrence has some NA
data as you can see in the row 1
and 5
. What I want to do is fill this data by using the duplicated rows of these address
es. If the duplicated row has NA values, then NA shouldn't replace the values in the row of first occurrence. This will give me the following data:-
data <- data.table(address = c("AA", "BB", "AA", "CC", "DD", "EE", "DD"),
revenue = c(22, 121, 22, 33, 44, 33, NA),
castord = c(21, 22, NA, 3, 33, 223, 33),
versaze = c(124, 22, 124, 33, 43, 44, 43))
# address revenue castord versaze
# 1: AA 22 21 124
# 2: BB 121 22 22
# 3: AA 22 NA 124
# 4: CC 33 3 33
# 5: DD 44 33 43
# 6: EE 33 223 44
# 7: DD NA 33 43
And then remove those duplicated rows:-
data <- data.table(address = c("AA", "BB", "CC", "DD", "EE"),
revenue = c(22, 121, 33, 44, 33),
castord = c(21, 22, 3, 33, 223),
versaze = c(124, 22, 33, 43, 44))
# address revenue castord versaze
# 1: AA 22 21 124
# 2: BB 121 22 22
# 3: CC 33 3 33
# 4: DD 44 33 43
# 5: EE 33 223 44
CodePudding user response:
Using data.table, group by address, loop through columns, remove NAs, get the first value:
data[, lapply(.SD, function(i) (na.omit(i)[ 1 ])), by = address]
# address revenue castord versaze
# 1: AA 22 21 124
# 2: BB 121 22 22
# 3: CC 33 3 33
# 4: DD 44 33 43
# 5: EE 33 223 44
CodePudding user response:
If you use dplyr::group_by
and summarise
where you take the first value that is not NA
using na.omit
, it fills out the empty values for you if the first row is NA
.
data <- data %>% group_by(address) %>%
summarise(
revenue = first(na.omit(revenue)),
castord = first(na.omit(castord)),
versaze = first(na.omit(versaze))
)