I have a series of phone numbers that have individual names associated with them. The phone numbers are not necessarily unique, and sometimes the name associated with them is missing. I would like to fill in the names with the following rules:
- If a grouped phone number has blank entries and only a single name available, I would like to replace the blank names with the one available.
- If the grouped number has entries with multiple non-blank names, then I cannot determine who belongs to that phone number and so I make no changes.
Here is the original table:
library("dplyr")
phone = c("1234","1234","1234","4567","4567","4567","4567","4567");phone
sgroup = c("","Jones","","Smith","","Smith","Williams","Jones");sgroup
DATA = data.frame(phone,sgroup);DATA
And here is how I want it to look:
phone1 = c("1234","1234","1234","4567","4567","4567","4567","4567");phone1
sgroup1 = c("Jones","Jones","Jones","Smith","","Smith","Williams","Jones");sgroup1
DATA1 = data.frame(phone1,sgroup1)
Note the "1234" since it originally had only 'Jones" and Blanks(NAs) assigned, so I assigned all records in that group to 'Jones. The second number (4567) had multiple names assigned, so I was unable to determine a common name so no assignment was made.
I can determine which phones have blank or non-blank name assignments, but I have been unable to determine how to assign them back to the blank records in that group.
Countdf1 <- DATA %>% count(phone,sgroup)
CodePudding user response:
I think this does what you need.
We go through all unique phone numbers (a kind of "group by") and then check if there's more than 2 unique names associated or if there are no missing names at all. If there are only two unique names and one of them is missing, that's when we apply your rule number 1. Else, we don't do anything.
phone = c("1234","1234","1234","4567","4567","4567","4567","4567")
sgroup = c("","Jones","","Smith","","Smith","Williams","Jones")
DATA = data.frame(phone,sgroup);
for (phone in unique(DATA$phone)) {
names <- unique(DATA$sgroup[DATA$phone == phone])
if (length(names) > 2 || all(!names %in% c(NA, "", NULL))) {
next
} else {
df <- DATA[DATA$phone == phone, ]
df$sgroup <- names[!names %in% c(NA, "", NULL)]
DATA[DATA$phone == phone, ] <- df
}
}
CodePudding user response:
In case you can use a data.table
solution:
library(data.table)
dt <- data.table(phone = c("1234","1234","1234","4567","4567","4567","4567","4567"),
sgroup = c("","Jones","","Smith","","Smith","Williams","Jones"))
dt[, ngrp := .(.(unique(sgroup[sgroup != ""]))), by = "phone"][lengths(ngrp) == 1, sgroup := unlist(ngrp)][, 1:2]
#> phone sgroup
#> 1: 1234 Jones
#> 2: 1234 Jones
#> 3: 1234 Jones
#> 4: 4567 Smith
#> 5: 4567
#> 6: 4567 Smith
#> 7: 4567 Williams
#> 8: 4567 Jones