Looking to assign subgroup values to of a group to a common value in R-CodePudding

I have a series of phone numbers that have individual names associated with them. The phone numbers are not necessarily unique, and sometimes the name associated with them is missing. I would like to fill in the names with the following rules:

If a grouped phone number has blank entries and only a single name available, I would like to replace the blank names with the one available.
If the grouped number has entries with multiple non-blank names, then I cannot determine who belongs to that phone number and so I make no changes.

Here is the original table:

library("dplyr")
phone = c("1234","1234","1234","4567","4567","4567","4567","4567");phone
sgroup = c("","Jones","","Smith","","Smith","Williams","Jones");sgroup

DATA = data.frame(phone,sgroup);DATA

And here is how I want it to look:

phone1 = c("1234","1234","1234","4567","4567","4567","4567","4567");phone1
sgroup1 = c("Jones","Jones","Jones","Smith","","Smith","Williams","Jones");sgroup1

DATA1 = data.frame(phone1,sgroup1)

Note the "1234" since it originally had only 'Jones" and Blanks(NAs) assigned, so I assigned all records in that group to 'Jones. The second number (4567) had multiple names assigned, so I was unable to determine a common name so no assignment was made.

I can determine which phones have blank or non-blank name assignments, but I have been unable to determine how to assign them back to the blank records in that group.

Countdf1 <- DATA %>% count(phone,sgroup)

CodePudding user response：

I think this does what you need.

We go through all unique phone numbers (a kind of "group by") and then check if there's more than 2 unique names associated or if there are no missing names at all. If there are only two unique names and one of them is missing, that's when we apply your rule number 1. Else, we don't do anything.

phone = c("1234","1234","1234","4567","4567","4567","4567","4567")
sgroup = c("","Jones","","Smith","","Smith","Williams","Jones")

DATA = data.frame(phone,sgroup);


for (phone in unique(DATA$phone)) {
    names <- unique(DATA$sgroup[DATA$phone == phone])
    if (length(names) > 2 || all(!names %in% c(NA, "", NULL))) {
        next
    } else {
        df <- DATA[DATA$phone == phone, ]
        df$sgroup <- names[!names %in% c(NA, "", NULL)]
        DATA[DATA$phone == phone, ] <- df
    }
}

CodePudding user response：

In case you can use a data.table solution:

library(data.table)

dt <- data.table(phone = c("1234","1234","1234","4567","4567","4567","4567","4567"),
                 sgroup = c("","Jones","","Smith","","Smith","Williams","Jones"))
dt[, ngrp := .(.(unique(sgroup[sgroup != ""]))), by = "phone"][lengths(ngrp) == 1, sgroup := unlist(ngrp)][, 1:2]
#>    phone   sgroup
#> 1:  1234    Jones
#> 2:  1234    Jones
#> 3:  1234    Jones
#> 4:  4567    Smith
#> 5:  4567         
#> 6:  4567    Smith
#> 7:  4567 Williams
#> 8:  4567    Jones