This question is basically an extension of my previous question posted here.
How can I handle null values/NAs in these types of situations?
Example scenario and data:
df1 <- data.frame(
  stringsAsFactors = FALSE,
  id_1 = c("ABC","ABC","BCD",
           "CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
           "GHI","KLM","LMN","MNO","NOP"),
  id_2 = c("1A","2A","3A",
           "1A","4A","5A","6A",NA,"9A","10A","7A",
           "12A","13A",NA,"15A"),
  id_3 = c("Z3","Z2","Z1",
           "Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
           "Z9","Z9","Z1"),
  Name = c("StackOverflow1",
           "StackOverflow2","StackOverflow3","StackOverflow4",
           "StackOverflow5","StackOverflow6",
           "StackOverflow7","StackOverflow8","StackOverflow9",
           "StackOverflow10","StackOverflow11","StackOverflow12",
           "StackOverflow13","StackOverflow14","StackOverflow15"),
  desired_output = c(1L,1L,2L,1L,2L,
                     3L,3L,3L,4L,5L,3L,5L,6L,6L,2L)
)
df1
id_1 id_2 id_3 Name desired_output
1 ABC 1A Z3 StackOverflow1 1
2 ABC 2A Z2 StackOverflow2 1
3 BCD 3A Z1 StackOverflow3 2
4 CDE 1A Z4 StackOverflow4 1
5 DEF 4A Z1 StackOverflow5 2
6 EFG 5A Z5 StackOverflow6 3
7 GHI 6A Z5 StackOverflow7 3
8 HIJ <NA> Z6 StackOverflow8 3
9 IJK 9A Z7 StackOverflow9 4
10 JKL 10A Z8 StackOverflow10 5
11 GHI 7A Z6 StackOverflow11 3
12 KLM 12A Z8 StackOverflow12 5
13 LMN 13A Z9 StackOverflow13 6
14 MNO <NA> Z9 StackOverflow14 6
15 NOP 15A Z1 StackOverflow15 2
However, the three approaches suggested in the linked post do not work on this data and give me errors.
Please suggest a solution.
CodePudding user response:
Maybe you can replace the NA values in column id_2 with the values in id_1, and then follow the answers to the previous question.
You can try this:
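library(igraph)

# df is the question's data frame; the output below shows it without desired_output
transform(
  df,
  GRP = membership(
    components(
      graph_from_data_frame(
        reshape(
          # fill a missing id_2 with the row's own id_1 so no NA vertex is created
          transform(
            df,
            id_2 = ifelse(is.na(id_2), id_1, id_2)
          ),
          direction = "long",
          idvar = c("id_1", "Name"),
          varying = 2:3,
          v.names = "to"
        )[c("id_1", "to")]
      )
    )
  )[id_1]
)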
which gives
id_1 id_2 id_3 Name GRP
1 ABC 1A Z3 StackOverflow1 1
2 ABC 2A Z2 StackOverflow2 1
3 BCD 3A Z1 StackOverflow3 2
4 CDE 1A Z4 StackOverflow4 1
5 DEF 4A Z1 StackOverflow5 2
6 EFG 5A Z5 StackOverflow6 3
7 GHI 6A Z5 StackOverflow7 3
8 HIJ <NA> Z6 StackOverflow8 3
9 IJK 9A Z7 StackOverflow9 4
10 JKL 10A Z8 StackOverflow10 5
11 GHI 7A Z6 StackOverflow11 3
12 KLM 12A Z8 StackOverflow12 5
13 LMN 13A Z9 StackOverflow13 6
14 MNO <NA> Z9 StackOverflow14 6
15 NOP 15A Z1 StackOverflow15 2
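For readability, the same idea can also be written step by step. This is only a sketch under a few assumptions: it rebuilds just the three id columns from the question as df, it builds the edge list with rbind() instead of reshape(), and it renumbers the groups with match() so that they follow order of first appearance. The names filled, edges, g and grp are mine, not from the question.

```r
library(igraph)

# Only the id columns from the question are needed for the grouping.
df <- data.frame(
  id_1 = c("ABC","ABC","BCD","CDE","DEF","EFG","GHI","HIJ",
           "IJK","JKL","GHI","KLM","LMN","MNO","NOP"),
  id_2 = c("1A","2A","3A","1A","4A","5A","6A",NA,
           "9A","10A","7A","12A","13A",NA,"15A"),
  id_3 = c("Z3","Z2","Z1","Z4","Z1","Z5","Z5","Z6",
           "Z7","Z8","Z6","Z8","Z9","Z9","Z1"),
  stringsAsFactors = FALSE
)

# Step 1: fill a missing id_2 with the row's own id_1.
# The row then contributes a harmless self-loop instead of an NA vertex.
filled <- transform(df, id_2 = ifelse(is.na(id_2), id_1, id_2))

# Step 2: one edge id_1 -> id_2 and one edge id_1 -> id_3 per row.
edges <- rbind(
  data.frame(from = filled$id_1, to = filled$id_2, stringsAsFactors = FALSE),
  data.frame(from = filled$id_1, to = filled$id_3, stringsAsFactors = FALSE)
)

# Step 3: connected components of the graph are the groups.
g <- graph_from_data_frame(edges, directed = FALSE)
grp <- as.integer(membership(components(g))[df$id_1])

# Step 4: renumber so that group ids follow order of first appearance.
grp <- match(grp, unique(grp))
grp
```

For the question's data this reproduces the desired_output column.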
CodePudding user response:
Just remove the NA values:
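library(dplyr)
library(purrr)
library(igraph)

# f is assumed to be the helper function from the answers to the previous question (not shown here)
df$desired_output <- df %>%
  select(matches("^id_[0-9]+$")) %>%
  mutate(row = row_number()) %>%
  # drop the NA entries of each row before building the edges
  pmap(~c(...) %>% .[!is.na(.)]) %>%
  map(f) %>%
  flatten() %>%
  reduce(rbind) %>%
  igraph::graph_from_edgelist() %>%
  components() %>%
  membership() %>%
  .[as.character(seq_len(nrow(df)))]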
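If you do not have the helper f at hand, a self-contained sketch of the same "drop the NAs" idea could look like the following. It is not the code from the linked answers. Assumptions on my side: each row becomes a vertex named row_<i>, each id value is prefixed with its column name (so equal values in different columns are not linked), and every row has at least one non-missing id. The names id_cols, row_nodes, edges, g, m and grp are made up for this example, and df1 here contains only the id columns from the question.

```r
library(igraph)

# Only the id columns from the question are needed for the grouping.
df1 <- data.frame(
  id_1 = c("ABC","ABC","BCD","CDE","DEF","EFG","GHI","HIJ",
           "IJK","JKL","GHI","KLM","LMN","MNO","NOP"),
  id_2 = c("1A","2A","3A","1A","4A","5A","6A",NA,
           "9A","10A","7A","12A","13A",NA,"15A"),
  id_3 = c("Z3","Z2","Z1","Z4","Z1","Z5","Z5","Z6",
           "Z7","Z8","Z6","Z8","Z9","Z9","Z1"),
  stringsAsFactors = FALSE
)

id_cols   <- c("id_1", "id_2", "id_3")
row_nodes <- paste0("row_", seq_len(nrow(df1)))

# One candidate edge per (row, id column); 'keep' marks the non-missing ones.
edges <- do.call(rbind, lapply(id_cols, function(col) {
  data.frame(
    from = row_nodes,
    to   = paste(col, df1[[col]], sep = ":"),
    keep = !is.na(df1[[col]]),
    stringsAsFactors = FALSE
  )
}))

# Drop the edges that came from NA values, so NAs never link two rows.
edges <- edges[edges$keep, c("from", "to")]

# Rows that share any id value end up in the same connected component.
g <- graph_from_data_frame(edges, directed = FALSE)
m <- as.integer(membership(components(g))[row_nodes])

# Renumber so that group ids follow order of first appearance.
df1$grp <- match(m, unique(m))
df1$grp
```

Because the NA edges are removed before the graph is built, rows 8 and 14 are grouped only through their id_3 values (Z6 and Z9) and are not linked to each other by their missing id_2.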