For a network analysis with igraph, I want to create an edge list from a table, with an edge attribute taken from the Origin variable. I imported an Excel file in which the second column holds a comma-separated list of contacts.
After that, I split the second column into multiple columns and trimmed the whitespace.
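library(tidyr)
# split the comma-separated contacts into up to 20 columns, then trim whitespace
test <- separate(ID_Kontakt_import_test, 'Contacts 1', paste("Contacts", 1:20, sep = "_"), sep = ",", extra = "drop")
test <- data.frame(lapply(test, trimws), stringsAsFactors = FALSE)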
Here is a part of my dataset.
structure(list(ID = c("ID_003", "ID_004", "ID_009", "ID_009"),
Contacts_1 = c("ID_001", "ID_001", "ID_001", "ID_398"), Contacts_2 = c("ID_002",
"ID_002", "ID_002", NA), Contacts_3 = c("ID_004", "ID_003",
"ID_003", NA), Contacts_4 = c("ID_005", "ID_005", "ID_004",
NA), Contacts_5 = c("ID_006", "ID_006", "ID_005", NA), Contacts_6 = c("ID_007",
"ID_007", "ID_006", NA), Contacts_7 = c("ID_008", "ID_008",
"ID_007", NA), Contacts_8 = c("ID_009", "ID_009", "ID_008",
NA), Contacts_9 = c(NA, NA, "ID_011", NA), Contacts_10 = c(NA,
NA, "ID_012", NA), Contacts_11 = c(NA, NA, "ID_013", NA),
Contacts_12 = c(NA, NA, "ID_016", NA), Contacts_13 = c(NA,
NA, "ID_017", NA), Contacts_14 = c(NA, NA, "ID_028", NA),
Contacts_15 = c(NA, NA, "ID_040", NA), Contacts_16 = c(NA_character_,
NA_character_, NA_character_, NA_character_), Contacts_17 = c(NA_character_,
NA_character_, NA_character_, NA_character_), Contacts_18 = c(NA_character_,
NA_character_, NA_character_, NA_character_), Contacts_19 = c(NA_character_,
NA_character_, NA_character_, NA_character_), Contacts_20 = c(NA_character_,
NA_character_, NA_character_, NA_character_), Origin = c("1",
"1", "1", "2")), class = "data.frame", row.names = c(NA,
-4L))
I already created an edge list without the edge attribute by converting the data frame to a matrix and building the edge list with cbind. But I don't know how to carry the edge attribute along so that it ends up in a third column.
m <- as.matrix(test)
el <- cbind(m[, 1], c(m[, -1]))  # create edge list
el <- na.omit(el)                # drop NA
dups <- duplicated(t(apply(el, 1, sort)))
el2 <- el[!dups, ]               # drop duplicates
So I want my data to look basically like this, with all edges:
V1 | V2 | Origin |
---|---|---|
ID_003 | ID_001 | 1 |
ID_003 | ID_009 | 1 |
ID_009 | ID_040 | 1 |
ID_009 | ID_398 | 2 |
Answer:
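Using tidyr and dplyr (here df is the data frame shown in the question), pivot the Contacts columns to long format, keep ID, the contact and Origin, and drop the rows where the contact is NA: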
library(tidyr)
library(dplyr)
df2 <- df %>%
  # one row per (ID, contact) pair; the contact goes into V2
  tidyr::pivot_longer(cols = contains("Contacts"), values_to = "V2") %>%
  dplyr::select(V1 = ID, V2, Origin)
# drop rows where the contact is NA
df2[complete.cases(df2), ]
# A tibble: 32 × 3
   V1     V2     Origin
   <chr>  <chr>  <chr>
 1 ID_003 ID_001 1
 2 ID_003 ID_002 1
 3 ID_003 ID_004 1
 4 ID_003 ID_005 1
 5 ID_003 ID_006 1
 6 ID_003 ID_007 1
 7 ID_003 ID_008 1
 8 ID_003 ID_009 1
 9 ID_004 ID_001 1
10 ID_004 ID_002 1
# … with 22 more rows
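The question also drops duplicate edges regardless of direction, which the pipeline above does not do. A minimal base R sketch of that step, which also carries Origin along, could look like the following. The small `df` here is a hypothetical stand-in with only three contact columns, not the full dataset, and `contact_cols`, `key`, `el` and `el2` are just illustrative names.

```r
# Hypothetical stand-in for the data in the question (fewer columns)
df <- data.frame(
  ID         = c("ID_003", "ID_004", "ID_009", "ID_009"),
  Contacts_1 = c("ID_001", "ID_001", "ID_001", "ID_398"),
  Contacts_2 = c("ID_004", "ID_003", "ID_003", NA),
  Contacts_3 = c("ID_009", "ID_009", "ID_004", NA),
  Origin     = c("1", "1", "1", "2"),
  stringsAsFactors = FALSE
)

contact_cols <- grep("^Contacts", names(df), value = TRUE)

# Stack the contact columns column by column; ID and Origin are repeated
# once per contact column so that every edge keeps the Origin of its row
el <- data.frame(
  V1     = rep(df$ID, times = length(contact_cols)),
  V2     = unlist(df[contact_cols], use.names = FALSE),
  Origin = rep(df$Origin, times = length(contact_cols)),
  stringsAsFactors = FALSE
)

# Drop rows without a contact
el <- el[!is.na(el$V2), ]

# Treat edges as undirected: build an order-independent key per pair
# and keep only the first occurrence of each key
key <- paste(pmin(el$V1, el$V2), pmax(el$V1, el$V2), sep = "|")
el2 <- el[!duplicated(key), ]
rownames(el2) <- NULL
```

Note that when the same pair appears with different Origin values, this keeps whichever comes first, so it is worth checking whether that matters for your data. If igraph is installed, `igraph::graph_from_data_frame(el2, directed = FALSE)` should pick up Origin as an edge attribute, since columns after the first two are treated as edge attributes.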