I have one column of names of children who have teamed up in class together over multiple projects / activities, like so:
Note: This is ONE column.
Names
Tom,Jack,Meave
Tom,Arial
Arial,Tim,Tom
Neena,Meave
Meave
Tim,Meave
I want to use R so that I can see how many times two children have been paired over the projects they have done:
So:
Pair Counts
Meave,Jack 1
Tom,Jack 1
Meave,none 1
Tom,Arial 2
.
.
.
How do I go about doing this? A tidy
-friendly solution would be appreciated.
(Ultimately, I would like to use this data to make a circle-network graph, but that is for another question.)
CodePudding user response:
In Base R:
a <- tcrossprod(table(stack(setNames(strsplit(df$Names,","), rownames(df)))))
a
values
values Arial Jack Meave Neena Tim Tom
Arial 2 0 0 0 1 2
Jack 0 1 1 0 0 1
Meave 0 1 4 1 1 1
Neena 0 0 1 1 0 0
Tim 1 0 1 0 2 1
Tom 2 1 1 0 1 3
You could make the above look like the data you want. eg:
subset(as.data.frame.table(a),
as.character(values) > as.character(values.1) & Freq>0)
values values.1 Freq
5 Tim Arial 1
6 Tom Arial 2
9 Meave Jack 1
12 Tom Jack 1
16 Neena Meave 1
17 Tim Meave 1
18 Tom Meave 1
30 Tom Tim 1
In tidyverse:
df %>%
rownames_to_column()%>%
separate_rows(Names)%>%
table()%>%
crossprod()%>%
as.data.frame.table()%>%
filter(Freq>0 & as.character(Names) > as.character(Names.1))
Names Names.1 Freq
1 Tim Arial 1
2 Tom Arial 2
3 Meave Jack 1
4 Tom Jack 1
5 Neena Meave 1
6 Tim Meave 1
7 Tom Meave 1
8 Tom Tim 1
Data:
df <- structure(list(Names = c("Tom,Jack,Meave", "Tom,Arial", "Arial,Tim,Tom",
"Neena,Meave", "Meave", "Tim,Meave")), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
Here is one tidyverse approach...
If df
is
Names
<chr>
Tom,Jack,Meave
Tom,Arial
Arial,Tim,Tom
Neena,Meave
Meave
Tim,Meave
Then
df2 <- df %>%
mutate(ref = row_number(),
Names = ifelse(str_count(Names, ",") == 0, #add nobody if only one
paste0(Names, ",nobody"),
Names),
Names = str_split(Names, ",")) %>%
unnest(Names) %>%
nest(data = ref) %>% #creates a list of refs for each name
mutate(Names2 = list(Names)) %>% #add a column of second names for the pairs
unnest(Names2) %>%
filter(Names != Names2) %>% #remove self-pairs
left_join({.} %>% select(Names2 = Names, data2 = data) %>%
distinct()) %>% #create data for second column of names
mutate(paired = map2_dbl(data, data2, ~length(intersect(.x$ref, .y$ref)))) %>%
select(-data, -data2) %>%
filter(paired > 0, #remove non-occurring combinations
Names > Names2) #remove duplicates
Which gives...
> df2
# A tibble: 18 × 3
Names Names2 paired
<chr> <chr> <dbl>
1 Tom Jack 1
2 Tom Meave 1
3 Tom Arial 2
4 Tom Tim 1
5 Meave Jack 1
6 Tim Meave 1
7 Tim Arial 1
8 Neena Meave 1
9 nobody Meave 1
The code changes the dataframe from a list of names for each value of ref
to a list of refs
for each name. It then creates a column of other names (i.e. the second of a pair) and left-joins the refs
to these other names. Note that the {.}
in the left_join
refers to the piped dataframe at that point, creating a left join with itself.