Home > Enterprise >  How to count pairs in R with data in one column?
How to count pairs in R with data in one column?

Time:11-24

I have one column of names of children who have teamed up in class together over multiple projects / activities, like so:

Note: This is ONE column.

Names
Tom,Jack,Meave
Tom,Arial
Arial,Tim,Tom
Neena,Meave
Meave
Tim,Meave

I want to use R so that I can see how many times two children have been paired over the projects they have done:

So:

Pair Counts
Meave,Jack 1
Tom,Jack 1
Meave,none 1
Tom,Arial 2
.
.
.

How do I go about doing this? A tidy-friendly solution would be appreciated.

(Ultimately, I would like to use this data to make a circle-network graph, but that is for another question.)

CodePudding user response:

In Base R:

a <- tcrossprod(table(stack(setNames(strsplit(df$Names,","), rownames(df)))))
a

       values
values  Arial Jack Meave Neena Tim Tom
  Arial     2    0     0     0   1   2
  Jack      0    1     1     0   0   1
  Meave     0    1     4     1   1   1
  Neena     0    0     1     1   0   0
  Tim       1    0     1     0   2   1
  Tom       2    1     1     0   1   3 

You could make the above look like the data you want. eg:

  subset(as.data.frame.table(a), 
        as.character(values) > as.character(values.1) & Freq>0)
   values values.1 Freq
5     Tim    Arial    1
6     Tom    Arial    2
9   Meave     Jack    1
12    Tom     Jack    1
16  Neena    Meave    1
17    Tim    Meave    1
18    Tom    Meave    1
30    Tom      Tim    1

In tidyverse:

df %>%
  rownames_to_column()%>%
  separate_rows(Names)%>%
  table()%>%
  crossprod()%>%
  as.data.frame.table()%>%
  filter(Freq>0 & as.character(Names) > as.character(Names.1))

  Names Names.1 Freq
1   Tim   Arial    1
2   Tom   Arial    2
3 Meave    Jack    1
4   Tom    Jack    1
5 Neena   Meave    1
6   Tim   Meave    1
7   Tom   Meave    1
8   Tom     Tim    1

Data:

df <- structure(list(Names = c("Tom,Jack,Meave", "Tom,Arial", "Arial,Tim,Tom", 
"Neena,Meave", "Meave", "Tim,Meave")), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

Here is one tidyverse approach...

If df is

          Names
          <chr>
 Tom,Jack,Meave
      Tom,Arial
  Arial,Tim,Tom
    Neena,Meave
          Meave
      Tim,Meave

Then

df2 <- df %>% 
   mutate(ref = row_number(),
          Names = ifelse(str_count(Names, ",") == 0,  #add nobody if only one
                         paste0(Names, ",nobody"), 
                         Names),
          Names = str_split(Names, ",")) %>% 
   unnest(Names) %>% 
   nest(data = ref) %>%                 #creates a list of refs for each name
   mutate(Names2 = list(Names)) %>%     #add a column of second names for the pairs
   unnest(Names2) %>%                   
   filter(Names != Names2) %>%          #remove self-pairs 
   left_join({.} %>% select(Names2 = Names, data2 = data) %>% 
                     distinct()) %>%    #create data for second column of names
   mutate(paired = map2_dbl(data, data2, ~length(intersect(.x$ref, .y$ref)))) %>% 
   select(-data, -data2) %>% 
   filter(paired > 0,      #remove non-occurring combinations
          Names > Names2)  #remove duplicates

Which gives...

> df2
# A tibble: 18 × 3
  Names  Names2 paired
  <chr>  <chr>   <dbl>
1 Tom    Jack        1
2 Tom    Meave       1
3 Tom    Arial       2
4 Tom    Tim         1
5 Meave  Jack        1
6 Tim    Meave       1
7 Tim    Arial       1
8 Neena  Meave       1
9 nobody Meave       1

The code changes the dataframe from a list of names for each value of ref to a list of refs for each name. It then creates a column of other names (i.e. the second of a pair) and left-joins the refs to these other names. Note that the {.} in the left_join refers to the piped dataframe at that point, creating a left join with itself.

  •  Tags:  
  • r
  • Related