Can I summarise variable combinations after grouping by user ID in R?

Time:04-18

As the title suggests, I am struggling to find a good way to analyse how many users (identified by user ID) use just one device (Device) or several, and if several, in which combination.

I managed to create columns with each user's ID and device info, the total number of sessions (noofsessions), and whether the user accessed the platform several times, i.e. whether their user ID occurs multiple times (multisession).

Every time a user logs in, their device info is saved, so some people have 20 logins recorded with just a PC, while others log in with different devices over time.

What I am looking for is an output that doesn't necessarily need the individual user IDs, just how many people use each combination of devices: just a PC, just a tablet, just a phone, PC and tablet, PC and phone, tablet and phone, or all three.

The goal is to make a nice graph showing how many people use several devices and which type they then choose.

The holy grail would be a second analysis to see whether the number of people using a secondary device differs between Win and Mac users (column OS), but that's not a must.

I tried the typical

library(dplyr)
SISessions1 %>% 
  group_by(StudentID)... 

as well as the packages janitor, vtree and CGPfunctions, but didn't manage to get what I wanted.

Here is a sample of my data:

structure(list(StudentID = c("CE52E2D2CC3E7A8624F7EF6E9EF4BAC736B5D7A77131F5773F2DAD20A98C49C9C710D6988017CEC3714", 
"2388CDEB82965E403D7EAE181E9A711D58CEA6F6775F60EDDB73791D5076B87F5C477BF01256E20DE8FF0", 
"10F5DE1E0A95667B811F9A6CF2D43415AE580DC7F41289CBCEE88B2FBEC2983CA8F29A2048A446CFFF8EE", 
"968DE1D2238FEE04B54DC705059AC49791D77D21245DBEB61F0602AB052B53928AE10F763FFD3F0A73CA3", 
"FB7BEECA8C097C34D25C65A00943681432FF1ECFF1FA840320DCC6CC77CFCF119898B259FAFF2F2593A3B", 
"3C3FC512008B7D33E04B51551426738F07AD1430507CA530657EEF27650A05DCE624A10AD9451570F3020", 
"3C3FC512008B7D33E04B51551426738F07AD1430507CA530657EEF27650A05DCE624A10AD9451570F3020", 
"14626EA6256FEFBB0EA89688C87D3289A73E80D724AD760D13BED298CAEA4744BED66D37365F13FE36DC5"
), Browser = c("Chrome", "InternetExplorer", "Safari Mac", "InternetExplorer", 
"InternetExplorer", "Chrome", "Chrome", "Chrome"), Platform = c("Win7", 
"Win7", "Mac10", "Win7", "Win7", "Android", "Linux", "Tablet PC"
), OS = c("Win", "Win", "Mac", "Win", "Win", "Android", "Linux", 
"Tablet PC"), Device = c("PC", "PC", "PC", "PC", "PC", "Mobile", 
"Linux", "Tablet"), noofsessions = c(1L, 8L, 9L, 15L, 5L, 3L, 
3L, 22L), multisession = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", 
"data.frame")) 

CodePudding user response:

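The pattern you're looking for is summarise() with paste0() and its collapse argument: collapse each user's devices into one string, which I'm calling 'category', then count() that column to get the number of users in each category. Wrapping Device in sort(unique()) makes repeated logins with the same device count only once, and puts "PC, Tablet" and "Tablet, PC" into the same category.

library(dplyr)

your_data %>%
    group_by(StudentID) %>%   # column is StudentID, as in the sample data
    summarise(category = paste0(sort(unique(Device)), collapse = ", ")) %>%
    count(category)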
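
Since the stated goal is a graph, here is a minimal sketch of how the per-category counts could be plotted. The toy data frame `sessions` and the names `combo_counts`, `users` and `device_plot` are made up for illustration, and ggplot2 is my choice of plotting package rather than something the question asked for.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per login, same column names as the question
sessions <- data.frame(
  StudentID = c("a", "a", "a", "b", "c", "c", "d", "d", "d", "e"),
  Device    = c("PC", "PC", "Tablet", "PC", "Mobile", "PC",
                "Tablet", "PC", "Mobile", "PC")
)

# One row per user with their de-duplicated, sorted device combination,
# then one row per combination with the number of users
combo_counts <- sessions %>%
  group_by(StudentID) %>%
  summarise(category = paste(sort(unique(Device)), collapse = ", ")) %>%
  count(category, name = "users")

# Bar chart of users per device combination, most common combination first
device_plot <- ggplot(combo_counts,
                      aes(x = reorder(category, -users), y = users)) +
  geom_col() +
  labs(x = "Device combination", y = "Number of users")

print(device_plot)
```

With real data you would replace `sessions` with your own table; the rest should carry over as long as the columns are called StudentID and Device.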

It should be somewhat straightforward to extend this analysis to OS by adding OS to the group_by() and then counting with OS and category etc.
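
One caveat: a user who shows up under several OS values (for example Win on the PC and Android on the phone) would be split into separate groups if OS goes straight into group_by(). A possible workaround, sketched below on made-up data, is to derive one OS label per user first and then count. The rule that a user counts as "Win" or "Mac" if that OS appears in any of their rows is my assumption, and `main_os`, `multi_device`, `per_user` and `os_summary` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the question
sessions <- data.frame(
  StudentID = c("a", "a", "b", "c", "c", "d"),
  OS        = c("Win", "Android", "Win", "Mac", "Tablet PC", "Mac"),
  Device    = c("PC", "Mobile", "PC", "PC", "Tablet", "PC")
)

per_user <- sessions %>%
  group_by(StudentID) %>%
  summarise(
    # Assumption: label a user Win or Mac if that OS appears in any of their rows
    main_os = case_when(
      any(OS == "Win") ~ "Win",
      any(OS == "Mac") ~ "Mac",
      TRUE             ~ "Other"
    ),
    category     = paste(sort(unique(Device)), collapse = ", "),
    multi_device = n_distinct(Device) > 1,
    .groups = "drop"
  )

# Number of users per OS group, split by whether they use more than one device
os_summary <- per_user %>% count(main_os, multi_device)
```

Swapping `multi_device` for `category` in the final count() would give the full device combinations per OS group instead.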
