Can I summarise variable combinations further after grouping by user ID in R?

Time:04-18

As the title suggests, I am struggling to find a good way to analyse how many users (identified by user ID) use just one device (Device) or several, and if several, in which combination.

I have already created columns for each user's ID, the device info, how often they logged in in total (noofsessions), and whether they accessed the platform several times, i.e. whether their user ID occurs more than once (multisession).

Every time a user logs in, the device info is saved, so some people have 20 records all from a PC, while others log in with different devices over time.
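The post does not show how noofsessions and multisession were built. A minimal sketch of one way to derive them, assuming the raw data has one row per login and that noofsessions is simply the number of rows per StudentID (the logins table and its values are hypothetical):

```r
library(dplyr)

# Hypothetical raw log: one row per login (IDs and devices invented for illustration)
logins <- tibble(
  StudentID = c("A", "A", "A", "B", "C", "C"),
  Device    = c("PC", "PC", "Tablet", "Mobile", "Linux", "Linux")
)

sessions <- logins %>%
  add_count(StudentID, name = "noofsessions") %>%  # logins per StudentID, repeated on each row
  mutate(multisession = noofsessions > 1)           # TRUE when the ID occurs more than once

sessions
```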

I just want each device type to appear once per user in my summary (a bit like running distinct on the devices: once a user has used a PC, their later PC logins should not be listed again). So "Linux" instead of "Linux, Linux, Linux, ...", or "PC, Tablet" instead of "PC, PC, Tablet, PC, PC, PC, ...".
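For the first part of the question (one device or several), counting the distinct device types per user may already be enough. A minimal sketch on a hypothetical login table; n_distinct() and if_else() are standard dplyr functions, while the labels "single device" and "several devices" are invented here:

```r
library(dplyr)

# Hypothetical login log: one row per login
logins <- tibble(
  StudentID = c("A", "A", "A", "B", "B", "C"),
  Device    = c("PC", "PC", "Tablet", "Mobile", "Mobile", "Linux")
)

device_use <- logins %>%
  group_by(StudentID) %>%
  summarise(n_devices = n_distinct(Device)) %>%  # distinct device types per user
  mutate(usage = if_else(n_devices == 1, "single device", "several devices")) %>%
  count(usage)                                   # users per usage pattern

device_use
```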

library(dplyr)
SISessions1 %>% 
  group_by(StudentID) %>% 
  summarise(category = paste0(Device, collapse = ", "))%>%
  count(category)

gives me a nice summary, but it includes the device type for every single login:

# A tibble: 220 × 2
   category                                                                                           n
   <chr>                                                                                          <int>
 1 Linux, Linux, Linux, Linux, Linux                                                                  1
 2 Linux, Linux, Linux, Linux, PC, Linux, PC, PC, Linux, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC,…     1
 3 Mobile                                                                                            10
 4 Mobile, Mobile                                                                                     3
 5 Mobile, Mobile, Mobile, Mobile, Mobile, Mobile, Mobile, Mobile, Linux, Mobile, Mobile, Mobile      1
 6 Mobile, Mobile, Mobile, Mobile, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, P…     1
 7 Mobile, Mobile, PC, PC                                                                             1
 8 Mobile, Mobile, PC, PC, Mobile, PC, PC                                                             1
 9 Mobile, Mobile, PC, PC, Mobile, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, PC, P…     1
10 Mobile, Mobile, PC, PC, PC                                                                      2

Here is a sample of my original data:

structure(list(StudentID = c("CE52E2D2CC3E7A8624F7EF6E9EF4BAC736B5D7A77131F5773F2DAD20A98C49C9C710D6988017CEC3714", 
"2388CDEB82965E403D7EAE181E9A711D58CEA6F6775F60EDDB73791D5076B87F5C477BF01256E20DE8FF0", 
"10F5DE1E0A95667B811F9A6CF2D43415AE580DC7F41289CBCEE88B2FBEC2983CA8F29A2048A446CFFF8EE", 
"968DE1D2238FEE04B54DC705059AC49791D77D21245DBEB61F0602AB052B53928AE10F763FFD3F0A73CA3", 
"FB7BEECA8C097C34D25C65A00943681432FF1ECFF1FA840320DCC6CC77CFCF119898B259FAFF2F2593A3B", 
"3C3FC512008B7D33E04B51551426738F07AD1430507CA530657EEF27650A05DCE624A10AD9451570F3020", 
"3C3FC512008B7D33E04B51551426738F07AD1430507CA530657EEF27650A05DCE624A10AD9451570F3020", 
"14626EA6256FEFBB0EA89688C87D3289A73E80D724AD760D13BED298CAEA4744BED66D37365F13FE36DC5"
), Browser = c("Chrome", "InternetExplorer", "Safari Mac", "InternetExplorer", 
"InternetExplorer", "Chrome", "Chrome", "Chrome"), Platform = c("Win7", 
"Win7", "Mac10", "Win7", "Win7", "Android", "Linux", "Tablet PC"
), OS = c("Win", "Win", "Mac", "Win", "Win", "Android", "Linux", 
"Tablet PC"), Device = c("PC", "PC", "PC", "PC", "PC", "Mobile", 
"Linux", "Tablet"), noofsessions = c(1L, 8L, 9L, 15L, 5L, 3L, 
3L, 22L), multisession = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", 
"data.frame")) 

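In this sample only one StudentID occurs more than once, and it uses two different devices, so the repetition never shows up. A hypothetical reprex with repeated logins (IDs shortened to letters) that does reproduce it with the pipeline from the question:

```r
library(dplyr)

# Hypothetical reprex: several logins per StudentID, some on the same device
SISessions1 <- tibble(
  StudentID = c("A", "A", "A", "B", "B", "C"),
  Device    = c("PC", "PC", "Tablet", "Mobile", "Mobile", "Mobile")
)

res <- SISessions1 %>%
  group_by(StudentID) %>%
  summarise(category = paste0(Device, collapse = ", ")) %>%  # every login is pasted
  count(category)

res
```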
Answer:

It needs unique() before we paste:

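library(dplyr)
SISessions1 %>% 
  group_by(StudentID) %>% 
  # keep each device once per student, sorted so "PC, Tablet" and "Tablet, PC" match
  summarise(category = paste0(sort(unique(Device)), collapse = ", ")) %>%
  count(category)  # number of students per device combination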
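The sort() matters: without it, two students with the same devices but a different first login produce different strings. A sketch on hypothetical toy data; device_combos() is a throwaway helper written for this comparison, not part of dplyr:

```r
library(dplyr)

# Hypothetical toy data: A and B use the same two devices but start on different ones
SISessions1 <- tibble(
  StudentID = c("A", "A", "A", "B", "B", "C"),
  Device    = c("Tablet", "PC", "PC", "PC", "Tablet", "Mobile")
)

# Hypothetical helper: one string of unique devices per student, optionally sorted
device_combos <- function(data, sort_devices) {
  data %>%
    group_by(StudentID) %>%
    summarise(category = paste0(
      if (sort_devices) sort(unique(Device)) else unique(Device),
      collapse = ", "
    )) %>%
    count(category)  # students per device combination
}

by_first_seen <- device_combos(SISessions1, sort_devices = FALSE)  # "PC, Tablet" and "Tablet, PC" kept apart
by_sorted     <- device_combos(SISessions1, sort_devices = TRUE)   # both become "PC, Tablet"
```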

Answer:

I see the problem you're getting with your original data. It doesn't show up in the sample data, so for future reference, try to create a reprex in which the problem you're facing actually occurs!

You can either add distinct() before grouping:

library(dplyr)
SISessions1 %>%
  distinct(StudentID, Device, .keep_all = TRUE) %>%  # one row per student/device pair
  arrange(Device) %>%                                 # same device order for every student
  group_by(StudentID) %>% 
  summarise(category = paste0(Device, collapse = ", ")) %>%
  count(category)

Or use unique() inside paste0(), as in Akrun's answer above.

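I added arrange(Device) to avoid the ordering issue (otherwise the same devices could appear as "PC, Tablet" for one student and "Tablet, PC" for another); it is a beginner-friendly equivalent of the sort() in Akrun's nested call :)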
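A quick check, on hypothetical toy data, that the distinct() + arrange() route and the unique() + sort() route count the same combinations. The toy table has only the two relevant columns, so .keep_all is left out here:

```r
library(dplyr)

# Hypothetical toy data: repeated devices, logged in different orders
SISessions1 <- tibble(
  StudentID = c("A", "A", "A", "B", "B", "C", "C"),
  Device    = c("Tablet", "PC", "PC", "PC", "Tablet", "Mobile", "Mobile")
)

# Route 1: drop repeated StudentID/Device pairs, order by device, then paste
via_distinct <- SISessions1 %>%
  distinct(StudentID, Device) %>%
  arrange(Device) %>%
  group_by(StudentID) %>%
  summarise(category = paste0(Device, collapse = ", ")) %>%
  count(category)

# Route 2: unique() and sort() inside paste0()
via_unique <- SISessions1 %>%
  group_by(StudentID) %>%
  summarise(category = paste0(sort(unique(Device)), collapse = ", ")) %>%
  count(category)

same <- identical(via_distinct$category, via_unique$category) &&
  identical(via_distinct$n, via_unique$n)
same
```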
