Home > Mobile >  How to list categories from variable related to another variable in R?
How to list categories from variable related to another variable in R?

Time:11-29

I have a dataset with over 600k obs in R. I created a "key" variable and I want to know how many different keys appear related to an ID variable. My dataset looks something like this:

id <- c("58497484", "58544005", "74766653", "74764718", "62824455", "58497484", "58497484")
key <- c("5718368_09/06/1981_3_2014", "2077485_02/06/1977_8_2014", "2091585_23/10/1982_1_2014", "2077388_30/01/2000_11_2017", "2082225_02/07/1998_10_2017", "2077450_04/05/2001_1_2016", "2077477_03/03/1978_8_2017")

data <- as.data.frame(cbind(id, key))

So, as you can see, the id "58497484" repeats itself 3 times, but shows 3 different keys. What I want is to list each id and it's respective key in a table, so it would look something like this:

 id                  key
58497484     5718368_09/06/1981_3_2014
             2077450_04/05/2001_1_2016
             2077477_03/03/1978_8_2017

58544005     2077485_02/06/1977_8_2014
74766653     2091585_23/10/1982_1_2014
74764718     2077388_30/01/2000_11_2017
62824455     2082225_02/07/1998_10_2017

Any ideas how can I make this work? I've already tried dplyr and group_by using the two variables, but I don't get the desired output.

CodePudding user response:

We could first arrange and then replace the duplicates by blank cells:

library(dplyr)
data %>% 
  arrange(id) %>% 
  mutate(id = ifelse(duplicated(id), "", id))
        id                        key
1 58497484  5718368_09/06/1981_3_2014
2           2077450_04/05/2001_1_2016
3           2077477_03/03/1978_8_2017
4 58544005  2077485_02/06/1977_8_2014
5 62824455 2082225_02/07/1998_10_2017
6 74764718 2077388_30/01/2000_11_2017
7 74766653  2091585_23/10/1982_1_2014

CodePudding user response:

Here is a base R solution too.

data <- data[order(id, as.integer(sub('.*(\\d{4}).*', '\\1', key))),]

data$id[duplicated(data$id) == TRUE] <- ""

# Reset row names/numbers.
rownames(data) <- NULL

Output

        id                        key
1 58497484  5718368_09/06/1981_3_2014
2           2077450_04/05/2001_1_2016
3           2077477_03/03/1978_8_2017
4 58544005  2077485_02/06/1977_8_2014
5 62824455 2082225_02/07/1998_10_2017
6 74764718 2077388_30/01/2000_11_2017
7 74766653  2091585_23/10/1982_1_2014
  • Related