Store unique combinations of variables as a vector in a data frame in R

Time:03-10

I have the following problem.

I have a large data.frame that contains 648 distinct combinations of 7 variables. Each combination occurs four times, so the data.frame has 2592 rows. I want to add a column to the data.frame that indicates which combination each row belongs to, so that in the end the column contains each of the numbers 1 to 648 four times.

In the end it should look something like this; here is an example with two variables and 3 different combinations.

      a b     distinct_combinations
  <dbl> <chr>                 <dbl>
1     1 a                         1
2     2 b                         2
3     3 c                         3
4     1 a                         1
5     2 b                         2
6     3 c                         3

Thank you!

CodePudding user response:

The special symbol .GRP from package data.table is essentially what you are asking for:

".GRP is an integer, length 1, containing a simple group counter. 1 for the 1st group, 2 for the 2nd, etc." (data.table documentation)

library(data.table)
setDT(data)  # change data to a data.table
data[, distinct_combinations := .GRP, by = .(a, b)] 
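
As a quick way to try this out, here is a minimal self-contained sketch on the six-row example from the question. The object name dt and the two sanity checks at the end are my own additions for illustration, not part of the original answer. With by (as opposed to keyby), groups are numbered in order of first appearance.

```r
library(data.table)

# Example data from the question: 3 distinct (a, b) combinations, each occurring twice
dt <- data.table(
  a = c(1L, 2L, 3L, 1L, 2L, 3L),
  b = c("a", "b", "c", "a", "b", "c")
)

# .GRP is the running group counter within `by`
dt[, distinct_combinations := .GRP, by = .(a, b)]

# Sanity checks (illustrative): one id per distinct combination,
# and every id occurs the same number of times (twice here, four times in the real data)
stopifnot(uniqueN(dt$distinct_combinations) == uniqueN(dt, by = c("a", "b")))
stopifnot(all(table(dt$distinct_combinations) == 2))
```

On the real data you would list all 7 variables in by and expect every id to occur four times.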

CodePudding user response:

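v1 = rep(1:3, 2)
v2 = rep(c("a", "b", "c"), 2)
df = data.frame(v1, v2)
# Paste the variables into one key per row; the separator keeps e.g. (1, "1a") and (11, "a") distinct
df$id = factor(paste(v1, v2, sep = "_"))
# Relabel the levels with consecutive numbers (id is still a factor)
levels(df$id) = seq_along(levels(df$id))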

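You can do this by pasting the variables into a single key column, converting it to a factor, and relabeling its levels with consecutive numbers. Note that the result is still a factor; use as.integer(df$id) if you need a numeric column.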
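
One thing to be aware of with the factor approach is that levels are sorted, so ids follow the sorted order of the pasted keys rather than the order in which combinations appear. If you would rather number combinations by first appearance, a base R sketch could look like the following. The "|" separator and the column name distinct_combinations are assumptions for illustration; the separator must not occur in the data.

```r
# Example data deliberately starting with the "largest" combination
df <- data.frame(
  a = c(3L, 1L, 2L, 3L, 1L, 2L),
  b = c("c", "a", "b", "c", "a", "b")
)

# Build one key per row from the columns that define a combination
# (assumes "|" never occurs in the values)
key <- do.call(paste, c(df[c("a", "b")], sep = "|"))

# match() returns the position of each key among the unique keys,
# which numbers the combinations in order of first appearance
df$distinct_combinations <- match(key, unique(key))

# For comparison: factor codes follow the sorted order of the keys
sorted_ids <- as.integer(factor(key))
```

Here df$distinct_combinations is 1, 2, 3, 1, 2, 3, while sorted_ids is 3, 1, 2, 3, 1, 2.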

CodePudding user response:

You can group_by your desired columns and use cur_group_id() (calling group_indices() with no arguments inside mutate is deprecated since dplyr 1.0.0):

library(tidyverse)
data %>% 
  group_by(across(a:b)) %>% 
  mutate(distinct_combinations = cur_group_id())

# A tibble: 6 x 3
# Groups:   a, b [3]
      a b     distinct_combinations
  <int> <chr>                 <int>
1     1 a                         1
2     2 b                         2
3     3 c                         3
4     1 a                         1
5     2 b                         2
6     3 c                         3

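You can also sort the rows by your columns and use data.table::rleidv, which starts a new id every time the row values change (note that this reorders the rows):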

data %>% 
  arrange(across(a:b)) %>% 
  mutate(distinct_combinations = data.table::rleidv(.))
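
Because arrange changes the row order, you may want to restore the original order afterwards. A minimal sketch, assuming a temporary helper column (here called .row, a name I made up) is acceptable:

```r
library(dplyr)

data <- data.frame(
  a = c(1L, 2L, 3L, 1L, 2L, 3L),
  b = c("a", "b", "c", "a", "b", "c")
)

result <- data %>%
  mutate(.row = row_number()) %>%   # remember the original position
  arrange(a, b) %>%                 # bring identical combinations together
  # restrict rleidv to the grouping columns so the helper column is ignored
  mutate(distinct_combinations = data.table::rleidv(., cols = c("a", "b"))) %>%
  arrange(.row) %>%                 # restore the original order
  select(-.row)
```

This relies on the magrittr pipe, where the dot inside mutate refers to the already sorted data frame; it would not work unchanged with the native |> pipe.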

data

structure(list(a = c(1L, 2L, 3L, 1L, 2L, 3L), b = c("a", "b", 
"c", "a", "b", "c")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))