I have the following problem.
I have a large data.frame containing 648 distinct combinations of 7 variables. Each combination occurs 4 times, giving 2592 rows. I want to add a column to the data.frame that indicates which combination each row holds, so in the end the column should contain each of the numbers 1 to 648 four times.
In the end it should look something like this; here is an example with two variables and 3 distinct combinations:
      a b     distinct_combinations
  <dbl> <chr>                 <dbl>
1     1 a                         1
2     2 b                         2
3     3 c                         3
4     1 a                         1
5     2 b                         2
6     3 c                         3
Thank you!
CodePudding user response:
The special symbol .GRP from package data.table is essentially what you are asking for:
".GRP is an integer, length 1, containing a simple group counter. 1 for the 1st group, 2 for the 2nd, etc." (data.table documentation)
library(data.table)
setDT(data) # change data to a data.table
data[, distinct_combinations := .GRP, by = .(a, b)]
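To make the call above easy to try, here is a minimal self-contained sketch on the six-row sample data posted at the end of this thread. The names dt and counts are only illustrative, and the final count is just a sanity check that every combination occurs equally often (4 times in the real data, 2 times here).

```r
library(data.table)

# Sample data copied from the end of this thread (dt is an illustrative name)
dt <- data.table(a = c(1L, 2L, 3L, 1L, 2L, 3L),
                 b = c("a", "b", "c", "a", "b", "c"))

# Number each distinct (a, b) pair; with by =, groups are numbered
# in order of first appearance
dt[, distinct_combinations := .GRP, by = .(a, b)]

# Sanity check: count how often each combination id occurs
counts <- dt[, .N, by = distinct_combinations]
print(dt)
print(counts)
```

For the real data, list all 7 grouping columns inside by = .( ... ).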
CodePudding user response:
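v1 <- rep(1:3, 2)
v2 <- rep(c("a", "b", "c"), 2)
df <- data.frame(v1, v2)
# paste the key columns with a separator so that pairs such as (1, "1a") and (11, "a") stay distinct
df$id <- as.factor(paste(v1, v2, sep = "_"))
# relabel the factor levels with consecutive numbers
levels(df$id) <- seq_along(levels(df$id))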
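You can do this by creating a key column as a factor and relabeling its levels with consecutive numbers. Note that the result is still a factor; wrap it in as.integer() if you need a numeric column.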
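If you would rather get an integer column directly, a base R variant of the same idea is to build one key string per row and look it up among the unique keys with match(). This is only a sketch: the names key_cols and key are illustrative, and it assumes the separator "\r" never occurs inside the data.

```r
df <- data.frame(v1 = rep(1:3, 2),
                 v2 = rep(c("a", "b", "c"), 2))

# Columns that define a combination (illustrative name)
key_cols <- c("v1", "v2")

# One key string per row; assumes "\r" does not appear in any value
key <- do.call(paste, c(df[key_cols], sep = "\r"))

# Position of each key among the unique keys, numbered by first appearance
df$id <- match(key, unique(key))

print(df)
```

Here match() returns an integer vector, so no factor conversion is needed afterwards.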
CodePudding user response:
You can group_by your desired columns and use cur_group_id(), which replaces the no-argument form of group_indices() that was deprecated in dplyr 1.0.0:
library(tidyverse)
data %>%
  group_by(across(a:b)) %>%
  mutate(distinct_combinations = cur_group_id())
# A tibble: 6 x 3
# Groups: a, b [3]
      a b     distinct_combinations
  <int> <chr>                 <int>
1     1 a                         1
2     2 b                         2
3     3 c                         3
4     1 a                         1
5     2 b                         2
6     3 c                         3
You can also arrange your columns and use data.table::rleidv:
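data %>%
  arrange(across(a:b)) %>%  # sorting makes equal combinations adjacent (this reorders the rows)
  # restrict rleidv to the key columns so that other columns do not affect the ids
  mutate(distinct_combinations = data.table::rleidv(., cols = c("a", "b")))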
Data:
data <- structure(list(a = c(1L, 2L, 3L, 1L, 2L, 3L), b = c("a", "b",
"c", "a", "b", "c")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))