How to create another column in a data frame based on repeated observations in another column?-CodePudding

So basically I have a data frame that looks like this:

BX	BY
1	12
1	12
1	12
2	14
2	14
3	5

I want to create another colum ID, which will have the same number for the same values in BX and BY. So the table would look like this then:

BX	BY	ID
1	12	1
1	12	1
1	12	1
2	14	2
2	14	2
3	5	3

CodePudding user response：

Here is a base R way.
Subset the data.frame by the grouping columns, find the duplicated rows and use a standard cumsum trick.

df1<-'BX    BY
1   12
1   12
1   12
2   14
2   14
3   5'
df1 <- read.table(textConnection(df1), header = TRUE)

cumsum(!duplicated(df1[c("BX", "BY")]))
#> [1] 1 1 1 2 2 3

df1$ID <- cumsum(!duplicated(df1[c("BX", "BY")]))
df1
#>   BX BY ID
#> 1  1 12  1
#> 2  1 12  1
#> 3  1 12  1
#> 4  2 14  2
#> 5  2 14  2
#> 6  3  5  3

^{Created on 2022-10-12 with reprex v2.0.2}

CodePudding user response：

You can do:

transform(dat, ID = as.numeric(interaction(dat, drop = TRUE, lex.order = TRUE)))

  BX BY ID
1  1 12  1
2  1 12  1
3  1 12  1
4  2 14  2
5  2 14  2
6  3  5  3

Or if you prefer dplyr:

library(dplyr)

dat %>% 
  group_by(across()) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

# A tibble: 6 × 3
     BX    BY    ID
  <dbl> <dbl> <int>
1     1    12     1
2     1    12     1
3     1    12     1
4     2    14     2
5     2    14     2
6     3     5     3