Home > Software engineering >  How to create another column in a data frame based on repeated observations in another column?
How to create another column in a data frame based on repeated observations in another column?

Time:10-13

So basically I have a data frame that looks like this:

BX BY
1 12
1 12
1 12
2 14
2 14
3 5

I want to create another colum ID, which will have the same number for the same values in BX and BY. So the table would look like this then:

BX BY ID
1 12 1
1 12 1
1 12 1
2 14 2
2 14 2
3 5 3

CodePudding user response:

Here is a base R way.
Subset the data.frame by the grouping columns, find the duplicated rows and use a standard cumsum trick.

df1<-'BX    BY
1   12
1   12
1   12
2   14
2   14
3   5'
df1 <- read.table(textConnection(df1), header = TRUE)

cumsum(!duplicated(df1[c("BX", "BY")]))
#> [1] 1 1 1 2 2 3

df1$ID <- cumsum(!duplicated(df1[c("BX", "BY")]))
df1
#>   BX BY ID
#> 1  1 12  1
#> 2  1 12  1
#> 3  1 12  1
#> 4  2 14  2
#> 5  2 14  2
#> 6  3  5  3

Created on 2022-10-12 with reprex v2.0.2

CodePudding user response:

You can do:

transform(dat, ID = as.numeric(interaction(dat, drop = TRUE, lex.order = TRUE)))

  BX BY ID
1  1 12  1
2  1 12  1
3  1 12  1
4  2 14  2
5  2 14  2
6  3  5  3

Or if you prefer dplyr:

library(dplyr)

dat %>% 
  group_by(across()) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

# A tibble: 6 × 3
     BX    BY    ID
  <dbl> <dbl> <int>
1     1    12     1
2     1    12     1
3     1    12     1
4     2    14     2
5     2    14     2
6     3     5     3
  • Related