Home > Enterprise >  Assigning unique numbers to combinations of columns with repeating values
Assigning unique numbers to combinations of columns with repeating values

Time:01-23

I have a sample csv/excel columns that look like below.

Condition Subreplicate
C 1
Ctrl 3
C 2
T C 1
T C 1
Ctrl 2
C 2
T C 2

I want to make a new column called "Replicate" that counts the combination of these 2 columns in order. I want this "Replicate" column to assign unique number for these column repeats, because I have to analyze them individually. Each row is unique data points. The counting should be based on the combination of the first two columns: Conditions and Subreplicates.

For example, if I have 6 total "C and 1" combination, the Replicate column should be ascending from 1-6. Then find "C and 2" combination and count onward from 7-upward. Same goes with T C. I want to count "T C and 1" first, the "T C and 2" next. Even if "Ctrl and 3" shows up first in the excel, I want to count up starting from "Ctrl and 1" first.

I've created an example column below. "Ctrl and 3" starts from 21 because way down in the dataframe, there are lots of "Ctrl and 1" and "Ctrl and 2".

Condition Subreplicate Replicate
C 1 1
Ctrl 3 21
C 2 2
T C 1 1
T C 1 2
Ctrl 2 20
C 2 3
T C 2 3

I hope I'm making sense. I have been very stuck, any help would be great. Thank you for your time!

Edit: Provided data via output formula 1

If you want to count only the first column but you want consider as sorting criteria first column A, and then column B, then you can adapt the previous formula as follow (Formula 2):

=LET(A, A1:A20, B, B1:B20, seq,SEQUENCE(ROWS(A)), 
  srt, SORT(HSTACK(A,B, seq), {1,2}), sA, INDEX(srt,,1), cnts, SCAN("", seq, 
    LAMBDA(ac,s, IF(ac="", 1, IF(INDEX(sA, s) = INDEX(sA, s-1), ac 1, 1)))), 
  SORTBY(cnts, INDEX(srt,,3)))

Notice we use SORT indicating as sorting criteria {1,2}. We sort first by column A, then by column B.

Here is the output (notice the highlighted rows where we change the order to show it works) output formula 2

If you want to do the count based on the first two columns, then just use as input the concatenation of the first two columns. For example (Formula 3):

=LET(AB, A1:A20&"-"&B1:B20,seq,SEQUENCE(ROWS(AB)), srt, SORT(HSTACK(AB, seq)), 
 sAB, INDEX(srt,,1), cnts, SCAN("", seq, LAMBDA(ac,s, IF(ac="", 1, 
  IF(INDEX(sAB, s) = INDEX(sAB, s-1), ac 1, 1)))), SORTBY(cnts, INDEX(srt,,2)))

Here is the output (counting based on concatenation of the first two columns): output formula 3

  • Related