Home > Software engineering >  Calculating the most frequent pairs in a dataset
Calculating the most frequent pairs in a dataset

Time:10-01

Is it possible to calculate most frequent pairs from a combinations of pairs in a dataset with five columns? I can do this with a macro in excel, I'd be curious to see if there's a simple solution for this in google sheets.

I have a sample data and results page here :

Data:

B1  B2  B3  B4  B5
6   22  28  32  36
7   10  17  31  35
8   33  38  40  42
10  17  36  40  41
8   10  17  36  54
9   30  32  51  55
1   4   16  26  35
12  28  30  40  43
42  45  47  49  52
10  17  30  31  47
10  17  33  51  58
4   10  17  30  32
2   35  36  37  43
6   10  17  38  55
3   10  17  25  32

Results would be like:

Value1  Value2  Frequency
10  17  8
10  31  2
17  31  2
10  36  2
17  36  2
30  32  2
10  30  2
17  30  2
10  32  2
17  32  2

etc

Each row represents a data set. The pairs don't have to be adjoining. There can be numbers between them.

CodePudding user response:

Create a combination of pairs for each row using the method mentioned here. Then REDUCE all the pairs to create a virtual 2D array. Then use QUERY to group and find the count:

=QUERY(
  REDUCE(
    {"",""},
    A2:A16,
    LAMBDA(acc,cur, 
      {
        acc;
        QUERY(
          LAMBDA(mrg,
            REDUCE(
              {"",""},
              SEQUENCE(COLUMNS(mrg)-1,1,0),
              LAMBDA(a_,c_,
                {
                  a_;
                  LAMBDA(rg,
                    REDUCE(
                      {"",""},
                      OFFSET(rg,0,1,1,COLUMNS(rg)-1),
                      LAMBDA(a,c,{a;{INDEX(rg,1),c}})
                    )
                  )(OFFSET(mrg,0,c_,1,COLUMNS(mrg)-c_))
                }
              )
            )
          )(OFFSET(cur,0,0,1,5)),
        "where Col1 is not null",0
        )
      }
    )
  ),
  "Select Col1,Col2, count(Col1) group by Col1,Col2  order by count(Col1) desc "
)

Input:

B1(A1) B2 B3 B4 B5
6 22 28 32 36
7 10 17 31 35
8 33 38 40 42
10 17 36 40 41
8 10 17 36 54
9 30 32 51 55
1 4 16 26 35
12 28 30 40 43
42 45 47 49 52
10 17 30 31 47
10 17 33 51 58
4 10 17 30 32
2 35 36 37 43
6 10 17 38 55
3 10 17 25 32

Output(partial):

count
10 17 8
10 30 2
10 31 2
10 32 2
10 36 2
17 30 2
17 31 2
17 32 2
17 36 2
30 32 2
1 4 1
1 16 1
1 26 1
1 35 1
2 35 1
2 36 1
2 37 1
2 43 1
3 10 1
3 17 1
3 25 1
  • Related