Lottery analysis for learning

I'm trying to learn how to use the pandas library.

As a data source, I use the lottery combinations drawn so far.

One of the tasks I'm trying to solve is counting how often pairs of numbers appear together in those combinations.

I create a data frame from the list like this:

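import pandas as pd

# Each inner list is one draw of seven numbers.
# Named "draws" rather than "list" to avoid shadowing the built-in.
draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    # ...
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

df = pd.DataFrame(draws)
print(df.head())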

Output:

    0   1   2   3   4   5   6
0   9  11  12  18  20  26  35
1  10  13  15  20  21  25  35
2   1   8  17  21  22  27  34
3  10  13  17  18  21  29  37
4   5   8  12  17  19  21  37

For example, as a result I want to get the number of times each pair or triple of numbers appears together across all combinations:

Pair  : Found n times in all combinations
9,23  : 33
11,32 : 26

Can you give me some pointers or an example of how to solve this task, please?

CodePudding user response:

Here is a simple solution using just modules from the standard library:

from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

# One counter for pairs and one for triples.
duos = Counter()
trios = Counter()

for draw in draws:
    # combinations() yields every 2- or 3-number subset of the draw.
    duos.update(combinations(draw, 2))
    trios.update(combinations(draw, 3))

print('Top 5 duos')
for combo, count in duos.most_common(5):
    print(f'{combo}: {count}')

print()

print('Top 5 trios')
for combo, count in trios.most_common(5):
    print(f'{combo}: {count}')

The code snippet above will result in the following output:

Top 5 duos
(31, 39): 2
(7, 33): 2
(13, 14): 1
(13, 28): 1
(13, 30): 1

Top 5 trios
(13, 14, 28): 1
(13, 14, 30): 1
(13, 14, 31): 1
(13, 14, 37): 1
(13, 14, 39): 1

And here is a slightly more elegant version:

from itertools import combinations
from collections import Counter

draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]

# One counter per combination size: 2, 3 and 4 numbers.
counters = [Counter() for _ in range(3)]

for n, counter in enumerate(counters, 2):
    for draw in draws:
        counter.update(combinations(draw, n))

    print(f'Top 10 combos of {n} numbers')

    for combo, count in counter.most_common(10):
        print(' '.join(f'{num:2d}' for num in combo), count, sep=': ')

    print()

Which will give us the following output:

Top 10 combos of 2 numbers
31 39: 2
 7 33: 2
13 14: 1
13 28: 1
13 30: 1
13 31: 1
13 37: 1
13 39: 1
14 28: 1
14 30: 1

Top 10 combos of 3 numbers
13 14 28: 1
13 14 30: 1
13 14 31: 1
13 14 37: 1
13 14 39: 1
13 28 30: 1
13 28 31: 1
13 28 37: 1
13 28 39: 1
13 30 31: 1

Top 10 combos of 4 numbers
13 14 28 30: 1
13 14 28 31: 1
13 14 28 37: 1
13 14 28 39: 1
13 14 30 31: 1
13 14 30 37: 1
13 14 30 39: 1
13 14 31 37: 1
13 14 31 39: 1
13 14 37 39: 1
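
One caveat with the snippets above: combinations() keeps the numbers in the order they have within each draw, so (31, 39) and (39, 31) would be counted separately if the draws were not stored in ascending order. Below is a minimal sketch of one way to guard against that, and to look up a single pair such as the 9,23 from the question. The sample data and the pair_count helper are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data; the second draw is deliberately unsorted.
draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [39, 3, 6, 18, 21, 31, 34],
]

pair_counts = Counter()
for draw in draws:
    # Sorting first makes (31, 39) and (39, 31) end up under the same key.
    pair_counts.update(combinations(sorted(draw), 2))


def pair_count(a, b):
    """Return how many draws contain both a and b (hypothetical helper)."""
    return pair_counts[tuple(sorted((a, b)))]


print(pair_count(39, 31))  # 2
print(pair_count(9, 23))   # 0, a Counter returns 0 for unseen keys
```

Sorting each draw costs very little for seven numbers and means the lookup does not depend on how the source data happens to be ordered.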

CodePudding user response:

IIUC, you can find all combinations (e.g. of two values) for each row and then simply count:

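from itertools import combinations

# Build every pair per row, put each pair on its own row, then count them.
# value_counts() already sorts by count in descending order.
(df.apply(lambda x: tuple(combinations(x, r=2)), axis=1)
   .explode()
   .value_counts())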

Results in a pandas Series like:

(31, 39)    2
(7, 33)     2
(13, 28)    1
(37, 39)    1
(13, 30)    1
           ..

Change the r=2 parameter to count combinations of three or more values.
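
If the numbers within a row may not be sorted, or you want to reuse this for several values of r, the same idea could be wrapped in a small function. This is only a sketch: combo_counts is a hypothetical helper name, and it builds the combinations with a plain comprehension instead of apply/explode, sorting each row first so that a pair is always counted under the same key.

```python
from itertools import combinations

import pandas as pd


def combo_counts(df, r=2):
    """Count how often each r-number combination occurs across the rows of df."""
    combos = [
        combo
        for row in df.itertuples(index=False)
        # Sort so that the same numbers always produce the same tuple.
        for combo in combinations(sorted(row), r)
    ]
    return pd.Series(combos, dtype=object).value_counts()


# Sample draws taken from the question.
df = pd.DataFrame([
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39],
])

pairs = combo_counts(df, r=2)
triples = combo_counts(df, r=3)

print(pairs.head())
print(triples.head())
```

With this sample, only (31, 39) and (7, 33) occur twice; every triple occurs once.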