I'm trying to learn how to use the pandas library.
As the data source, I use the lottery combinations drawn so far.
One of the tasks I'm trying to solve is counting how often pairs of numbers appear together in those combinations.
I create a data frame from a list like this:
import pandas as pd

# Named `draws` rather than `list`, which would shadow the built-in type.
draws = [
    [13, 14, 28, 30, 31, 37, 39],
    [7, 10, 12, 16, 21, 22, 33],
    ...,
    [1, 2, 7, 15, 25, 31, 33],
    [3, 6, 18, 21, 31, 34, 39]
]
df = pd.DataFrame(draws)
print(df.head())
Output:
    0   1   2   3   4   5   6
0   9  11  12  18  20  26  35
1  10  13  15  20  21  25  35
2   1   8  17  21  22  27  34
3  10  13  17  18  21  29  37
4   5   8  12  17  19  21  37
For example, as a result I want to get the number of times each pair or triple of numbers appears together across all combinations:
Pair : Number of times found in all combinations
9,23 : 33
11,32 : 26
Can you give me some directions or an example of how to solve this task, please?
CodePudding user response:
Here is a simple solution using just modules from the standard library:
from itertools import combinations
from collections import Counter
draws = [
[13, 14, 28, 30, 31, 37, 39],
[7, 10, 12, 16, 21, 22, 33],
[1, 2, 7, 15, 25, 31, 33],
[3, 6, 18, 21, 31, 34, 39]
]
duos = Counter()
trios = Counter()
# Count every 2-number and 3-number combination in each draw.
for draw in draws:
    duos.update(combinations(draw, 2))
    trios.update(combinations(draw, 3))
print('Top 5 duos')
for x in duos.most_common(5):
    print(f'{x[0]}: {x[1]}')
print()
print('Top 5 trios')
for x in trios.most_common(5):
    print(f'{x[0]}: {x[1]}')
The code snippet above will result in the following output:
Top 5 duos
(31, 39): 2
(7, 33): 2
(13, 14): 1
(13, 28): 1
(13, 30): 1
Top 5 trios
(13, 14, 28): 1
(13, 14, 30): 1
(13, 14, 31): 1
(13, 14, 37): 1
(13, 14, 39): 1
And here is a slightly more elegant version:
from itertools import combinations
from collections import Counter
draws = [
[13, 14, 28, 30, 31, 37, 39],
[7, 10, 12, 16, 21, 22, 33],
[1, 2, 7, 15, 25, 31, 33],
[3, 6, 18, 21, 31, 34, 39]
]
# One counter per combination size: 2, 3 and 4 numbers.
counters = [Counter() for _ in range(3)]
for n, counter in enumerate(counters, 2):
    for draw in draws:
        counter.update(combinations(draw, n))
    print(f'Top 10 combos of {n} numbers')
    for combo, count in counter.most_common(10):
        print(' '.join(f'{number:2d}' for number in combo), count, sep=': ')
    print()
Which will give us the following output:
Top 10 combos of 2 numbers
31 39: 2
7 33: 2
13 14: 1
13 28: 1
13 30: 1
13 31: 1
13 37: 1
13 39: 1
14 28: 1
14 30: 1
Top 10 combos of 3 numbers
13 14 28: 1
13 14 30: 1
13 14 31: 1
13 14 37: 1
13 14 39: 1
13 28 30: 1
13 28 31: 1
13 28 37: 1
13 28 39: 1
13 30 31: 1
Top 10 combos of 4 numbers
13 14 28 30: 1
13 14 28 31: 1
13 14 28 37: 1
13 14 28 39: 1
13 14 30 31: 1
13 14 30 37: 1
13 14 30 39: 1
13 14 31 37: 1
13 14 31 39: 1
13 14 37 39: 1
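One thing to watch for with this approach: combinations() keeps the numbers in the order they appear in each draw, so if your real data is not sorted, (9, 23) and (23, 9) would be counted as different pairs. Below is a minimal sketch that sorts each draw first and then looks up a single pair. The helper name count_combos and the tiny sample draws are made up for illustration and are not from the question.

```python
from collections import Counter
from itertools import combinations


def count_combos(draws, n):
    """Count how often each n-number combination occurs across all draws."""
    counter = Counter()
    for draw in draws:
        # Sort first so (9, 23) and (23, 9) end up under the same key.
        counter.update(combinations(sorted(draw), n))
    return counter


# Hypothetical sample data: the first two draws share 9 and 23,
# but in a different order.
draws = [
    [23, 9, 5],
    [9, 23, 7],
    [5, 7, 11],
]

pairs = count_combos(draws, 2)
print(pairs[(9, 23)])  # 2
```

Since a Counter returns 0 for missing keys, you can query any pair this way without checking whether it exists first, as long as you pass the numbers in ascending order.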
CodePudding user response:
IIUC, you can find all combinations (e.g. of two values) for each row and then simply count:
from itertools import combinations
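# Build all pairs per row, flatten them into one Series, then count.
(df.apply(lambda x: tuple(combinations(x, r=2)), axis=1)
   .explode()
   .value_counts()
   # value_counts() already sorts in descending order, so this sort is optional.
   .sort_values(ascending=False))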
Results in a pandas Series like:
(31, 39) 2
(7, 33) 2
(13, 28) 1
(37, 39) 1
(13, 30) 1
..
Change the r=2 parameter to r=3 (and so on) to count combinations of three or more values.
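If you want to switch between pairs and triples often, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the name combo_counts is hypothetical, it builds the combinations with a plain loop over df.itertuples() instead of apply/explode, and it sorts each row so the order of numbers within a draw does not matter. The sample frame is made up for illustration.

```python
from itertools import combinations

import pandas as pd


def combo_counts(df, r=2):
    """Return a Series mapping each r-number combination to its frequency."""
    combos = [
        combo
        # name=None yields plain tuples, one per row, without the index.
        for row in df.itertuples(index=False, name=None)
        # Sorting makes (9, 23) and (23, 9) the same combination.
        for combo in combinations(sorted(row), r)
    ]
    # value_counts() returns the counts sorted in descending order.
    return pd.Series(combos).value_counts()


# Hypothetical sample data, not the real draws.
df = pd.DataFrame([[23, 9, 5], [9, 23, 7], [5, 7, 11]])

counts = combo_counts(df, r=2)
print(counts.head())
```

Calling combo_counts(df, r=3) gives the same kind of Series for triples.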