get the most frequent group of values with pandas in python

I have an ASCII file like the following:

1   306.0416667
2   286.1666667
3   207.5
4   226.4166667
5   304.2083333
6   336.1666667
7   255.5416667
8   224.5833333
9   190.1666667
10  163.5
11  231.125
12  167.3333333
13  193.5416667
14  165
15  166
16  172.173913
17  158.9166667
18  196.8333333
19  154.875
20  303.4166667

I want to find the most frequent group of values. The groups are 0-90, 90-180, 180-270, and 270-360.

I tried to use .value_counts(), but with no success (and in any case it does not group the values).

import pandas as pd

col_names = ['id', 'val']
# the file is whitespace-separated, so a sep is needed (i is the file path)
df = pd.read_csv(i, sep=r'\s+', names=col_names, header=None)
df['val'].value_counts()[:1].index.tolist()

CodePudding user response:

You can use pd.cut with groupby() and count(), like below:

>>> df = pd.DataFrame({
    'freq': [306.0416667, 286.1666667, 207.5 , 226.4166667 , 304.2083333 , 
             336.1666667 , 255.5416667, 224.5833333 , 190.1666667, 163.5 , 
             231.125, 167.3333333 , 193.5416667 , 165 , 154.875 , 303.4166667]})

>>> ranges = [0,90,180,270, 360]
>>> df.groupby(pd.cut(df['freq'], ranges)).count()

           freq
freq    
(0, 90]     0
(90, 180]   4
(180, 270]  7
(270, 360]  5

>>> df.groupby(pd.cut(df['freq'], ranges)).count().idxmax()
freq    (180, 270]
dtype: interval
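
If you only need the winning interval itself, you can select the column before calling idxmax(), so a scalar Interval comes back instead of a one-row Series (a minimal sketch, reusing the df and ranges above):

>>> df.groupby(pd.cut(df['freq'], ranges)).count()['freq'].idxmax()
Interval(180, 270, closed='right')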

CodePudding user response:

Use pd.cut with value_counts(), as follows:

bins = [0, 90, 180, 270, 360]
df['group'] = pd.cut(df['val'], bins)

df['group'].value_counts()

Result:

(180, 270]    8
(90, 180]     7
(270, 360]    5
(0, 90]       0
Name: group, dtype: int64

For the max entry, you can use .head(1), as follows:

df['group'].value_counts().head(1)

Result:

(180, 270]    8
Name: group, dtype: int64
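
Alternatively, .idxmax() on the counts returns the most frequent interval directly (a small sketch, reusing df['group'] from above):

df['group'].value_counts().idxmax()

Result:

Interval(180, 270, closed='right')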

CodePudding user response:

Bin the values and calculate the mode():

 col1
1   306.041667
2   286.166667
3   207.500000
4   226.416667
5   304.208333
6   336.166667
7   255.541667
8   224.583333
9   190.166667
10  163.500000
11  231.125000
12  167.333333
13  193.541667
14  165.000000
15  166.000000
16  172.173913
17  158.916667
18  196.833333
19  154.875000
20  303.416667
pd.cut(df['col1'], bins=[0, 90, 180, 270, 360],
       labels=['0-90', '90-180', '180-270', '270-360'],
       ordered=False).mode()
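
For completeness, a self-contained version of this approach might look as follows (a sketch that assumes the question's whitespace-separated file is named data.txt; the column names are just illustrative):

import pandas as pd

# read the two whitespace-separated columns from the question's file
df = pd.read_csv('data.txt', sep=r'\s+', names=['id', 'col1'], header=None)

# bin the values and take the most frequent label
print(pd.cut(df['col1'], bins=[0, 90, 180, 270, 360],
             labels=['0-90', '90-180', '180-270', '270-360'],
             ordered=False).mode())

With the 20 values from the question, the mode is 180-270 (8 values fall in that bin).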

CodePudding user response:

Try the below (no external library is required for this solution):

from collections import defaultdict

data = defaultdict(int)
STEP = 90

with open('data.txt') as f:
    lines = [l.strip() for l in f.readlines()]
    for line in lines:
        _, val = line.split()
        cnt = 1
        # walk up the 90-wide bins until the value fits
        while True:
            if float(val) <= STEP * cnt:
                key = f'{(cnt - 1) * STEP}-{cnt * STEP}'
                data[key] += 1
                break
            cnt += 1
print(data)

max_key = max(data, key=data.get)
print(f'max: {max_key}')

Output:

defaultdict(<class 'int'>, {'270-360': 5, '180-270': 8, '90-180': 7})
max: 180-270
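
The inner while loop can also be replaced by integer division, which computes the bin index directly (a sketch under the same data.txt assumption; note that boundary values such as 180.0 land in the upper bin here, unlike the <= comparison above):

from collections import defaultdict

data = defaultdict(int)
STEP = 90

with open('data.txt') as f:
    for line in f:
        _, val = line.split()
        # integer division maps a value straight to its 90-wide bin
        idx = int(float(val) // STEP)
        data[f'{idx * STEP}-{(idx + 1) * STEP}'] += 1

print(max(data, key=data.get))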