I have an ASCII file like the following:
1 306.0416667
2 286.1666667
3 207.5
4 226.4166667
5 304.2083333
6 336.1666667
7 255.5416667
8 224.5833333
9 190.1666667
10 163.5
11 231.125
12 167.3333333
13 193.5416667
14 165
15 166
16 172.173913
17 158.9166667
18 196.8333333
19 154.875
20 303.4166667
I want to find the most frequent group of values. The groups are 0-90, 90-180, 180-270, 270-360.
I tried to use .value_counts(), but with no success (even without grouping the values).
import pandas as pd
col_names=['id','val']
df = pd.read_csv(i,names=col_names,header=None)
df['val'].value_counts()[:1].index.tolist()
CodePudding user response:
You can use pd.cut, groupby(), and count() like below:
>>> df = pd.DataFrame({
...     'freq': [306.0416667, 286.1666667, 207.5, 226.4166667, 304.2083333,
...              336.1666667, 255.5416667, 224.5833333, 190.1666667, 163.5,
...              231.125, 167.3333333, 193.5416667, 165, 154.875, 303.4166667]})
>>> ranges = [0,90,180,270, 360]
>>> df.groupby(pd.cut(df['freq'], ranges)).count()
            freq
freq
(0, 90]        0
(90, 180]      4
(180, 270]     7
(270, 360]     5
>>> df.groupby(pd.cut(df['freq'], ranges)).count().idxmax()
freq (180, 270]
dtype: interval
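The snippet above builds the DataFrame inline from a subset of the values. As a rough sketch of the same idea applied to the file itself, assuming the file is whitespace-separated as shown in the question: the name raw and the io.StringIO stand-in are illustrative only, and a real file path would take its place.

```python
import io
import pandas as pd

# Stand-in for the file on disk (assumption); rows copied from the question.
raw = io.StringIO("""1 306.0416667
2 286.1666667
3 207.5
4 226.4166667
5 304.2083333
6 336.1666667
7 255.5416667
8 224.5833333
9 190.1666667
10 163.5
11 231.125
12 167.3333333
13 193.5416667
14 165
15 166
16 172.173913
17 158.9166667
18 196.8333333
19 154.875
20 303.4166667
""")

# sep=r"\s+" splits on runs of whitespace instead of the default comma.
df = pd.read_csv(raw, sep=r"\s+", names=["id", "val"], header=None)

ranges = [0, 90, 180, 270, 360]
# observed=False keeps empty bins such as (0, 90] in the result.
counts = df["val"].groupby(pd.cut(df["val"], ranges), observed=False).count()
print(counts)
print(counts.idxmax())
```

With all 20 rows loaded, the (180, 270] bin should come out on top.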
CodePudding user response:
Use pd.cut and value_counts, as follows:
bins = [0, 90, 180, 270, 360]
df['group'] = pd.cut(df['val'], bins)
df['group'].value_counts()
Result:
(180, 270] 8
(90, 180] 7
(270, 360] 5
(0, 90] 0
Name: group, dtype: int64
For the max entry, you can use .head(1), as follows:
df['group'].value_counts().head(1)
Result:
(180, 270] 8
Name: group, dtype: int64
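One detail worth checking is which side of each bin is closed. By default pd.cut uses right-closed intervals, so a value of exactly 90 lands in (0, 90]; passing right=False makes the bins left-closed instead. A small sketch with made-up edge values (the series edge_vals is hypothetical, not from the question's data):

```python
import pandas as pd

bins = [0, 90, 180, 270, 360]
# Hypothetical values sitting exactly on bin edges.
edge_vals = pd.Series([90.0, 180.0])

# Default: right-closed bins, e.g. (0, 90]
right_closed = pd.cut(edge_vals, bins)
# right=False: left-closed bins, e.g. [90, 180)
left_closed = pd.cut(edge_vals, bins, right=False)

print(right_closed.tolist())
print(left_closed.tolist())
```

Which variant is appropriate depends on how the groups 0-90, 90-180, and so on are meant to treat their boundaries.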
CodePudding user response:
Bin the values and calculate mode():
col1
1 306.041667
2 286.166667
3 207.500000
4 226.416667
5 304.208333
6 336.166667
7 255.541667
8 224.583333
9 190.166667
10 163.500000
11 231.125000
12 167.333333
13 193.541667
14 165.000000
15 166.000000
16 172.173913
17 158.916667
18 196.833333
19 154.875000
20 303.416667
pd.cut(df['col1'], bins=[0, 90, 180,270, 360], labels=['0-90', '90-180', '180-270', '270-360'],ordered=False).mode()
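Note that mode() returns a Series, not a single value, and it can hold several entries if bins tie. A minimal sketch of pulling out the label as a plain string, assuming the column is named col1 as above and that taking the first entry is acceptable when there is a tie:

```python
import pandas as pd

df = pd.DataFrame({"col1": [
    306.0416667, 286.1666667, 207.5, 226.4166667, 304.2083333,
    336.1666667, 255.5416667, 224.5833333, 190.1666667, 163.5,
    231.125, 167.3333333, 193.5416667, 165, 166,
    172.173913, 158.9166667, 196.8333333, 154.875, 303.4166667,
]})

labels = ["0-90", "90-180", "180-270", "270-360"]
binned = pd.cut(df["col1"], bins=[0, 90, 180, 270, 360], labels=labels, ordered=False)

modes = binned.mode()        # Series of the most frequent label(s)
top_label = modes.iloc[0]    # first entry; only one exists unless bins tie
print(top_label)
```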
CodePudding user response:
Try the below (no external library is required in this solution):
from collections import defaultdict
data = defaultdict(int)
STEP = 90
with open('data.txt') as f:
    lines = [l.strip() for l in f.readlines()]
for line in lines:
    _, val = line.split()
    cnt = 1
    # Walk up the bins until the value fits in (cnt - 1) * STEP .. cnt * STEP
    while True:
        if float(val) <= STEP * cnt:
            key = f'{(cnt - 1) * STEP}-{cnt * STEP}'
            data[key] += 1
            break
        cnt += 1
print(data)
max_key = max(data, key=data.get)
print(f'max: {max_key}')
Output:
defaultdict(<class 'int'>, {'270-360': 5, '180-270': 8, '90-180': 7})
max: 180-270
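The same counting can also be sketched without the inner while loop, by computing the bin index arithmetically and letting collections.Counter do the tallying. This is an alternative sketch, not the code above: the helper bin_label is a made-up name, the values are listed inline instead of being read from data.txt, and it keeps the same convention that a value equal to an upper edge belongs to the lower bin.

```python
import math
from collections import Counter

STEP = 90
values = [
    306.0416667, 286.1666667, 207.5, 226.4166667, 304.2083333,
    336.1666667, 255.5416667, 224.5833333, 190.1666667, 163.5,
    231.125, 167.3333333, 193.5416667, 165, 166,
    172.173913, 158.9166667, 196.8333333, 154.875, 303.4166667,
]

def bin_label(val, step=STEP):
    # ceil puts a value equal to an upper edge into the lower bin;
    # max(1, ...) keeps 0 in the first bin, as the loop version does.
    idx = max(1, math.ceil(val / step))
    return f'{(idx - 1) * step}-{idx * step}'

counts = Counter(bin_label(v) for v in values)
max_key, max_count = counts.most_common(1)[0]
print(counts)
print(f'max: {max_key}')
```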