I am working on a method to determine the minimum and maximum frequencies for a dataset. The method value_counts()
returns the distinct values and their frequencies. I reviewed the documentation listed here, but it did not solve my problem. My goal is to:
- Determine the maximum value in the set of distinct values.
- Determine the frequency associated with the maximum value from the dataset.
- Determine the minimum value in the set of distinct values.
- Determine the frequency associated with the minimum value from the dataset.
For example,
Sample input data
A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
2,0.53226533,1.5,2
8,0.070034818,2.9,1
2,0.668631847,1.5,2
value_counts() output for A1:
2 42
8 24
5 20
6 10
7 2
4 1
3 1
maxValue = 8, maxF = 24, minValue = 2, minF = 42
Expected: maxf returns the maximum frequency for the dataset, and minf returns the minimum frequency for the dataset.
Actual: I'm stuck on processing the frequencies returned by value_counts().
I've written a program to process the dataset:
import pandas as pd

def main():
    s = pd.read_csv('A1-dm.csv')
    print("******************************************************")
    print("Entropy Discretization STARTED")
    s = entropy_discretization(s)
    print("Entropy Discretization COMPLETED")

def entropy_discretization(s):
    I = {}
    i = 0
    n = s.nunique()['A1']
    print("******************")
    print("calculating maxf")
    maxf(s['A1'])
    print("******************")

def maxf(s):
    print(s.value_counts())

def minf(s):
    print(s.value_counts())
Any help with this would be greatly appreciated.
CodePudding user response:
Use Series.idxmax and Series.idxmin; if you need the output as a Series, use Series.agg:
s = df['Class'].value_counts()
print (s)
3 5
2 4
1 2
Name: Class, dtype: int64
print (s.agg(['max','idxmax','min','idxmin']))
max 5
idxmax 3
min 2
idxmin 1
Name: Class, dtype: int64
Separately:
print (s.max(), s.idxmax(), s.min(), s.idxmin())
5 3 2 1
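If the goal is instead the frequency attached to the largest and smallest distinct values (as in the question's maxValue/maxF example), one option is to look up the result of value_counts() by its own index. The following is a minimal sketch, not a definitive implementation. It assumes the 11 sample rows from the question are loaded inline, since the full A1-dm.csv is not shown, and it reuses the question's maxf and minf names with a tuple return value that is my own choice.

```python
import io
import pandas as pd

# Sample rows copied from the question; the full A1-dm.csv is not available here.
CSV = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
2,0.53226533,1.5,2
8,0.070034818,2.9,1
2,0.668631847,1.5,2
"""

def maxf(s):
    """Return (largest distinct value, frequency of that value)."""
    counts = s.value_counts()      # index = distinct values, data = frequencies
    value = counts.index.max()     # largest distinct value, not largest frequency
    return value, counts.loc[value]

def minf(s):
    """Return (smallest distinct value, frequency of that value)."""
    counts = s.value_counts()
    value = counts.index.min()     # smallest distinct value
    return value, counts.loc[value]

df = pd.read_csv(io.StringIO(CSV))
max_value, max_freq = maxf(df['A1'])
min_value, min_freq = minf(df['A1'])
print(max_value, max_freq)
print(min_value, min_freq)
```

With these sample rows the two print calls give 8 3 and 2 5. Applied to the counts listed in the question, the same lookups would give 8 with frequency 24 and 2 with frequency 42. Note that idxmax and idxmin answer a different question: they return the value with the highest or lowest frequency.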