Home > Mobile >  How to find the most frequent value based on dates in another column in Python
How to find the most frequent value based on dates in another column in Python

Time:12-21

I want to find which value occurs on a given day 50% of the time or more. For example, in the dataset below, A occurs the most frequently on 06/21, however it does not occur 50% of the time or more. On 06/22, B occurs 50% of the time or more, so I would need the output to show me "B" and the date "06/22".

EDIT: There is also a third condition. If the third row says, "Accepted", then those are the only values that should be counted. In this case, I will make them all accepted since it is a small sample dataset.

import pandas as pd

# initialise data of lists.
data = {'Name':['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B','B','B', 'C', 'C'], 'Date': 
['06/21', '06/21', '06/21', '06/21', '06/21', '06/21', '06/21', '06/22' , '06/22', '06/22', '06/22', '06/22', '06/22'], 'Status':['Accepted','Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted']}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)
Name Date Status
A 06/21 Accepted
B 06/21 Accepted
A 06/21 Accepted
C 06/21 Accepted
C 06/21 Accepted
A 06/21 Accepted
B 06/21 Accepted
A 06/22 Accepted
B 06/22 Accepted
B 06/22 Accepted
B 06/22 Accepted
C 06/22 Accepted
C 06/22 Accepted

CodePudding user response:

You can use value_counts with normalize to count the relative values, then filter:

s = df.groupby('Date')['Name'].value_counts(normalize=True).reset_index(name='freq')
s.query('freq >= 0.5')

Output:

    Date Name  freq
3  06/22    B   0.5

CodePudding user response:

Lets see, Summary of the problem :- Inputs - Given a date Output The freq of which values in column Name is greater than 50 percent Tell me if I am wrong here ,

import numpy as np # If not downloaded run 'pip install numpy'
date=input('Enter your date ex:06/21')
#date='06/21'
def frequency(df,date):
    dfcrop=df[df['Date']==date]#Crop the columns with given date
    dfcrop=df[df['Accepted']==True]# As per condition
    values=list()
    for value in set(list(dfcrop['Name'])): # Take out all unique names
        freq=np.sum((dfcrop['Name']==value).astype(np.int32))
        freq=freq/dfcrop.shape[0]# Calculate frequency
        if freq >0.5: # frequency is greater than 50 percent
           values.append(value)
    return values
freq=frequency(df,date) # freq is a list with the names with a freq above 50 percent on given date

Hope it works happy coding ^-^

  • Related