I want to find which value occurs on a given day 50% of the time or more. For example, in the dataset below, A occurs the most frequently on 06/21, however it does not occur 50% of the time or more. On 06/22, B occurs 50% of the time or more, so I would need the output to show me "B" and the date "06/22".
EDIT: There is also a third condition. If the third row says, "Accepted", then those are the only values that should be counted. In this case, I will make them all accepted since it is a small sample dataset.
import pandas as pd
# initialise data of lists.
data = {'Name':['A', 'B', 'A', 'C', 'C', 'A', 'B', 'A', 'B','B','B', 'C', 'C'], 'Date':
['06/21', '06/21', '06/21', '06/21', '06/21', '06/21', '06/21', '06/22' , '06/22', '06/22', '06/22', '06/22', '06/22'], 'Status':['Accepted','Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted', 'Accepted']}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
Name | Date | Status |
---|---|---|
A | 06/21 | Accepted |
B | 06/21 | Accepted |
A | 06/21 | Accepted |
C | 06/21 | Accepted |
C | 06/21 | Accepted |
A | 06/21 | Accepted |
B | 06/21 | Accepted |
A | 06/22 | Accepted |
B | 06/22 | Accepted |
B | 06/22 | Accepted |
B | 06/22 | Accepted |
C | 06/22 | Accepted |
C | 06/22 | Accepted |
CodePudding user response:
You can use value_counts
with normalize to count the relative values, then filter:
s = df.groupby('Date')['Name'].value_counts(normalize=True).reset_index(name='freq')
s.query('freq >= 0.5')
Output:
Date Name freq
3 06/22 B 0.5
CodePudding user response:
Lets see, Summary of the problem :- Inputs - Given a date Output The freq of which values in column Name is greater than 50 percent Tell me if I am wrong here ,
import numpy as np # If not downloaded run 'pip install numpy'
date=input('Enter your date ex:06/21')
#date='06/21'
def frequency(df,date):
dfcrop=df[df['Date']==date]#Crop the columns with given date
dfcrop=df[df['Accepted']==True]# As per condition
values=list()
for value in set(list(dfcrop['Name'])): # Take out all unique names
freq=np.sum((dfcrop['Name']==value).astype(np.int32))
freq=freq/dfcrop.shape[0]# Calculate frequency
if freq >0.5: # frequency is greater than 50 percent
values.append(value)
return values
freq=frequency(df,date) # freq is a list with the names with a freq above 50 percent on given date
Hope it works happy coding ^-^