I am having trouble filtering out the crimes (the "OffenseDescription" column) that account for less than 5% of the total rows in the dataframe. Either a specific or a general solution would help, so that I can reproduce it and adjust the threshold as needed.
What I've tried so far crashes the kernel and essentially runs forever.
I'm also doing this in VS Code, via a Jupyter Notebook.
This is the code I've attempted so far:
tot = crime.OffenseDescription.sum()  # Find sum of column
# Calculate percentage and filter as per condition
crime[crime.groupby(['OffenseDescriptiom']).transform(lambda x: (x.div(tot)*100) < 0.05)]
Link to a screenshot of .head() of the dataframe I am using:
TIA
CodePudding user response:
Use Series.value_counts with normalize=True to get each category's share of the rows as a fraction. Then map those shares back onto the column and keep only the rows whose share is greater than or equal to 0.05 (5%), using boolean indexing:
percentage = crime.OffenseDescription.value_counts(normalize=True)
crime[crime['OffenseDescription'].map(percentage) >= 0.05]
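To make the threshold easy to adjust, the same idea can be wrapped in a small helper. This is only a sketch: the toy dataframe, the `drop_rare` function name, and the `min_share` parameter are made up for illustration and are not part of the original data or of pandas.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `crime` dataframe:
# 20 rows, so Theft = 60%, Assault = 35%, Arson = 5% of the rows.
crime = pd.DataFrame({
    "OffenseDescription": ["Theft"] * 12 + ["Assault"] * 7 + ["Arson"] * 1
})

def drop_rare(df, column, min_share=0.05):
    """Keep only rows whose value in `column` makes up at least
    `min_share` (a fraction, e.g. 0.05 for 5%) of the rows."""
    # Share of rows per category, as a fraction between 0 and 1.
    share = df[column].value_counts(normalize=True)
    # Look up each row's category share and filter with a boolean mask.
    return df[df[column].map(share) >= min_share]

# Use a 10% threshold here so that the rare "Arson" category is removed.
filtered = drop_rare(crime, "OffenseDescription", min_share=0.10)
print(filtered["OffenseDescription"].value_counts())
```

Note that value_counts ignores missing values by default, so rows where the column is NaN get no share and are dropped by this filter. As for the original attempt, calling .sum() on a text column concatenates all the strings instead of counting rows, which is probably part of why it ran so long.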