Dropping rows that fall below a certain percentage threshold of the total rows/sum [Python]

Time:01-30

I am having trouble filtering out the crimes (the "OffenseDescription" column) that account for less than 5% of the total rows in the dataframe. Either a specific or a general solution would help, so I can reproduce it and adjust the requirements as needed.

This is what I've tried so far, but it crashes the kernel and appears to run indefinitely.

I'm also doing this in VS Code, via a Jupyter Notebook.

This is the code I've attempted so far:

  tot=crime.OffenseDescription.sum()  #Find sum of column 
  
  crime[crime.groupby(['OffenseDescriptiom']).transform(lambda x:
  (x.div(tot)*100)<0.05)]   #calculate percentage filter as per condition

Link to a screenshot of .head() of the dataframe I am using:

image

TIA

Answer:

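Use Series.value_counts with normalize=True to get each value's share of the total rows. Then map those shares back onto the column and, with boolean indexing, keep only the rows whose share is greater than or equal to 0.05, which removes the groups below 5%: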

percentage = crime.OffenseDescription.value_counts(normalize=True) 

crime[crime['OffenseDescription'].map(percentage) >= 0.05]
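
To try the approach without the original data, here is a minimal self-contained sketch. The toy dataframe and its values are invented for illustration, and the names threshold, share, row_share, filtered and filtered_alt are hypothetical. Only the column name OffenseDescription comes from the question. The second variant, using groupby with transform("size"), is an alternative that should give the same result.

```python
import pandas as pd

# Toy data (hypothetical values, not the asker's dataset): 100 rows, 4 offense types.
crime = pd.DataFrame({
    "OffenseDescription": ["Theft"] * 60 + ["Assault"] * 33 + ["Fraud"] * 4 + ["Arson"] * 3
})

threshold = 0.05  # 5% as a fraction, because normalize=True returns fractions, not percentages

# Share of the total rows for each offense type.
share = crime["OffenseDescription"].value_counts(normalize=True)

# Map every row to its group's share and keep rows whose group is at or above the threshold.
filtered = crime[crime["OffenseDescription"].map(share) >= threshold]

# Alternative: broadcast each group's size to its rows, then divide by the row count.
row_share = (
    crime.groupby("OffenseDescription")["OffenseDescription"].transform("size") / len(crime)
)
filtered_alt = crime[row_share >= threshold]

print(filtered["OffenseDescription"].value_counts())
```

Changing threshold adjusts the cutoff. Note that the comparison is against 0.05 rather than 5, since the shares are fractions of 1.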