Removing rows where there is a value match

Time:12-08

def remove_low_data_states(column_name):
    items = df[column_name].value_counts().reset_index()
    items.columns = ['place', 'value']
    print(f'Items in column: [{column_name}] with low data')
    return list(items[items['value'].apply(lambda val: val < items.value.median())].place)

remove_low_data_states('col1')  # returns ['hello', 'bye']

Original table

col1 col2 col3
hello 2 4
world 2 4
bye 2 4

Updated table

col1 col2 col3
world 2 4

The function above gives me a list of the values in a column whose counts fall below the median count. How can I then use that list to remove the rows whose value in that column appears in the list?

I have tried using df.drop, but that was not too helpful, or I am making some sort of mistake.

CodePudding user response:

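We can use .isin() to build a boolean mask of the rows whose value is in the list, then negate it with ~ to keep only the other rows.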


def remove_low_data_states(column_name):
    items = df[column_name].value_counts().reset_index()
    items.columns = ['place', 'value']
    print(f'Items in column: [{column_name}] with low data')
    return list(items[items['value'].apply(lambda val: val < items.value.median())].place)

df = df[~df['col1'].isin(remove_low_data_states('col1'))]

df.head()
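
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The sample data and the helper name low_count_values are made up for illustration (they are not from the question), and the helper takes the DataFrame as an argument instead of relying on a global df. It also shows one way the drop approach from the question could be made to work, by passing it the index labels of the matching rows.

```python
import pandas as pd


def low_count_values(frame, column_name):
    """Return the values in `column_name` whose count is below the median count.

    Hypothetical helper: same criterion as remove_low_data_states in the
    question, but without the global `df` and the per-row lambda.
    """
    counts = frame[column_name].value_counts()
    return counts[counts < counts.median()].index.tolist()


# Made-up sample data: 'world' and 'there' appear 3 times, 'hello' and 'bye' once.
df = pd.DataFrame({
    "col1": ["hello", "world", "world", "world", "bye", "there", "there", "there"],
    "col2": [2] * 8,
    "col3": [4] * 8,
})

low = low_count_values(df, "col1")

# Option 1: boolean mask with isin, negated with ~ to keep the other rows.
filtered = df[~df["col1"].isin(low)]

# Option 2: drop expects index labels, so pass the labels of the matching rows.
dropped = df.drop(df.index[df["col1"].isin(low)])

print(sorted(low))
print(filtered)
```

Both options should leave the same rows here. The isin mask is usually the simpler of the two, because drop works on index labels rather than on column values.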