Subset of dataframe where a column type value is below a threshold

Time:10-08

This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {'tid': [11, 12, 12, 14, 14],
     'price': [846.94, 412.65, 1295.38, 741.24, 695.47],
     'item': ['A', 'B', 'A', 'B', 'A']}
)
df
    tid     price   item
0   11      846.94    A
1   12      412.65    B
2   12     1295.38    A
3   14      741.24    B
4   14      695.47    A

I want all rows of df except those where item is A and price is greater than 1,000.

Expected results:

    tid     price   item
0   11      846.94    A
1   12      412.65    B
3   14      741.24    B
4   14      695.47    A

CodePudding user response:

You can use boolean indexing:

# is item A?
m1 = df['item'].eq('A')
# is the price > 1000?
m2 = df['price'].gt(1000)

# keep a row unless both conditions are met
out = df[~(m1&m2)]

Equivalently, by De Morgan's laws, you can negate each condition and combine them with OR:

# is the item not A?
m1 = df['item'].ne('A')
# is the price ≤ 1000?
m2 = df['price'].le(1000)

# keep the rows if either condition is met
out = df[m1|m2]

Output:

   tid   price item
0   11  846.94    A
1   12  412.65    B
3   14  741.24    B
4   14  695.47    A
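
For reference, here is a minimal self-contained sketch that puts the two variants side by side and checks that they select the same rows. The names is_a, over_limit, drop_mask and keep_mask are only illustrative. Note that the two forms agree here because price has no missing values: a NaN price fails both gt(1000) and le(1000), so the De Morgan form would drop such a row while the negated form would keep it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "tid": [11, 12, 12, 14, 14],
        "price": [846.94, 412.65, 1295.38, 741.24, 695.47],
        "item": ["A", "B", "A", "B", "A"],
    }
)

# Variant 1: build the mask of rows to drop, then negate it
is_a = df["item"].eq("A")
over_limit = df["price"].gt(1000)
drop_mask = is_a & over_limit

# Variant 2: build the mask of rows to keep directly (De Morgan)
keep_mask = df["item"].ne("A") | df["price"].le(1000)

out = df[~drop_mask]

# Both masks select the same rows for this data (no NaN in price)
same_rows = (~drop_mask).tolist() == keep_mask.tolist()
```

Only the row with index 2 (item A, price 1295.38) is removed; the original index labels of the remaining rows are preserved.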

CodePudding user response:

import pandas as pd
import numpy as np

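# is the price > 1000?
F1 = df['price'] > 1000
# is item A?
F2 = df['item'] == 'A'

# keep a row unless both conditions are met
out = df[np.logical_not(np.logical_and(F1, F2))]
# or, equivalently, with the pandas operators
out = df[~(F1 & F2)]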
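
If you need the same filter for other items or thresholds, the mask can be wrapped in a small function. This is only a sketch: drop_item_above is a hypothetical helper name, not a pandas function, and it assumes the columns are called item and price as in the question.

```python
import pandas as pd


def drop_item_above(df, item, threshold):
    """Return df without the rows where `item` matches and price exceeds `threshold`.

    Hypothetical helper; assumes columns named 'item' and 'price'.
    """
    to_drop = df["item"].eq(item) & df["price"].gt(threshold)
    return df.loc[~to_drop]


df = pd.DataFrame(
    {
        "tid": [11, 12, 12, 14, 14],
        "price": [846.94, 412.65, 1295.38, 741.24, 695.47],
        "item": ["A", "B", "A", "B", "A"],
    }
)

# Drop item A rows priced above 1000
out = drop_item_above(df, "A", 1000)
```

The function returns a filtered copy of the selection and leaves the original df unchanged.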