Home > OS >  Selecting rows with logic involving multiple variables across rows
Selecting rows with logic involving multiple variables across rows

Time:07-20

I have a data frame like this:

df = pd.DataFrame({"product": [1,2,3,4,5], "company": ["A", "B","B","A","B"], "state": ["CA", "NY", "CA", "CA", "NY"]})

company state   product
0   A   CA      1
1   B   NY      2
2   B   CA      3
3   A   CA      4
4   B   NY      5

I would like a bool that picks out just the rows that correspond to states which have only one company in them. In this case that would be only NY which has only company B, so the desired bool would be [False, True, False, False, True]

Alternatively, I would like to know the set of states that have only one company in them. I guess I could do that e.g. using value_counts once I have the bool.

How do I do this?

Thanks!

CodePudding user response:

You can use groupby_nunique with transform to broadcast the result over rows then just check if the result is equal to 1:

df['flag'] = df.groupby('state')['company'].transform('nunique').eq(1)
print(df)

# Output
   product company state   flag
0        1       A    CA  False
1        2       B    NY   True
2        3       B    CA  False
3        4       A    CA  False
4        5       B    NY   True
  • Related