Home > Blockchain >  Boolean column in pandas with a windowed function that checking a complex score condition
Boolean column in pandas with a windowed function that checking a complex score condition

Time:12-24

I have a frame that looks like this:

x        y         score    
Cat      Dog       0.1    
Dog      Monkey    0.5     
Fish     Cat       0.2     
Cat      Fish      0.2   
Monkey   Dog       0.4    
Dog      Cat       0.7    

Basically I want a new boolean column that looks at all pairs and their reverse, for example (Cat, Dog) and (Dog,Cat) and assign a True for when the score of a pair is greater than the score of the reversed pair, otherwise False. If there is equality, both pairs are assigned True. That is:

x        y         score    bool
Cat      Dog       0.1      False   # False because Dog,Cat > Cat,Dog
Dog      Monkey    0.5      True    # True because Dog,Monkey > Monkey, Dog
Fish     Cat       0.2      True    # True because both have the same score
Cat      Fish      0.2      True    # ...
Monkey   Dog       0.4      False
Dog      Cat       0.7      True

I am certain I can get a function done by applying on the rows a filter for the reverse and then return a checks on the score; however, the list can be quite long (n>100k) and time is a factor. It turns out this is trickier than I thought. I was wondering if this can be done in a more pythonic way with a magic Pandas function that haven't encoutered yet or a rolling window.

Note:

  1. Every pair (x,y) is unique.
  2. For every pair (x,y) there is exactly one reverse (y,x).
  3. There can be multiple occurences of the same category within the same column, e.g. Cat appears twice in this frame for x.
  4. Scores range from 0-1.

Frame:

df = pd.DataFrame.from_records(zip(["Cat","Dog","Fish","Cat","Monkey","Dog"], ["Dog","Monkey","Cat","Fish","Dog","Cat"], [0.1,0.5,0.2,0.2,0.4,0.7]),columns=["x","y","score"])

CodePudding user response:

You can swap columns x & y, merge on x/y pair, and compare score vs reversed score:

(df.merge(
     df.assign(x = df.y, y = df.x)
       .rename(columns={'score': 'rev_score'})
 ).assign(bool = lambda x: x.score >= x.rev_score)
 .drop('rev_score', axis=1))

        x       y  score   bool
0     Cat     Dog    0.1  False
1     Dog  Monkey    0.5   True
2    Fish     Cat    0.2   True
3     Cat    Fish    0.2   True
4  Monkey     Dog    0.4  False
5     Dog     Cat    0.7   True
  • Related