How to compare every value in a Pandas dataframe to all the next values?

Time:03-14

I am learning Pandas and moving my Python code over to it. I want to compare every value with all the values that come after it, using a function: the first with the second, third, and so on, then the second with the third onward, but not with the first, because that pair has already been compared. In plain Python I use two nested loops over a list:

def match_values(a, b):
    # do some stuff...
    return a == b  # placeholder; the real comparison goes here

l = ['a', 'b', 'c']
length = len(l)

for i in range(length):
    for j in range(i + 1, length):  # starts after i, not from the start!
        if match_values(l[i], l[j]):
            pass  # do some stuff...

How do I do something similar in Pandas when my list is a column in a dataframe? Do I simply reference every value as before, or is there a clever "vector-style" way to do this quickly and efficiently?

Thanks in advance,

Jo

CodePudding user response:

Can you please check this? It compares each row's value with every later row and stores the results as a list per row (1 for a match, 0 otherwise).

>>> import pandas as pd
>>> import numpy as np 
>>> val = [16,19,15,19,15]
>>> df = pd.DataFrame({'val': val})
>>> df
   val
0   16
1   19
2   15
3   19
4   15
>>> 
>>> 
>>> df['match'] = df.apply(lambda x: [(1 if x['val'] == df.loc[idx, 'val'] else 0) for idx in range(x.name + 1, len(df))], axis=1)
>>> df
   val         match
0   16  [0, 0, 0, 0]
1   19     [0, 1, 0]
2   15        [0, 1]
3   19           [0]
4   15            []
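
The row-wise apply above still runs a Python loop for every row, so it may be slow on a large frame. Below is a rough sketch of the same comparison done with NumPy broadcasting. It assumes the comparison is plain equality, as in the example; the names vals, eq, upper, pairs and match_lists are only illustrative. It builds an n x n Boolean matrix, so memory grows quadratically with the number of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [16, 19, 15, 19, 15]})
vals = df['val'].to_numpy()

# eq[i, j] is True when vals[i] == vals[j]; broadcasting compares all pairs at once
eq = vals[:, None] == vals[None, :]

# Keep only the entries with j > i, so each pair is counted once
upper = np.triu(eq, k=1)

# (i, j) positions of the matching pairs
pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(upper))]

# Same per-row lists as the 'match' column above
match_lists = [upper[i, i + 1:].astype(int).tolist() for i in range(len(vals))]

print(pairs)        # [(1, 3), (2, 4)]
print(match_lists)  # [[0, 0, 0, 0], [0, 1, 0], [0, 1], [0], []]
```

If the real comparison is not simple equality, the same pattern should work with any operation that broadcasts elementwise (for example an absolute difference under a threshold); an arbitrary Python function would still need a loop.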


CodePudding user response:

Yes, vectorized comparison works because pandas is built on NumPy:

df['columnname'] > 5

This returns a Boolean Series. If you also want the matching rows of the dataframe:

df[df['columnname'] > 5]
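
A small runnable sketch of the two expressions above, using the val column from the other answer in place of 'columnname' and an arbitrary threshold of 16 (both are just assumptions for illustration). Note that this compares every value against one scalar; it does not by itself compare each value with the rows that follow it.

```python
import pandas as pd

df = pd.DataFrame({'val': [16, 19, 15, 19, 15]})

# Elementwise comparison against a scalar: one True/False per row
mask = df['val'] > 16

# Boolean indexing keeps only the rows where the mask is True
filtered = df[mask]

print(mask.tolist())       # [False, True, False, True, False]
print(filtered)
```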