Comparing two dataframes of different lengths

Time:10-15

I have 2 dataframes of different lengths -

len(df1) = 2400
len(df2) = 100

df1 =>

colA  colB  colC
0     1     2   
3     4     5 
6     7     8  
.
.
.
2400 rows.

df2 (its number of rows is 1/24 of the number of rows in df1) =>

colD  colE  colF
10     11     12    
13     14     15  
.
.
.
100 rows

Currently I get the following error, which is expected since the lengths are different:

comparison -

df1['colB'] > df2['colD']

Error -

ValueError: ('Lengths must match to compare', (2400,), (100,))

Requirement ->

I want to perform this comparison so that each block of 24 consecutive rows in df1 is compared to 1 row in df2, which would avoid this error:

(row1...row24 in df1 compared with row1 in df2)

(row25..row48 in df1 compared with row2 in df2)

and so on. Is there a way to do that?

PS - Comparison is to be done between 2 specific columns of these dfs as shown above -> colB and colD

One way I could think of is copying each row of df2 24 times until it has 2400 rows, but I'm not sure how to do that either, since I'm new to dataframes and numpy.

CodePudding user response:

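You can repeat each row of df2 24 times so that it has the same length as df1, then do the comparison:

df2_repeated = df2.loc[df2.index.repeat(24)].reset_index(drop=True)
# assumes df1 has the default 0..n-1 index so the two Series align
result = df1['colB'] > df2_repeated['colD']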
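
To check the idea end to end, here is a small self-contained sketch. The frames are made-up stand-ins (3 rows in df2, 72 in df1) and the constant GROUP is a name introduced here for the 24-to-1 ratio from the question. It assumes df1 has the default 0..n-1 index. The second option is an alternative that compares raw NumPy arrays instead of building a repeated frame; it should give the same booleans.

```python
import numpy as np
import pandas as pd

GROUP = 24  # rows of df1 per row of df2 (ratio taken from the question)

# Small stand-in frames; the values are invented for illustration only.
df2 = pd.DataFrame({"colD": [10, 13, 16], "colE": [11, 14, 17], "colF": [12, 15, 18]})
n = GROUP * len(df2)
df1 = pd.DataFrame({"colA": np.arange(n), "colB": np.arange(n) % 20, "colC": 0})

# Option 1: repeat each df2 row GROUP times, reset the index so it lines up
# with df1's default index, then compare the two Series.
df2_repeated = df2.loc[df2.index.repeat(GROUP)].reset_index(drop=True)
result = df1["colB"] > df2_repeated["colD"]

# Option 2: compare plain arrays, so index labels do not matter at all.
result_np = df1["colB"].to_numpy() > np.repeat(df2["colD"].to_numpy(), GROUP)
```

Option 1 keeps everything in pandas, so you can assign the result back as a new column of df1. Option 2 may be handier if df1 has a non-default index, since only row positions are used.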