I have 2 dataframes of different lengths -
len(df1) = 2400
len(df2) = 100
df1 =>
colA colB colC
0 1 2
3 4 5
6 7 8
.
.
.
2400 rows.
df2 (its number of rows is 1/24 of the number of rows in df1) =>
colD colE colF
10 11 12
13 14 15
.
.
.
100 rows
Currently I get the following error, which is expected since the lengths are different:
comparison -
df1['colB'] > df2['colD']
Error -
ValueError: ('Lengths must match to compare', (2400,), (100,))
Requirement ->
I want to perform this comparison so that each block of 24 consecutive rows in df1 is compared to 1 row in df2, which would avoid this error
(row1...row24 in df1 compared with row1 in df2)
(row25..row48 in df1 compared with row2 in df2)
and so on. Is there a way to do that?
PS - The comparison is to be done between 2 specific columns of these dfs, as shown above: colB and colD
One way I could think of is copying each row of df2 24 times until it has 2400 rows. But I'm not sure how to do that either, since I'm new to dataframes and numpy.
CodePudding user response:
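You can repeat each row of df2 24 times like this and then do the comparison:
df2_repeated = df2.loc[df2.index.repeat(24)]
df2_repeated.index = range(0,df2_repeated.shape[0])
# assumes df1 has the default 0..2399 index, so both Series share the same labels
df1['colB'] > df2_repeated['colD']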
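If you only need the comparison result and not a repeated copy of df2, one possible alternative is to repeat just the colD values as a NumPy array with np.repeat. This is a minimal sketch on small made-up data (48 rows against 2 rows instead of 2400 against 100); the toy values and the names block, colD_repeated and result are assumptions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real frames: 48 rows vs 2 rows (same 24:1 ratio).
df1 = pd.DataFrame({"colB": np.arange(48)})
df2 = pd.DataFrame({"colD": [10, 30]})

# Number of consecutive df1 rows that map to one df2 row (24 here).
block = len(df1) // len(df2)

# Repeat each colD value `block` times: [10]*24 followed by [30]*24.
# Using a plain array avoids any index alignment between the two frames.
colD_repeated = np.repeat(df2["colD"].to_numpy(), block)

# Element-wise comparison; the result is a boolean Series indexed like df1.
result = df1["colB"] > colD_repeated
```

Because the right-hand side is a plain array, this works regardless of what index df1 has, as long as len(df1) is an exact multiple of len(df2).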