Home > other >  How to populate values from a smaller pandas DataFrame into a larger one based on index values?
How to populate values from a smaller pandas DataFrame into a larger one based on index values?

Time:07-08

I have two data frames: df1:

index         value1 
2021-07-05    9.00
2021-07-06    88.25
2021-07-07    55
...

, and df2:

index              value2
2021-07-05 09:30:00 x
2021-07-05 10:00:00 x
2021-07-05 10:30:00 x
2021-07-05 11:00:00 x
2021-07-05 11:30:00 x
...

I want all the rows in df2 that have a date that matches the date in df1 to have value2 equal to the corresponding value1. So df2 would look like:

index               value2
2021-07-05 09:30:00 9.00
2021-07-05 10:00:00 9.00
2021-07-05 10:30:00 9.00
2021-07-05 11:00:00 9.00
2021-07-05 11:30:00 9.00
...

I have tried:

if df1.index.date == df2.index.date:
    df2.value2 == df1.value1

but I got an error saying DeprecationWarning: elementwise comparison failed; this will raise an error in the future. and then nothing happened.

I also tried:

df2.value2 = [df1.value1 if df2.index.date == df1.index.date]

but got a different error saying Length of values (1) does not match length of index.

Finally, I thought of trying to do it through iterrows() but got nowhere and figured I'd ask.

CodePudding user response:

Do Not Use List Comprehension of other Traditional Loop Methods with pandas. iterrows() is almost never the right answer. Instead use the more efficient pandas/numpy specific functions. Otherwise there really isn't reason to use pandas.

Given:

  • df1
        index  value1
0  2021-07-05    9.00
1  2021-07-06   88.25
2  2021-07-07   55.00
  • df2
                 index
0  2021-07-05 09:30:00
1  2021-07-05 10:00:00
2  2021-07-05 10:30:00
3  2021-07-05 11:00:00
4  2021-07-05 11:30:00

Make DatetimeIndices:

df1['index'] = pd.to_datetime(df1['index'])
df2['index'] = pd.to_datetime(df2['index'])

df1 = df1.set_index('index')
df2 = df2.set_index('index')

Use pd.merge_asof or, even better reindex:

df = pd.merge_asof(df2, df1, left_index=True, right_index=True)
# or
df = df1.reindex(df2.index, method='ffill')

print(df)

Output:

                     value1
index
2021-07-05 09:30:00     9.0
2021-07-05 10:00:00     9.0
2021-07-05 10:30:00     9.0
2021-07-05 11:00:00     9.0
2021-07-05 11:30:00     9.0
  • Related