Home > Blockchain >  Append/Join column from another dataframe to dataframe under comparison condition - not exact values
Append/Join column from another dataframe to dataframe under comparison condition - not exact values

Time:11-18

My two dataframes are:

df:

    Time
0    0.0
1    0.3
2    0.6
3    0.9
4    1.2
5    1.5
6    1.8
7    2.1
8    2.4
9    2.7
10   3.0
11   3.3
12   3.6
13   3.9
14   4.2
15   4.5
16   4.8

df2:

   Time  Value
0     0      6
1     1      8
2     2      9
3     3      6

My code is:

import pandas as pd
import numpy as np

df=pd.DataFrame(np.arange(0, 5, 0.3), columns=['Time'])
df2 = pd.DataFrame({'Time': [0, 1, 2, 3], 'Value': [6, 8, 9, 6]})

print(df)
print(df2)

df["Value"] = np.nan
for t1, t2, v in zip(df2["Time"].values[:-1], df2["Time"].values[1:], df2["Value"].values[:-1]):
    df.loc[(df["Time"] >= t1) & (df["Time"] < t2), "Value"] = v


print(df)

My goal is to create new Value column in df that is filling values from df2 if Time from df2 is higher or equal to Time from df.

Expected output is:

    Time  Value
0    0.0    6.0
1    0.3    6.0
2    0.6    6.0
3    0.9    6.0
4    1.2    8.0
5    1.5    8.0
6    1.8    8.0
7    2.1    9.0
8    2.4    9.0
9    2.7    9.0
10   3.0    6.0
11   3.3    6.0
12   3.6    6.0
13   3.9    6.0
14   4.2    6.0
15   4.5    6.0
16   4.8    6.0

How should I do that? Currently the output is:

    Time  Value
0    0.0    6.0
1    0.3    6.0
2    0.6    6.0
3    0.9    6.0
4    1.2    8.0
5    1.5    8.0
6    1.8    8.0
7    2.1    9.0
8    2.4    9.0
9    2.7    9.0
10   3.0    NaN
11   3.3    NaN
12   3.6    NaN
13   3.9    NaN
14   4.2    NaN
15   4.5    NaN
16   4.8    NaN

CodePudding user response:

Looks like you can use pd.merge_asof here:

pd.merge_asof(df, df2.astype({'Time':float}))

    Time  Value
0    0.0      6
1    0.3      6
2    0.6      6
3    0.9      6
4    1.2      8
5    1.5      8
6    1.8      8
7    2.1      9
8    2.4      9
9    2.7      9
10   3.0      6
11   3.3      6
12   3.6      6
13   3.9      6
14   4.2      1
15   4.5      1
16   4.8      1

CodePudding user response:

It is hard to use broadcasting and I think using for loop is more proper way in this case.

Here is the example code.

df["Value"] = np.nan
for t1, t2, v in zip(df2["Time"].values[:-1], df2["Time"].values[1:], df2["Value"].values[:-1]):
    df.loc[(df["Time"] >= t1) & (df["Time"] < t2), "Value"] = v

The result is as follows.

display(df)

    Time    Value
0   0.0 6.0
1   0.3 6.0
2   0.6 6.0
3   0.9 6.0
4   1.2 8.0
5   1.5 8.0
6   1.8 8.0
7   2.1 9.0
8   2.4 9.0
9   2.7 9.0
10  3.0 6.0
11  3.3 6.0
12  3.6 6.0
13  3.9 6.0
14  4.2 1.0
15  4.5 1.0
16  4.8 1.0
  • Related