Sorry for novice question. I just have no idea how to approach it.
I have 2 dataframes that I would like to join and create another dataframe based on the condition where df1['D'] - df2['D']
results in 1.
df1:
df1 = pd.DataFrame(
{
"A": 1.0,
"B": pd.date_range('2015-02-24', periods=4, freq='D'),
"C": pd.Series(1, index=list(range(4)), dtype="float32"),
"D": np.array([5] * 4, dtype="int32"),
"E": pd.Categorical(["test", "train", "test", "train"]),
"F": "foo",
}
)
and df 2
df2 = pd.DataFrame(
{
"B": pd.date_range('2015-02-24', periods=4, freq='D'),
"Y": pd.Series(1, index=list(range(4)), dtype="float32"),
"D": np.array([1,2,3,4]),
}
)
that look like this
df1:
A B C D E F
0 1.00 2015-02-24 1.00 5 test foo
1 1.00 2015-02-25 1.00 5 train foo
2 1.00 2015-02-26 1.00 5 test foo
3 1.00 2015-02-27 1.00 5 train foo
df2:
B Y D
0 2015-02-24 1.00 1
1 2015-02-25 1.00 2
2 2015-02-26 1.00 3
3 2015-02-27 1.00 4
desired result:
df3
A B C D E F Y D
1.00 2015-02-27 1.00 5 train foo 1.00 4
I have tried solutions from these questions:
Pandas: Join dataframe with condition
How to do/workaround a conditional join in python Pandas?
pandas join dataframes based on conditions
and while they do resolve the issue for this particular question it is not exactly what I want to do.
What i want to do is to create condition df1['D'] - df2['D']
and then make python check each row of both dataframes and only then create a left join on date.
SQL version of this is:
SELECT *
FROM df1
LEFT JOIN df2
ON df1.b = df2.b
WHERE df1.d - df2.d = 1;
P.S. I apologise in advance if I cant upvote the answers. still dont have enough rep =(
CodePudding user response:
one liner:
df3 = pd.merge(df1,df2, on="B")\
.where((df1.D-df2.D)==1)\
.dropna()\
.reset_index(drop=True)
out:
A B C D_x E F Y D_y
0 1.0 2015-02-27 1.0 5.0 train foo 1.0 4.0