I have three data frames. df_k
, df_w
and df_related
. df_related
is a df which connect the other two dfs. I want to choose the value from df_w
which the ids and the date is same as the date in df_k
. Here is a simple example:
df_k = pd.DataFrame()
df_k['id_k'] = [3, 4, 4]
df_k['date'] = ['2021-05-01', '2021-05-01', '2021-05-02']
df_k
id_k date
0 3 2021-05-01
1 4 2021-05-01
2 4 2021-05-02
Here is the df_related:
df_related = pd.DataFrame()
df_related['id_w'] = [1, 2, 3]
df_related['id_k'] = [3, 4, 5]
df_related
id_w id_k
0 1 3
1 2 4
2 3 5
And here is the df_w:
df_w['id_w'] = [1, 1, 2, 2, 3]
df_w['date'] = ['2021-05-01', '2021-05-02', '2021-05-01', '2021-05-02', '2021-05-01']
df_w['val'] = [1,-1, 2,-2, 6]
id_w date val
0 1 2021-05-01 1
1 1 2021-05-02 -1
2 2 2021-05-01 2
3 2 2021-05-02 -2
4 3 2021-05-01 6
Here is the desired output:
id_w date val
0 1 2021-05-01 1
1 2 2021-05-01 2
2 2 2021-05-02 -2
Explanation: I only has the id_w in the df_k for one day.(2021-05-01). I dont any data in df_k regarding the id_w=3. I have two data in two different date for id=3 in id_k. Could you please help me with that?
CodePudding user response:
You can map
and merge
. There are several ways such as assigning a new column, or like here, using a merge key:
key = df_k['id_k'].map(df_related.set_index('id_k')['id_w']).rename('id_w')
out = (df_k.merge(df_w, left_on=[key, 'date'], right_on=['id_w', 'date'])
#.drop(columns='id_k') # uncomment to drop id_k
)
output:
id_k date id_w val
0 3 2021-05-01 1 1
1 4 2021-05-01 2 2
2 4 2021-05-02 2 -2