I have two different data frames. The first one is a table that shows the relation between the ids of the other two data frame. Here is:
df_related_id = pd.DataFrame()
df_related_id ['id_1'] = [1, 2, 3,4]
df_related_id ['id_2'] = [10, 10, 10,5]
id_1 id_2
0 1 10
1 2 10
2 3 10
3 4 5
The second one is as follow:
df_2 = pd.DataFrame()
df_2['id_2'] = [10, 10, 10, 5,5, 30]
df_2['date'] = ['2017-12-31', '2017-12-31', '2017-12-31', '2017-11-31','2017-11-31', '2017-12-31']
df_2 ['hour'] = ['6:00', '6:15', '6:00', '6:00', '6:00', '6:00']
df_2 ['val'] = [-1,12,-1,11,11,14]
df_2
id_2 date hour val
0 10 2017-12-31 6:00 -1
1 10 2017-12-31 6:15 12
2 10 2017-12-31 6:00 -1
3 5 2017-11-31 6:00 11
4 5 2017-11-31 6:00 11
5 30 2017-12-31 6:00 14
So, I want to get the data based on id_1 from the second dataframe. Here is the desired output which I want. Could you please help me with that?
id_1 date hour val
0 1 2017-12-31 6:00 -1
1 1 2017-12-31 6:15 12
2 2 2017-12-31 6:00 -1
3 2 2017-12-31 6:15 12
4 3 2017-12-31 6:00 -1
5 3 2017-12-31 6:15 12
6 4 2017-11-31 6:00 11
CodePudding user response:
One solution:
df_related_id = pd.DataFrame()
df_related_id ['id_1'] = [1, 2, 3,4]
df_related_id ['id_2'] = [10, 10, 10,5]
df_2 = pd.DataFrame()
df_2['id_2'] = [10, 10, 10, 5,5, 30]
df_2['date'] = ['2017-12-31', '2017-12-31', '2017-12-31', '2017-11-31','2017-11-31', '2017-12-31']
df_2 ['hour'] = ['6:00', '6:15', '6:00', '6:00', '6:00', '6:00']
df_2 ['val'] = [-1,12,-1,11,11,14]
df_2 = df_2.drop_duplicates()
df = pd.merge(df_related_id, df_2, how='left', on='id_2').drop('id_2', axis = 'columns')
print(df)
id_1 date hour val
0 1 2017-12-31 6:00 -1
1 1 2017-12-31 6:15 12
2 2 2017-12-31 6:00 -1
3 2 2017-12-31 6:15 12
4 3 2017-12-31 6:00 -1
5 3 2017-12-31 6:15 12
6 4 2017-11-31 6:00 11