I have two dataframes. I want to merge these dataframe based on the first values. Here is a simple example: I have this df:
df = pd.DataFrame()
df['id'] = [1, 2, 3, 4, 5]
df['x'] = [3, 4, 7, 8, 9]
id x
0 1 3
1 2 4
2 3 7
3 4 8
4 5 9
I want to get the value of 'y' from the following df1 based on the first values.
df1 = pd.DataFrame()
df1['id'] = [1, 1, 2, 3,3]
df1['y'] = [ 10, 100, 200, 500, 100]
id y
0 1 10
1 1 100
2 2 200
3 3 500
4 3 100
For id=1, we have two values in df1, but we only want the first one which is 10.
The desired output is:
id y
0 1 10
1 2 200
2 3 500
Thank you so much for your help
CodePudding user response:
here is one way to do it
df1.groupby('id').first().merge(df, on='id', how='left').drop(columns='x')
id y
0 1 10
1 2 200
2 3 500
CodePudding user response:
Although this is not an ideal solution, as I am new to programming (2 months old), I could only manage this.
merged_df = pd.merge(df1, df, on='id', how='left')
merged_df.drop(columns='x', index=[1,4], inplace= True)
merged_df.reset_index()
Found another way (the common way)
df1.groupby("id").first().reset_index()