I have these two dataframes that I want to merge:
import pandas as pd

df1 = pd.DataFrame({'platform': ['android', 'android', 'android', 'android', 'ios', 'ios', 'ios', 'ios'],
                    'day': [3, 7, 14, 30, 3, 7, 14, 30],
                    'value_m': [1.2, 1.3, 1.7, 1.8, 1.6, 2.3, 3.7, 1.8]})
df2 = pd.DataFrame({'platform': ['android', 'ios', 'ios', 'android', 'android', 'android', 'ios', 'ios'],
                    'day': [3, 7, 14, 30, 3, 7, 14, 30],
                    'value_x': [4, 6, 8, 9, 4, 6, 7, 8]})
I want to use the columns platform and day as keys to create a new dataframe that adds the column 'value_x' to df1. I tried this code:
df_pred = df1.merge(df2, left_on=["platform","day"], right_on=["platform","day"], how="left")
df_pred
This is what I get:
I don't understand why the result is full of NaNs after merging on platform and day. Any idea why this is happening?
Thanks!
CodePudding user response:
The problem in your real data is that day is stored as a string in one DataFrame and as a number in the other.
In your real data, convert the values in both columns to the same type:
df1['day'] = df1['day'].astype(int)
df2['day'] = df2['day'].astype(int)
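To confirm that this is the cause before converting anything, you can compare the dtypes of the key columns in both DataFrames. The sketch below is only an illustration: the frames `left` and `right` and their values are made up to imitate a string/integer mismatch, they are not your real data.

```python
import pandas as pd

# Hypothetical data: 'day' holds strings in left and integers in right.
left = pd.DataFrame({'platform': ['android', 'ios'],
                     'day': ['3', '7'],
                     'value_m': [1.2, 2.3]})
right = pd.DataFrame({'platform': ['android', 'ios'],
                      'day': [3, 7],
                      'value_x': [4, 6]})

# Compare the dtypes of the key columns before merging.
print(left[['platform', 'day']].dtypes)
print(right[['platform', 'day']].dtypes)

# Bring both 'day' columns to the same type, then merge.
left['day'] = left['day'].astype(int)
right['day'] = right['day'].astype(int)
merged = left.merge(right, on=['platform', 'day'], how='left')
print(merged)
```

If the two dtypes printed for day differ, the keys are not being compared as the same kind of value, and converting both sides as shown should let the rows match.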
Your sample data works correctly:
df_pred = df1.merge(df2, on=["platform","day"], how="left")
print(df_pred)
  platform  day  value_m  value_x
0  android    3      1.2      4.0
1  android    3      1.2      4.0
2  android    7      1.3      6.0
3  android   14      1.7      NaN
4  android   30      1.8      9.0
5      ios    3      1.6      NaN
6      ios    7      2.3      6.0
7      ios   14      3.7      8.0
8      ios   14      3.7      7.0
9      ios   30      1.8      8.0
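The remaining NaNs and the repeated rows in this output come from the sample df2 itself: it has no row for some platform/day pairs and two rows for others. One possible way to see this, using the sample data from the question, is to pass indicator=True to merge and to check df2 for duplicated keys. The names dup_keys, df_check and unmatched are just for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'platform': ['android', 'android', 'android', 'android', 'ios', 'ios', 'ios', 'ios'],
                    'day': [3, 7, 14, 30, 3, 7, 14, 30],
                    'value_m': [1.2, 1.3, 1.7, 1.8, 1.6, 2.3, 3.7, 1.8]})
df2 = pd.DataFrame({'platform': ['android', 'ios', 'ios', 'android', 'android', 'android', 'ios', 'ios'],
                    'day': [3, 7, 14, 30, 3, 7, 14, 30],
                    'value_x': [4, 6, 8, 9, 4, 6, 7, 8]})

# Keys that appear more than once in df2 produce extra rows in a left merge.
dup_keys = df2[df2.duplicated(['platform', 'day'], keep=False)]
print(dup_keys)

# indicator=True adds a '_merge' column; in a left merge it is 'both' or 'left_only'.
df_check = df1.merge(df2, on=['platform', 'day'], how='left', indicator=True)

# Rows of df1 with no matching key in df2 (these get NaN in value_x).
unmatched = df_check[df_check['_merge'] == 'left_only']
print(unmatched[['platform', 'day']])
```

Rows marked left_only are the ones where value_x is NaN because df2 has no such key, which is different from the dtype problem above, where no key matches at all.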