I have a pandas DataFrame with a distinct code identifier, as shown below:
df1 = pd.DataFrame([['a', 1], ['b', 2],['c', 3],['d', 4],['e', 5],['f', 5]],
columns=['code', 'value1'])
I also have a second DataFrame:
df2 = pd.DataFrame([['a', 11], ['b', 12],['c', 13],['d', 14],['e', 15],['f', 16],['g', 17], ['h', 2],['i', 3],['j', 4],['k', 5],['l', 5]],
columns=['code', 'value2'])
I would like to see only the codes that appear in df1 (i.e. a-f), with a third column named value2.
I have tried
df1 = df1.join(df2, on = 'Code')
but I keep getting NaN values.
I have looked in several places and seen merge, concat and join, but none of them appear to work.
CodePudding user response:
To see only the codes that appear in df1 (i.e. a-f), with a third column named value2, use the merge method with how='inner' and on='code':
>>> df1.merge(df2, how='inner', on='code')
code value1 value2
0 a 1 11
1 b 2 12
2 c 3 13
3 d 4 14
4 e 5 15
5 f 5 16
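For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same inner merge. It rebuilds the two frames from the question; the variable name `result` is just chosen here for illustration.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame(
    [['a', 1], ['b', 2], ['c', 3], ['d', 4], ['e', 5], ['f', 5]],
    columns=['code', 'value1'],
)
df2 = pd.DataFrame(
    [['a', 11], ['b', 12], ['c', 13], ['d', 14], ['e', 15], ['f', 16],
     ['g', 17], ['h', 2], ['i', 3], ['j', 4], ['k', 5], ['l', 5]],
    columns=['code', 'value2'],
)

# An inner merge keeps only the codes present in both frames,
# so the codes g-l from df2 are dropped.
result = df1.merge(df2, how='inner', on='code')
print(result)
```

Assigning to a new name rather than overwriting df1 keeps the original frame available in case you need to rerun the merge.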
CodePudding user response:
Use:
>>> df1.merge(df2, how='inner', on='code')
code value1 value2
0 a 1 11
1 b 2 12
2 c 3 13
3 d 4 14
4 e 5 15
5 f 5 16
Or did you mean merge with how='outer'?
>>> df1.merge(df2, how='outer', on='code')
code value1 value2
0 a 1.0 11
1 b 2.0 12
2 c 3.0 13
3 d 4.0 14
4 e 5.0 15
5 f 5.0 16
6 g NaN 17
7 h NaN 2
8 i NaN 3
9 j NaN 4
10 k NaN 5
11 l NaN 5
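If the goal is to keep every row of df1 whether or not it has a match, how='left' may be closer to what you want than either inner or outer. A rough sketch follows; the frames `left_df` and `right_df` and the unmatched code 'z' are hypothetical, added only to show what an unmatched row looks like.

```python
import pandas as pd

# Hypothetical data: 'z' has no counterpart in right_df
left_df = pd.DataFrame([['a', 1], ['b', 2], ['z', 9]],
                       columns=['code', 'value1'])
right_df = pd.DataFrame([['a', 11], ['b', 12], ['g', 17]],
                        columns=['code', 'value2'])

# how='left' keeps every row of left_df; indicator=True adds a
# '_merge' column saying whether each row was found in both frames.
checked = left_df.merge(right_df, how='left', on='code', indicator=True)
print(checked)
```

Rows marked left_only in the _merge column are the ones that had no matching code, which makes it easy to see where NaN values come from.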
CodePudding user response:
Try this:
df1 = df1.merge(df2, on = 'code')
since you named the column 'code', not 'Code'. Column names are case-sensitive.
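If you would rather stay with join, note that its on argument names a column in the calling frame and matches it against the index of the other frame, not against a column of the same name. A minimal sketch, assuming the frames from the question and using `joined` as an illustrative name:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['a', 1], ['b', 2], ['c', 3], ['d', 4], ['e', 5], ['f', 5]],
    columns=['code', 'value1'],
)
df2 = pd.DataFrame(
    [['a', 11], ['b', 12], ['c', 13], ['d', 14], ['e', 15], ['f', 16],
     ['g', 17], ['h', 2], ['i', 3], ['j', 4], ['k', 5], ['l', 5]],
    columns=['code', 'value2'],
)

# Move 'code' into df2's index so that join can match
# df1['code'] against it. join defaults to a left join,
# so only the codes in df1 are kept.
joined = df1.join(df2.set_index('code'), on='code')
print(joined)
```

For this data the result should have the same rows and columns as the merge above; merge is usually the simpler choice when the key is an ordinary column in both frames.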