I have two DataFrames:
df1:
   A       B  C
0  1     [a]  x
1  2  [0, 1]  y
2  2  [0, 2]  z
df2:
   A       D
0  1    None
1  2  [1, 2]
I want to merge them on column A, as below:
df = pandas.merge(df1, df2, how='left', on='A')
so that the result is:
   A       B  C       D
0  1     [a]  x    None
1  2  [0, 1]  y  [1, 2]
2  2  [0, 2]  z  [1, 2]
However, because column D contains lists, which are not hashable, I cannot do it. Could you please show me how to tackle this problem?
CodePudding user response:
Your code works with Pandas 1.3.4:
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 2], 'B': [['a'], [0, 1], [0, 2]], 'C': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [1, 2], 'D': [None, [1, 2]]})
out = pd.merge(df1, df2, how='left', on='A')
print(out)
# Output:
   A       B  C       D
0  1     [a]  x    None
1  2  [0, 1]  y  [1, 2]
2  2  [0, 2]  z  [1, 2]
If the merge fails for you, try updating your version of Pandas.
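Before upgrading, it may help to see which version is installed. A minimal sketch (the variable name `installed` is only for illustration):

```python
import pandas as pd

# Version string of the installed pandas, to compare against the 1.3.4 mentioned above
installed = pd.__version__
print(installed)
```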
CodePudding user response:
As far as I can tell, your premise is incorrect. D doesn't need to be hashable because it isn't used as a merge key, so it plays no part in matching rows. Its values are just copied along after the matching is done.
>>> df1
   A       B  C
0  1     [a]  x
1  2  [0, 1]  y
2  2  [0, 2]  z
>>> df1.to_numpy()
array([[1, list(['a']), 'x'],
       [2, list([0, 1]), 'y'],
       [2, list([0, 2]), 'z']], dtype=object)
>>> df2
   A       D
0  1    None
1  2  [1, 2]
>>> df2.to_numpy()
array([[1, None],
       [2, list([1, 2])]], dtype=object)
>>> pd.merge(df1, df2, how='left', on='A')
   A       B  C       D
0  1     [a]  x    None
1  2  [0, 1]  y  [1, 2]
2  2  [0, 2]  z  [1, 2]
>>> pd.merge(df1, df2, how='left', on='A').to_numpy()
array([[1, list(['a']), 'x', None],
       [2, list([0, 1]), 'y', list([1, 2])],
       [2, list([0, 2]), 'z', list([1, 2])]], dtype=object)
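Hashability would only matter if the key column itself held lists. That is not the case in the question, but as a rough sketch of one way to handle it, the list keys can be converted to tuples before merging. The frames `left` and `right` and the column `K` below are hypothetical and exist only to illustrate the idea:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 2], 'B': [['a'], [0, 1], [0, 2]], 'C': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [1, 2], 'D': [None, [1, 2]]})

# Lists in the non-key columns (B, D) are simply carried along with the rows.
out = pd.merge(df1, df2, how='left', on='A')

# Hypothetical case: the key column K itself contains lists.
left = pd.DataFrame({'K': [[0, 1], [0, 2]], 'C': ['y', 'z']})
right = pd.DataFrame({'K': [[0, 1]], 'D': ['found']})

# Tuples are hashable, so convert the list keys to tuples on both sides first.
left_t = left.assign(K=left['K'].map(tuple))
right_t = right.assign(K=right['K'].map(tuple))
out2 = pd.merge(left_t, right_t, how='left', on='K')

print(out)
print(out2)
```

Only the key column needs this treatment; the other columns can keep their lists.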