Pandas merge with dtype=list

Time:12-13

I have two DataFrames:

df1:

    A        B        C
0   1       [a]       x
1   2    [0, 1]       y
2   2    [0, 2]       z

df2:

       A        D
0      1     None
1      2     [1, 2]

I want to merge them on column A, as follows:

df = pandas.merge(df1, df2, how='left', on='A')

The result should be:

    A        B        C      D
0   1       [a]       x    None
1   2    [0, 1]       y    [1, 2]
2   2    [0, 2]       z    [1, 2]

However, column D holds lists, which are not hashable, so I cannot do it. Could you please show me how to tackle this problem?
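A quick check of what pandas actually stores may help here. This is a minimal sketch that rebuilds the df2 from above: pandas has no list dtype, so D is a generic object column whose cells happen to be Python lists. Lists are indeed unhashable, but that only matters for a column pandas has to hash, such as a merge key.

```python
import pandas as pd

df2 = pd.DataFrame({'A': [1, 2], 'D': [None, [1, 2]]})

# pandas has no list dtype: the list cells sit in a generic object column
print(df2['D'].dtype)  # object

# Python lists are unhashable; this only matters for a column pandas must hash,
# such as a merge key
try:
    hash([1, 2])
except TypeError as exc:
    print(exc)  # unhashable type: 'list'
```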

Answer:

Your code works with pandas 1.3.4:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 2], 'B': [['a'], [0, 1], [0, 2]], 'C': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [1, 2], 'D': [None, [1, 2]]})

out = pd.merge(df1, df2, how='left', on='A')
print(out)

# Output:
   A       B  C       D
0  1     [a]  x    None
1  2  [0, 1]  y  [1, 2]
2  2  [0, 2]  z  [1, 2]

If it fails for you, update your version of pandas.
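A minimal sketch for checking your own setup, assuming the same df1 and df2: print the installed version, then run the merge. The validate='many_to_one' argument is an optional addition that is not in the original code; it makes pandas raise pandas.errors.MergeError if df2 contains duplicate values in A, which would otherwise duplicate rows of df1.

```python
import pandas as pd

# The answer above reports that this merge works on pandas 1.3.4
print(pd.__version__)

df1 = pd.DataFrame({'A': [1, 2, 2], 'B': [['a'], [0, 1], [0, 2]], 'C': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [1, 2], 'D': [None, [1, 2]]})

# validate='many_to_one' raises MergeError if A is not unique in df2
out = pd.merge(df1, df2, how='left', on='A', validate='many_to_one')

# A left join against unique keys keeps exactly one output row per row of df1
assert len(out) == len(df1)
print(out)
```

The list in D for key 2 is copied into both matching rows of df1, exactly as in the output shown above.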

Answer:

As far as we can tell, the premise is incorrect. D doesn't need to be hashable, because it is not used to match rows; only the key column A is. The values in D are simply copied into the matching rows once the join is done.
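Hashability does matter if a list-valued column is itself the join key. A common workaround, sketched here with a hypothetical lookup table and a hypothetical helper column B_key (neither is part of the question), is to convert the lists to tuples on both sides, merge on the tuples, and drop the helper column afterwards:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 2], 'B': [['a'], [0, 1], [0, 2]], 'C': ['x', 'y', 'z']})
# Hypothetical lookup table keyed by the list values found in B
lookup = pd.DataFrame({'B': [['a'], [0, 2]], 'E': ['first', 'second']})

# Tuples are hashable, so they can serve as merge keys where lists cannot
left = df1.assign(B_key=df1['B'].map(tuple))
right = lookup.assign(B_key=lookup['B'].map(tuple)).drop(columns='B')

# Left join on the tuple key, then discard the helper column
out = left.merge(right, how='left', on='B_key').drop(columns='B_key')
print(out)
```

Here [0, 1] has no entry in lookup, so that row gets NaN in E, as with any unmatched row in a left join. The original list column B is carried along untouched.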

>>> df1
   A       B  C
0  1     [a]  x
1  2  [0, 1]  y
2  2  [0, 2]  z

>>> df1.to_numpy()
array([[1, list(['a']), 'x'],
       [2, list([0, 1]), 'y'],
       [2, list([0, 2]), 'z']], dtype=object)

>>> df2
   A       D
0  1    None
1  2  [1, 2]

>>> df2.to_numpy()
array([[1, None],
       [2, list([1, 2])]], dtype=object)

>>> pd.merge(df1, df2, how='left', on='A')
   A       B  C       D
0  1     [a]  x    None
1  2  [0, 1]  y  [1, 2]
2  2  [0, 2]  z  [1, 2]

>>> pd.merge(df1, df2, how='left', on='A').to_numpy()
array([[1, list(['a']), 'x', None],
       [2, list([0, 1]), 'y', list([1, 2])],
       [2, list([0, 2]), 'z', list([1, 2])]], dtype=object)