Merge two pandas dataframes, as lists in every cell

I want to merge 2 dataframes, with the resulting dataframe having a list in every single cell. I'm completely lost on how to do this.

My current solution uses the index of each dataframe to build a dict (e.g. dict[index[0]]['DEPTH'] = []), then loops over the rows of both dataframes to append to the dict keys (e.g. dict[index[0]]['DEPTH'].append(cell_value)), but I suspect that's very inefficient and slow.
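For reference, the dict-based approach described above might look roughly like this. It is only a sketch: the two small frames are made-up stand-ins for df1 and df2 (using the DEPTH and A columns from the desired result), and the names cells and result are illustrative.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-ins for df1 and df2; the real frames are far larger.
df1 = pd.DataFrame({"DEPTH": [1, 1], "A": [0, 0]},
                   index=["chr1~10007022~C", "chr1~10007023~T"])
df2 = pd.DataFrame({"DEPTH": [1, 1], "A": [0, 0]},
                   index=["chr1~10007022~C", "chr1~10007023~T"])

# cells[row_label][column_name] -> list of values collected from both frames
cells = defaultdict(lambda: defaultdict(list))
for frame in (df1, df2):
    for row_label, row in frame.iterrows():
        for column_name, cell_value in row.items():
            cells[row_label][column_name].append(cell_value)

# Convert to plain dicts and rebuild a frame with one list per cell.
plain = {label: dict(columns) for label, columns in cells.items()}
result = pd.DataFrame.from_dict(plain, orient="index")
```

Every cell is visited in a Python-level loop here, which is why this gets slow on large frames.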

Does a pandas solution exist that would get this done?

df1 would look like this:

df2 would look like this:

Resulting df would look something like this:

                    DEPTH        A
chr1~10007022~C    [1, 1]      [0, 0]
chr1~10007023~T    [1, 1]      [0, 0]
                  .
                  .
                  .
chr1~10076693~T    [1, 1]      [0, 0]

Keep in mind:

The indexes of the two dataframes will probably differ, but not always.
Each dataframe will probably contain >100M rows.

CodePudding user response：

You could concatenate the two dataframes, group by the index (the item), and then aggregate with list.

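import pandas as pd

# First frame, indexed by item
df = pd.DataFrame({'item':['chr1-10007022-C', 'chr1-10007023-T'],
                  'DEPTH':[1,1],
                  'A':[0,0],
                  'C':[0,0]})

df = df.set_index('item')

# Second frame: shares one item with the first and has one item of its own
df2 = pd.DataFrame({'item':['chr1-10007022-C', 'chr1-10007026-X'],
                  'DEPTH':[1,1],
                  'A':[0,0],
                  'C':[0,0]})
df2 = df2.set_index('item')

# Stack the frames, group rows that share an index label, collect each column into a list
out = pd.concat([df,df2]).groupby(level=0).agg(list)
print(out)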

Output

                  DEPTH       A       C
item                                   
chr1-10007022-C  [1, 1]  [0, 0]  [0, 0]
chr1-10007023-T     [1]     [0]     [0]
chr1-10007026-X     [1]     [0]     [0]
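
Note that in the output above, items present in only one dataframe get a one-element list, so you can no longer tell which dataframe the value came from. If every cell should hold exactly two entries, one option is to align both frames on the union of their indexes first. This is only a sketch reusing the example data above; the names all_items, left, right and aligned are made up for illustration, and I have not benchmarked it at the >100M-row scale mentioned in the question.

```python
import pandas as pd

df = pd.DataFrame({'DEPTH': [1, 1], 'A': [0, 0], 'C': [0, 0]},
                  index=pd.Index(['chr1-10007022-C', 'chr1-10007023-T'], name='item'))
df2 = pd.DataFrame({'DEPTH': [1, 1], 'A': [0, 0], 'C': [0, 0]},
                   index=pd.Index(['chr1-10007022-C', 'chr1-10007026-X'], name='item'))

# Align both frames on the union of their indexes; rows missing from a frame become NaN.
all_items = df.index.union(df2.index)
left = df.reindex(all_items)
right = df2.reindex(all_items)

# Pair the values column by column: slot 0 comes from df, slot 1 from df2.
aligned = pd.DataFrame(
    {col: [list(pair) for pair in zip(left[col], right[col])] for col in df.columns},
    index=all_items,
)
print(aligned)
```

Because reindexing introduces NaN, the values come out as floats (for example [1.0, nan] for an item that only exists in df). The pairing step is still a Python-level loop over rows, so keeping the two frames side by side as separate columns may be more practical than lists for very large data.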