How to Merge Multiple pandas DataFrames into an Array for Each Column Value Based on Another C

I have several pandas DataFrames that I would like to merge. When I merge them, rows that share the same value in a key column should have their values collected into an array.

For example, I would like to merge two DataFrames on a specified column (A below). After merging, each value of A should map to an array holding the values from every DataFrame.

  df1 = 
        A   Value
    0   x   0
    1   y   0


  df2 = 
        A   Value
    0   x   1
    1   y   1
    2   z   1


  After Combining:
  df =
        A   Number_Value 
    0   x   [0, 1]       
    1   y   [0, 1]       
    2   z   [, 1]

I do not believe that merge() or concat() would be appropriate. I thought I could call .to_numpy() to convert each value in each row to an array, but that does not seem to work.

CodePudding user response:

Use concat to stack the DataFrames, then group by A and aggregate each group into a list:

import pandas as pd

df = pd.concat([df1, df2]).groupby('A', as_index=False).agg(list)
print(df)
   A   Value
0  x  [0, 1]
1  y  [0, 1]
2  z     [1]
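
For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the question's example; the variable name combined and the ignore_index=True argument are my own choices, not part of the original answer.

```python
import pandas as pd

# Rebuild the two example frames from the question.
df1 = pd.DataFrame({"A": ["x", "y"], "Value": [0, 0]})
df2 = pd.DataFrame({"A": ["x", "y", "z"], "Value": [1, 1, 1]})

# Stack the frames vertically, then collect every Value that shares
# the same key in column A into a Python list.
combined = (
    pd.concat([df1, df2], ignore_index=True)
      .groupby("A", as_index=False)
      .agg(list)
)

print(combined)
```

Keys that appear in only one frame (z here) end up with a shorter list, because nothing is inserted for the frames that lack them.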

To check whether any of the DataFrames is missing the A column:

L = [df1, df2]
print([x for x in L if 'A' not in x.columns])
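
A slightly more readable variant of that check reports the positions of the offending frames instead of printing the frames themselves. This is only a sketch: the third frame (with a column B instead of A) and the name missing are hypothetical, added purely to make the check find something.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["x", "y"], "Value": [0, 0]})
df2 = pd.DataFrame({"A": ["x", "y", "z"], "Value": [1, 1, 1]})
# Hypothetical frame without an A column, for illustration only.
df3 = pd.DataFrame({"B": ["x"], "Value": [2]})

frames = [df1, df2, df3]

# Positions in the list of every frame that has no A column.
missing = [i for i, frame in enumerate(frames) if "A" not in frame.columns]

print(missing)
```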

EDIT: To insert '' where a DataFrame has no row for a given value of A, pass it as the fill_value parameter of reindex:

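L = [df1, df2]

# Tag each row with the position of its source DataFrame, then index by (position, A)
df = pd.concat(L, keys=range(len(L))).reset_index(level=1, drop=True).set_index('A', append=True)
# Build every (position, A) combination so that missing pairs can be filled with ''
mux = pd.MultiIndex.from_product(df.index.levels)
df = df.reindex(mux, fill_value='').groupby('A').agg(list).reset_index()

print(df)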

   A   Value
0  x  [0, 1]
1  y  [0, 1]
2  z   [, 1]
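
If the MultiIndex steps feel heavy, the same padded result can probably be reached with plain dictionaries. The sketch below is an alternative to the reindex approach, not the method from the answer, and it assumes each value of A appears at most once per DataFrame. The names frames, lookups, keys and result are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["x", "y"], "Value": [0, 0]})
df2 = pd.DataFrame({"A": ["x", "y", "z"], "Value": [1, 1, 1]})
frames = [df1, df2]

# One A -> Value lookup per frame (assumes A is unique within a frame).
lookups = [frame.set_index("A")["Value"].to_dict() for frame in frames]

# Every key seen in any frame, in sorted order.
keys = sorted(set().union(*(frame["A"] for frame in frames)))

# For each key take the value from each frame in order,
# using '' where that frame has no row for the key.
result = pd.DataFrame({
    "A": keys,
    "Value": [[lookup.get(key, "") for lookup in lookups] for key in keys],
})

print(result)
```

Each list then has one slot per input DataFrame, so the position inside the list tells you which frame a value came from.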