Home > OS >  Combine list with dataframe and extend the multi-values cells into columns
Combine list with dataframe and extend the multi-values cells into columns

Time:09-18

Assuming I have the following input:

table = pd.DataFrame({'a':[0,0,0,0],'b':[1,1,1,3,],'c':[2,2,5,4],'d':[3,6,6,6]},dtype='float64')

list = [[55,66],
        [77]]

#output of the table
         a    b    c    d
    0  0.0  1.0  2.0  3.0
    1  0.0  1.0  2.0  6.0
    2  0.0  1.0  5.0  6.0
    3  0.0  3.0  4.0  6.0

I want to combine list with table so the final shape would be like:

     a    b    c    d  ID_0   ID_1
0  0.0  1.0  2.0  3.0  55.0   66.0
1  0.0  1.0  2.0  6.0  77.0    NaN
2  0.0  1.0  5.0  6.0   NaN    NaN
3  0.0  3.0  4.0  6.0   NaN    NaN

I found a way but it looks a bit long and might be a shorter way to do it.

Step1:

x = pd.Series(list, name ="ID")
new = pd.concat([table, x], axis=1)

# output
     a    b    c    d      ID
0  0.0  1.0  2.0  3.0  [5, 6]
1  0.0  1.0  2.0  6.0    [77]
2  0.0  1.0  5.0  6.0     NaN
3  0.0  3.0  4.0  6.0     NaN

step2:

ID = new['ID'].apply(pd.Series)
ID = ID.rename(columns = lambda x : 'ID_'   str(x))
new_x = pd.concat([new[:], ID[:]], axis=1)

# output
     a    b    c    d      ID  ID_0  ID_1
0  0.0  1.0  2.0  3.0  [5, 6]   5.0   6.0
1  0.0  1.0  2.0  6.0    [77]  77.0   NaN
2  0.0  1.0  5.0  6.0     NaN   NaN   NaN
3  0.0  3.0  4.0  6.0     NaN   NaN   NaN

step3:

new_x = new_x.drop(columns=['ID'], axis = 1)

Any shorter way to achieve the same result?

CodePudding user response:

Try pd.Series:

table = pd.DataFrame({'a':[0,0,0,0],'b':[1,1,1,3,],'c':[2,2,5,4],'d':[3,6,6,6]},dtype='float64')
list = [[55,66],
        [77]]

table['ID_0'] = pd.Series(list)
print(table)

Result:

     a    b    c    d      ID_0
0  0.0  1.0  2.0  3.0  [55, 66]
1  0.0  1.0  2.0  6.0      [77]
2  0.0  1.0  5.0  6.0       NaN
3  0.0  3.0  4.0  6.0       NaN

CodePudding user response:

Assuming a default index on table (as shown in the question), we can simply create a DataFrame (either from_records or with the constructor) and join back to table and allow the indexes to align. (add_prefix is an easy way to add the 'ID_' prefix to the default numeric columns)

new_df = table.join(
    pd.DataFrame.from_records(lst).add_prefix('ID_')
)

new_df:

     a    b    c    d  ID_0  ID_1
0  0.0  1.0  2.0  3.0  55.0  66.0
1  0.0  1.0  2.0  6.0  77.0   NaN
2  0.0  1.0  5.0  6.0   NaN   NaN
3  0.0  3.0  4.0  6.0   NaN   NaN

Working with 2 DataFrames is generally easier than a DataFrame and a list. Here is what from_records does to lst:

pd.DataFrame.from_records(lst)

    0     1
0  55  66.0
1  77   NaN

Index (rows) 0 and 1 will now align with the corresponding index values in table (0 and 1 respectively).

add_prefix fixes the column names before joining:

pd.DataFrame.from_records(lst).add_prefix('ID_')

   ID_0  ID_1
0    55  66.0
1    77   NaN

Setup and imports:

import pandas as pd  # v1.4.4

table = pd.DataFrame({
    'a': [0, 0, 0, 0],
    'b': [1, 1, 1, 3, ],
    'c': [2, 2, 5, 4],
    'd': [3, 6, 6, 6]
}, dtype='float64')

lst = [[55, 66],
       [77]]
  • Related