How to insert new columns to original pandas data frame after "apply(pd.Series)"

Time:09-23

I have a pandas data frame with many columns, one of which holds dictionary values.
I can "explode" that column with ".map(eval).apply(pd.Series)".
I need the resulting columns to be inserted into the original df.
I do not know how to do it at all.

result = df['dic_column'].map(eval).apply(pd.Series)

result
    A   B   C   D   E
1   0   0   0   1   0
2   1   9   0   9   0
3   0   0   0   1   0
4   1   9   0   9   0
5   0   0   0   2   

Wanted outcome:

df    
user_id    og_column1    og_column2    A    B    C    D    E
1          valuey        valuey        0    0    0    1    0
2          valuex        valuex        1    9    0    9    0
...

EDIT:
Solution: join back.

result = df.join(df['dic_column'].map(eval).apply(pd.Series))
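For anyone who wants to try this end to end, here is a minimal sketch. The sample data is hypothetical (only the column names come from the question), and it swaps eval for ast.literal_eval, which only parses Python literals; that swap is my assumption about what is appropriate here, not part of the original solution.

```python
import ast

import pandas as pd

# Hypothetical sample data; only the column names mirror the question.
df = pd.DataFrame(
    {
        "user_id": [1, 2],
        "og_column1": ["valuey", "valuex"],
        "dic_column": ["{'A': 0, 'B': 0}", "{'A': 1, 'B': 9}"],
    }
)

# Parse each string into a dict, then expand the dict keys into columns.
# ast.literal_eval is used instead of eval (an assumption, see above).
expanded = df["dic_column"].map(ast.literal_eval).apply(pd.Series)

# join aligns on the row labels, so each expanded row lands next to its source row.
result = df.join(expanded)
print(result)
```

If the original dictionary column is no longer needed, result.drop(columns="dic_column") removes it afterwards.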

CodePudding user response:

You can assign multiple columns to a DataFrame:

df[result.columns] = result

Or you can use DataFrame.join:

df = df.join(result)

Technically you can also use concat:

df = pd.concat((df, result), axis='columns')

All of the above are very similar operations.
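As a rough illustration of how similar the three approaches are, the sketch below applies each one to a small made-up frame and compares the outcomes. The data and the variable names by_assign, by_join and by_concat are hypothetical.

```python
import pandas as pd

# Made-up data: both frames share the same row labels (1, 2, 3).
df = pd.DataFrame({"user_id": [1, 2, 3]}, index=[1, 2, 3])
result = pd.DataFrame({"A": [0, 1, 0], "B": [0, 9, 0]}, index=[1, 2, 3])

# 1. Multi-column assignment (done on a copy so df stays unchanged).
by_assign = df.copy()
by_assign[result.columns] = result

# 2. DataFrame.join
by_join = df.join(result)

# 3. pd.concat along the column axis
by_concat = pd.concat((df, result), axis="columns")

# Compare the three outcomes.
print(by_assign.equals(by_join), by_join.equals(by_concat))
```

One practical difference: the assignment modifies the frame in place, while join and concat return a new frame.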

They all perform a join (in the relational algebra sense) on the row labels of the data frames.

In Pandas terminology, the row labels are the "index" of a data frame. By default, if you didn't explicitly create or assign an index, the row labels are just a range of integers corresponding to the row numbers. The difference is that row labels are preserved across most Pandas operations, while row numbers are only positions and change whenever rows are reordered or filtered.

So if you shuffle a data frame, the index labels are shuffled along with the rows. Among other things, this feature allows you to re-join data to its source even after some fairly complicated data manipulation.
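A small sketch of that label-preserving behaviour, using made-up data: the rows are shuffled, a new column is derived from the shuffled copy, and the join still puts every value back next to its source row. The names shuffled, derived and rejoined are hypothetical.

```python
import pandas as pd

# Made-up data with the default integer index 0..3.
df = pd.DataFrame({"name": ["a", "b", "c", "d"]})

# Shuffle the rows (fixed seed so the run is repeatable); labels travel with rows.
shuffled = df.sample(frac=1, random_state=0)

# Derive a new column from the shuffled data.
derived = shuffled["name"].str.upper().rename("upper")

# join matches on row labels, not on position, so the original order is irrelevant.
rejoined = df.join(derived)
print(rejoined)
```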

The official Pandas documentation doesn't have a single coherent resource for understanding the "index" data model. However, I found this blog post, and it seems to cover most of what you need to know.

CodePudding user response:

Henry Ecker gave the solution in the comments.

We can join the results back to the original data frame.

result = df.join(df['dic_column'].map(eval).apply(pd.Series))