I have a pandas DataFrame with many columns, one of which holds dictionary values.
I can "explode" it with ".map(eval).apply(pd.Series)".
I need the resulting columns added to the original df.
I do not know how to do that at all.
result = df['dic_column'].map(eval).apply(pd.Series)
result
A B C D E
1 0 0 0 1 0
2 1 9 0 9 0
3 0 0 0 1 0
4 1 9 0 9 0
5 0 0 0 2
Wanted outcome:
df
user_id og_column1 og_column2 A B C D E
1 valuey valuey 0 0 0 1 0
2 valuex valuex 1 9 0 9 0
...
EDIT:
Solution: join back.
result = df.join(df['dic_column'].map(eval).apply(pd.Series))
CodePudding user response:
You can assign multiple columns to a DataFrame:
df[result.columns] = result
Or you can use DataFrame.join:
df = df.join(result)
Technically, you can also use concat:
df = pd.concat((df, result), axis='columns')
All of the above are very similar operations.
They all perform a join (in the relational algebra sense) on the row labels of the data frames.
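As a minimal sketch of that equivalence, the three approaches can be run side by side on a small frame. The data below is hypothetical and only stands in for the question's df and result; both frames share the same index labels.

```python
import pandas as pd

# Hypothetical data standing in for the question's df and result.
df = pd.DataFrame(
    {"user_id": [1, 2, 3], "og_column1": ["valuey", "valuex", "valuez"]},
    index=[10, 20, 30],
)
result = pd.DataFrame({"A": [0, 1, 0], "B": [0, 9, 0]}, index=[10, 20, 30])

# 1. Column assignment (done on a copy so df itself stays unchanged).
assigned = df.copy()
assigned[result.columns] = result

# 2. DataFrame.join, which aligns on the index by default.
joined = df.join(result)

# 3. pd.concat along the column axis.
concatenated = pd.concat((df, result), axis="columns")

print(joined)
# Compare the three results with each other.
print(assigned.equals(joined) and joined.equals(concatenated))
```

Note that column assignment modifies the frame in place, while join and concat return a new frame.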
In Pandas terminology, the row labels are the "index" of a data frame. If you didn't explicitly create or assign an index, the row labels default to a range of integers that match the row numbers. The difference is that labels stay attached to their rows across most Pandas operations, while row numbers only describe a row's current position.
So if you shuffle a data frame, the index labels are shuffled along with their rows. Among other things, this feature allows you to re-join data to its source even after some fairly complicated data manipulation.
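A small sketch of that re-join behaviour, using made-up data and a hypothetical "doubled" column: the rows are shuffled, a value is derived from the shuffled frame, and join still attaches each value to the row it came from.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..3.
df = pd.DataFrame({"user_id": [101, 102, 103, 104]})

# Shuffle the rows; each row keeps its original label.
shuffled = df.sample(frac=1, random_state=0)

# Derive a new column from the shuffled frame (illustrative computation).
derived = (shuffled["user_id"] * 2).rename("doubled")

# join matches on labels, not positions, so values land on the right rows.
rejoined = df.join(derived)
print(rejoined)
```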
The official Pandas documentation doesn't have a single coherent resource for understanding the "index" data model. However, I found this blog post, and it seems to cover most of what you need to know.
CodePudding user response:
Henry Ecker gave the solution in the comments.
We can join the results back:
result = df.join(df['dic_column'].map(eval).apply(pd.Series))
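For a self-contained version, here is a sketch that assumes dic_column holds strings that look like Python dict literals, which the use of eval in the question suggests. It swaps eval for ast.literal_eval from the standard library, which only parses literals and so will not run arbitrary code; that swap is a suggestion, not part of the original solution. The sample data is made up.

```python
import ast

import pandas as pd

# Hypothetical data: dic_column stores dicts as strings (an assumption).
df = pd.DataFrame(
    {
        "user_id": [1, 2],
        "dic_column": ["{'A': 0, 'B': 0}", "{'A': 1, 'B': 9}"],
    }
)

# Parse each string into a dict, then expand the dict keys into columns.
expanded = df["dic_column"].map(ast.literal_eval).apply(pd.Series)

# Join the expanded columns back onto the original frame by index.
result = df.join(expanded)
print(result)
```

If the column already contains real dict objects instead of strings, the parsing step can be dropped and only .apply(pd.Series) is needed.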