Suppose I have a pandas dataframe that looks like this:
Name Values
0 John [[15, 200], [20, 300]]
1 Sam [[16, 100], [20, 1000]]
Where the column 'Values' represents a distribution of wealth over time. For example, when John was 15 he had $200 and by the age of 20 he had $300.
I want my dataframe to look like this:
Name Age Wealth
0 John 15 200
1 John 20 300
2 Sam 16 100
3 Sam 20 1000
so that I can make use of the inbuilt functions of pandas to study my data.
I could probably create a new dataframe in my liking by iterating over each pair of values in the original 'Values' column, but that is probably not very efficient since I have a lot of data, and I wouldn't be taking advantage of pandas's functionality.
Is there a way to do this without manually creating a new dataframe from scratch? Does the new dataframe require additional space or is this negligible?
CodePudding user response:
Use df.explode
with pd.concat
:
In [253]: df = df.explode('Values')
In [261]: res = pd.concat([df['Name'], pd.DataFrame(df['Values'].tolist(), columns=['Age', 'Wealth'], index=df.index)], axis=1)
In [262]: res
Out[262]:
Name Age Wealth
0 John 15 200
0 John 20 300
1 Sam 16 100
1 Sam 20 1000
CodePudding user response:
try this:
df1 = df.explode('Values')
df1[['age', 'wealth']] = pd.DataFrame(df1.Values.tolist(), columns=['age', 'wealth'])
df1.drop(columns='Values')
> Name age wealth
0 John 15 200
0 John 15 200
1 Sam 20 300
1 Sam 20 300
CodePudding user response:
import pandas as pd
df = pd.DataFrame({
"Name": ["John", "Sam"],
"Values": [[[15, 200], [20, 300]], [[16, 100], [20, 1000]]]
})
df = df.Values.apply(pd.Series).T.melt().dropna()
df = (df.value
.apply(pd.Series)
.set_index(df.variable)
.rename(columns={0:'Age', 1:'Values'})
)
print(df)
Output
variable Age Values
0 15 200
0 20 300
1 16 100
1 20 1000
CodePudding user response:
You can use pandas.explode
on values
then use pandas.join
.
df = df.explode('values')
df = df.join(pd.DataFrame(df['values'].to_list(),
columns=['age','wealth'])
).drop('values',axis=1).reset_index(drop=True)
print(df)
Output:
name age wealth
0 John 15 200
1 John 15 200
2 Sam 20 300
3 Sam 20 300
CodePudding user response:
With df
your dataframe you could try:
df = (
df
.explode("Values")
.assign(
Age=lambda df: df["Values"].str[0],
Wealth=lambda df: df["Values"].str[1]
)
.drop(columns="Values")
.reset_index(drop=True)
)
Result:
Name Age Wealth
0 John 15 200
1 John 20 300
2 Sam 16 100
3 Sam 20 1000