Home > front end >  Pandas: column with list of pairs
Pandas: column with list of pairs

Time:07-15

Suppose I have a pandas dataframe that looks like this:

   Name                   Values
0  John   [[15, 200], [20, 300]]
1   Sam  [[16, 100], [20, 1000]]

Where the column 'Values' represents a distribution of wealth over time. For example, when John was 15 he had $200 and by the age of 20 he had $300.

I want my dataframe to look like this:

   Name  Age  Wealth
0  John   15     200
1  John   20     300
2   Sam   16     100
3   Sam   20    1000

so that I can make use of the inbuilt functions of pandas to study my data.

I could probably create a new dataframe in my liking by iterating over each pair of values in the original 'Values' column, but that is probably not very efficient since I have a lot of data, and I wouldn't be taking advantage of pandas's functionality.

Is there a way to do this without manually creating a new dataframe from scratch? Does the new dataframe require additional space or is this negligible?

CodePudding user response:

Use df.explode with pd.concat:

In [253]: df = df.explode('Values')
In [261]: res = pd.concat([df['Name'], pd.DataFrame(df['Values'].tolist(), columns=['Age', 'Wealth'], index=df.index)], axis=1)

In [262]: res
Out[262]: 
   Name  Age  Wealth
0  John   15     200
0  John   20     300
1   Sam   16     100
1   Sam   20    1000

CodePudding user response:

try this:

df1 = df.explode('Values')
df1[['age', 'wealth']] = pd.DataFrame(df1.Values.tolist(), columns=['age', 'wealth'])
df1.drop(columns='Values')
>   Name    age wealth
0   John    15  200
0   John    15  200
1   Sam     20  300
1   Sam     20  300

CodePudding user response:

import pandas as pd


df = pd.DataFrame({
    "Name": ["John", "Sam"],
    "Values": [[[15, 200], [20, 300]], [[16, 100], [20, 1000]]]
})
df = df.Values.apply(pd.Series).T.melt().dropna()
df = (df.value
              .apply(pd.Series)
              .set_index(df.variable)
              .rename(columns={0:'Age', 1:'Values'})
        )
print(df)

    

Output

variable   Age  Values
0          15   200
0          20   300
1          16   100
1          20   1000

CodePudding user response:

You can use pandas.explode on values then use pandas.join.

df = df.explode('values')
df = df.join(pd.DataFrame(df['values'].to_list(), 
                          columns=['age','wealth'])
            ).drop('values',axis=1).reset_index(drop=True)
print(df)

Output:

   name  age  wealth
0  John   15     200
1  John   15     200
2   Sam   20     300
3   Sam   20     300

CodePudding user response:

With df your dataframe you could try:

df = (
    df
    .explode("Values")
    .assign(
        Age=lambda df: df["Values"].str[0],
        Wealth=lambda df: df["Values"].str[1]
    )
    .drop(columns="Values")
    .reset_index(drop=True)
)

Result:

   Name  Age  Wealth
0  John   15     200
1  John   20     300
2   Sam   16     100
3   Sam   20    1000
  • Related