Home > Blockchain >  Create DataFrame columns based on another columns (faster solution)
Create DataFrame columns based on another columns (faster solution)

Time:07-01

I have a DataFrame with 1mln of rows and two columns Type and Name whose values are a lists with non-unique values. Both Type and Name columns have the same number of elements because they form a pair (Type, Name). I would like to add to my DataFrame columns whose names are the unique types from Type column with the values being a list of corresponding values from Name columns. Below is a short example of the current code. It works but very slow when the number of rows is 1mln so I'm looking for a faster solution.

import pandas as pd
df = pd.DataFrame({"Type": [["1", "1", "2", "3"], ["2","3"]], "Name": [["A", "B", "C", "D"], ["E", "F"]]})

unique = list(set(df["Type"].explode()))
for t in unique:
    df[t] = None
    df[t] = df[t].astype('object')

for idx, row in df.iterrows():
    for t in unique:
        df.at[idx, t] = [row["Name"][i] for i in range(len(row["Name"])) if row["Type"][i] == t]

My desired result is: enter image description here

CodePudding user response:

You can explode the whole dataframe and then use your exploded dataframe and pivot it with a list as the aggfunc (Resetting the index to use the index as the grouper for the pivot)

df.explode(column=['Type','Name']).reset_index().pivot_table(index='index',columns='Type', values='Name',aggfunc=list)

Type    1       2   3
index           
0       [A, B]  [C] [D]
1       NaN     [E] [F]

And then concat it back onto the original

pd.concat([df,df.explode(column=['Type','Name']).reset_index().pivot_table(index='index',columns='Type', values='Name',aggfunc=list)],axis=1)

    Type            Name            1       2   3
0   [1, 1, 2, 3]    [A, B, C, D]    [A, B]  [C] [D]
1   [2, 3]          [E, F]          NaN     [E] [F]

As requested, here is the code broken out by step for debugging purposes

df1=df.explode(column=['Type','Name'])
df1=df1.reset_index()
pvt=df1.pivot_table(index='index',columns='Type', values='Name',aggfunc=list)
pd.concat([df,pvt],axis=1)

CodePudding user response:

Alternative solution:

df = pd.DataFrame({"Type": [["1", "1", "2", "3"], ["2","3"]], "Name": [["A", "B", "C", "D"], ["E", "F"]]})
df_conc = pd.concat([df, df.apply(pd.Series.explode).reset_index().groupby(['index', 'Type']).agg(list).unstack().droplevel(level=0, axis=1).fillna("").apply(list)], axis=1)
df_conc

----------------------------------------------
    Type           Name         1       2   3
0   [1, 1, 2, 3]   [A, B, C, D] [A, B]  [C] [D]
1   [2, 3]         [E, F]               [E] [F]
----------------------------------------------

If nan values are accepted, just remove .fillna("").apply(list):

df_conc = pd.concat([df, df.apply(pd.Series.explode).reset_index().groupby(['index', 'Type']).agg(list).unstack().droplevel(level=0, axis=1)], axis=1)
df_conc

----------------------------------------------
    Type           Name         1       2   3
0   [1, 1, 2, 3]   [A, B, C, D] [A, B]  [C] [D]
1   [2, 3]         [E, F]       NaN     [E] [F]
----------------------------------------------
  • Related