Home > Back-end >  How to expand columns with delimited values in pandas dataframe
How to expand columns with delimited values in pandas dataframe

Time:01-30

I have a dataframe like:

    id diag  a_date
0    1   d1      55
1    1   d2      88
2    2   d5      22
3    2   d3      44
4    2   d4      88
5    2   d4      89
6    3   d1      11
7    3   d1      13
8    3   d1      15
9    3   d5      27
10   3   d5      28
11   3   d5      29

And I want to reshape it so that the diag values become enumerated columns with a_date values like:

  d1_1 d1_2 d1_3 d2_1 d3_1 d4_1 d4_2 d5_1 d5_2 d5_3
0   55             88                              
1                       44   88   89   22          
2   11   13   15                       27   28   29

CodePudding user response:

Here is a posible solution.

df = df.sort_values(by=["id", "diag", "a_date"])
df["enumerate"] = df.groupby(["id", "diag"]).cumcount()   1
df = df.pivot_table(index="id", columns=["diag", "enumerate"], values="a_date").reset_index()

df.columns = [f"{col[0]}_{col[1]}" for col in df.columns]
df = df.drop(columns=["id_"]).fillna("")
print(df)

  d1_1 d1_2 d1_3 d2_1 d3_1 d4_1 d4_2 d5_1 d5_2 d5_3
0   55             88                              
1                       44   88   89   22          
2   11   13   15                       27   28   29

CodePudding user response:

delimiter = '|'
_df = df.pivot_table(index=['id'], columns='diag', values=['a_date'], aggfunc=(lambda x: delimiter.join(str(v) for v in x)))
_df = _df.droplevel(axis=1, level=0)
_df.columns.name = None
_df = _df.reset_index()
#    id        d1   d2   d3     d4        d5
# 0   1        55   88  NaN    NaN       NaN
# 1   2       NaN  NaN   44  88|89        22
# 2   3  11|13|15  NaN  NaN    NaN  27|28|29

cols = df.diag.unique().tolist()
dfs = [_df.id]
for col in cols:
    _dfX = _df[col].str.split(delimiter, expand=True)
    _dfX.columns = [f'{col}_{c 1}' for c in _dfX.columns]
    dfs.append(_dfX)
df_f = pd.concat(dfs, axis=1).fillna('')
#    id d1_1 d1_2 d1_3 d2_1 d5_1 d5_2 d5_3 d3_1 d4_1 d4_2
# 0   1   55             88                              
# 1   2                       22             44   88   89
# 2   3   11   13   15        27   28   29               

  • Related