Change the name of columns based on a period in df

Time:12-07

I have a DataFrame and want to rename its columns periodically. For example, the following df has 15 columns named v0 to v14. I want to rename the first three to v0-v2 and then restart at v0-v2 after every three columns. Since it seems that column names cannot repeat, I would instead name the second group v10-v12, the third group v20-v22, and so on.

import pandas as pd

df = pd.DataFrame()
df['id'] = [1]
df['v0'] = [2]
df['v1'] = [1]
df['v2'] = [2]
df['v3'] = [1]
df['v4'] = [2]
df['v5'] = [1]
df['v6'] = [2]
df['v7'] = [1]
df['v8'] = [2]
df['v9'] = [1]
df['v10'] = [2]
df['v11'] = [1]
df['v12'] = [2]
df['v13'] = [1]
df['v14'] = [2]
df

Here is the output I want. Thank you in advance.

   id  v00  v01  v02  v10  v11  v12  v20  v21  v22  v30  v31  v32  v40  v41  v42
0   1    2    1    2    1    2    1    2    1    2    1    2    1    2    1    2

CodePudding user response:

You can actually have duplicated names:

import numpy as np
df.columns = np.hstack(['id', np.tile(['v0', 'v1', 'v2'], (len(df.columns)-1)//3)])

print(df)

Output:

   id  v0  v1  v2  v0  v1  v2  v0  v1  v2  v0  v1  v2  v0  v1  v2
0   1   2   1   2   1   2   1   2   1   2   1   2   1   2   1   2
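
One consequence of duplicated names is worth checking before relying on them. The sketch below uses a small made-up frame (not the one from the question) to show that selecting a repeated label returns every matching column, while positional access still picks a single one:

```python
import pandas as pd

# Small illustrative frame with repeated column labels (hypothetical data).
df_dup = pd.DataFrame([[1, 2, 1, 2, 1]], columns=['id', 'v0', 'v1', 'v0', 'v1'])

# Selecting a duplicated label returns a DataFrame holding all matching columns.
sub = df_dup['v0']
print(type(sub).__name__, sub.shape)

# Positional access still addresses exactly one column.
first_v0 = df_dup.iloc[:, 1]
print(first_v0.tolist())
```

If later code expects `df['v0']` to be a single Series, unique names such as the v10-v12 scheme are probably the safer choice.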

For your alternative:

a = np.arange(df.shape[1]-1)
new = np.char.add('v', (a%3 + a//3*10).astype(str))
df.columns = np.hstack(['id', new])

print(df)

Output:

   id  v0  v1  v2  v10  v11  v12  v20  v21  v22  v30  v31  v32  v40  v41  v42
0   1   2   1   2    1    2    1    2    1    2    1    2    1    2    1    2

Alternative with str.replace:

df.columns = df.columns.str.replace(r'(\d+)',
                                    lambda m: str((x:=int(m.group()))%3 + (x//3*10)),
                                    regex=True)

CodePudding user response:

You can use:

df.rename(columns={f"v{i}": f"v{i // 3}{i % 3}" for i in range(len(df.columns) - 1)}, inplace=True)
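
The same idea can be wrapped in a small helper so the period is not hard-coded. This is only a sketch: `periodic_names` and its `period` and `prefix` parameters are names invented here, and it assumes the first column is `id` and every remaining column should be relabelled by position:

```python
import pandas as pd

def periodic_names(n, period=3, prefix="v"):
    """Return n labels of the form <prefix><group><position>: v00, v01, v02, v10, ..."""
    return [f"{prefix}{i // period}{i % period}" for i in range(n)]

# Rebuild the frame from the question: 'id' followed by v0..v14.
df = pd.DataFrame([[1] + [2, 1] * 7 + [2]],
                  columns=["id"] + [f"v{i}" for i in range(15)])

# Keep 'id' and relabel the other columns by their position.
df.columns = ["id"] + periodic_names(df.shape[1] - 1)
print(list(df.columns))
```

Note that labels built this way can become ambiguous once the group number or the period reaches 10 (for example, `v101` could be group 10, position 1 or group 1, position 01), so a separator such as an underscore may be preferable for wider frames.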
