I have the following dataframe (column names):
day var1 new_var1 var2 new_var2 new_var3 var3 new_var4 var4 new_var5 var5 new_var6 new_var7 ...
I want to have the following column names:
day var1 new_var1 var2 new_var2 new_var3 var4 new_var4 var5 new_var5 var6 new_var6 new_var7 ...
Column var always follows column new_var and should be higher 1 unit of the previous new_var. Because I have more than 200 columns I cannot simply do:
df=df.rename(columns={"var3": "var4"})
How I should rename my variables?
CodePudding user response:
Here's one way to do it in a panda-esque fashion:
- Prepare sample dataframe
import pandas as pd
columns = ['day',
'var1',
'new_var1',
'var2',
'new_var2',
'new_var3',
'var3',
'new_var4',
'var4',
'new_var5',
'var5',
'new_var6',
'new_var7']
df = pd.DataFrame(data=0, index=[0], columns = columns)
Output:
>>> df
day var1 new_var1 var2 new_var2 new_var3 var3 new_var4 var4 \
0 0 0 0 0 0 0 0 0 0
new_var5 var5 new_var6 new_var7
0 0 0 0 0
- Convert the column index to a series
>>> cols = df.columns.to_series()
>>> cols.head()
day day
var1 var1
new_var1 new_var1
var2 var2
new_var2 new_var2
dtype: object
- Convert the values as per your requirements
# select rows starting with 'var'
m = cols.str.startswith("var")
# create replacement rows
cols[m] = "var" (cols
.shift() # shift by one to get the 'new_var' rows
.replace("day", "new_var0")[m] # handle the 'day' exception, select relevant rows
.str[7:] # remove leading 'new_var'
.astype(int).add(1) # convert to int and add 1
.astype(str)) # convert back to string
cols
now looks like this:
>>> cols
day day
var1 var1
new_var1 new_var1
var2 var2
new_var2 new_var2
new_var3 new_var3
var3 var4
new_var4 new_var4
var4 var5
new_var5 new_var5
var5 var6
new_var6 new_var6
new_var7 new_var7
dtype: object
- Finally, assign the modified column names to
df
df.columns = cols
CodePudding user response:
cols = ['day', 'var1', 'new_var1', 'var2', 'new_var2', 'new_var3', 'var3', 'new_var4', 'var4', 'new_var5', 'var5',
'new_var6', 'new_var7']
pd.options.display.width = None
df = pd.DataFrame([["."] * 13], columns=cols)
print('before\n', df)
for i, c in enumerate(cols):
n = c.split('var')[-1]
n = int(n) if n.isdigit() else 0
if 'new_' not in c and n > 2:
cols[i] = f'var{n 1}'
df.columns = cols
print('after\n', df)
before
day var1 new_var1 var2 new_var2 new_var3 var3 new_var4 var4 new_var5 var5 new_var6 new_var7
0 . . . . . . . . . . . . .
after
day var1 new_var1 var2 new_var2 new_var3 var4 new_var4 var5 new_var5 var6 new_var6 new_var7
0 . . . . . . . . . . . . .