Home > Software engineering >  Python rename columns (more difficult than usual)
Python rename columns (more difficult than usual)

Time:09-18

I have the following dataframe (column names):

day var1 new_var1 var2 new_var2 new_var3 var3 new_var4 var4 new_var5 var5 new_var6 new_var7 ...

I want to have the following column names:

day var1 new_var1 var2 new_var2 new_var3 var4 new_var4 var5 new_var5 var6 new_var6 new_var7 ...

Column var always follows column new_var and should be higher 1 unit of the previous new_var. Because I have more than 200 columns I cannot simply do:

df=df.rename(columns={"var3": "var4"})

How I should rename my variables?

CodePudding user response:

Here's one way to do it in a panda-esque fashion:

  1. Prepare sample dataframe
import pandas as pd

columns = ['day',
           'var1',
           'new_var1',
           'var2',
           'new_var2',
           'new_var3',
           'var3',
           'new_var4',
           'var4',
           'new_var5',
           'var5',
           'new_var6',
           'new_var7']

df = pd.DataFrame(data=0, index=[0], columns = columns)

Output:

>>> df
   day  var1  new_var1  var2  new_var2  new_var3  var3  new_var4  var4  \
0    0     0         0     0         0         0     0         0     0   

   new_var5  var5  new_var6  new_var7  
0         0     0         0         0 
  1. Convert the column index to a series
>>> cols = df.columns.to_series()
>>> cols.head()
day              day
var1            var1
new_var1    new_var1
var2            var2
new_var2    new_var2
dtype: object
  1. Convert the values as per your requirements
# select rows starting with 'var'
m = cols.str.startswith("var")

# create replacement rows
cols[m] = "var"   (cols
                   .shift()  # shift by one to get the 'new_var' rows
                   .replace("day", "new_var0")[m]  # handle the 'day' exception, select relevant rows
                   .str[7:]  # remove leading 'new_var'
                   .astype(int).add(1)  # convert to int and add 1
                   .astype(str))  # convert back to string

cols now looks like this:

>>> cols
day              day
var1            var1
new_var1    new_var1
var2            var2
new_var2    new_var2
new_var3    new_var3
var3            var4
new_var4    new_var4
var4            var5
new_var5    new_var5
var5            var6
new_var6    new_var6
new_var7    new_var7
dtype: object
  1. Finally, assign the modified column names to df
df.columns = cols

CodePudding user response:

cols = ['day', 'var1', 'new_var1', 'var2', 'new_var2', 'new_var3', 'var3', 'new_var4', 'var4', 'new_var5', 'var5',
        'new_var6', 'new_var7']

pd.options.display.width = None
df = pd.DataFrame([["."] * 13], columns=cols)
print('before\n', df)

for i, c in enumerate(cols):
    n = c.split('var')[-1]
    n = int(n) if n.isdigit() else 0
    if 'new_' not in c and n > 2:
        cols[i] = f'var{n   1}'
df.columns = cols
print('after\n', df)
before
   day var1 new_var1 var2 new_var2 new_var3 var3 new_var4 var4 new_var5 var5 new_var6 new_var7
0   .    .        .    .        .        .    .        .    .        .    .        .        .
after
   day var1 new_var1 var2 new_var2 new_var3 var4 new_var4 var5 new_var5 var6 new_var6 new_var7
0   .    .        .    .        .        .    .        .    .        .    .        .        .
  • Related