Home > Software engineering >  Rename a column based on the content of it
Rename a column based on the content of it

Time:07-08

I saw some similar questions but not quite the one I was looking for.

I have a excel sheet that is delivered to me:

  • The columns I need are there
  • the column names are on the second row
  • but the order of the columns may vary (sometimes the number comes first, or the city and names are in different order)
  • there are 2 column with the same name.

This is how to replicate :

data = [['name','name','number'], ['Nick','Brussels', 15], ['Tom','Paris', 14]]
df = pd.DataFrame(data)

I make my first row the column headers like this :

df=df.rename(columns=df.iloc[0]).drop(df.index[0])

What I know for sure is that the value 'Brussels' is always in the dataset, so my question is : can I change the name of the column based on a value in that column.

so in pseudo code this would be :

if the column contains the word 'Brussels' rename the column to 'city' 

All my attempts so far change both column names, because it returns 'name' as the index to change when I select the column containing Brussels. I would like to have the iloc, returned...

My goal is to have this

   name      city number
1  Nick  Brussels     15
2   Tom     Paris     14

Thanks !

CodePudding user response:

You can use a boolean mask to find the right column:

df.columns = [c if not m else 'city'
                  for c, m in zip(df.columns, df.eq('Brussels').any())]
print(df)

# Output
   name      city number
1  Nick  Brussels     15
2   Tom     Paris     14

CodePudding user response:

This is similar to Corralien's answer in using a boolean mask, but instead of a list comprehension, it modifies the source row first and then assigns it as columns like you did.

df.iloc[0][df.eq('Brussels').any()] = 'city'
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
  • Related