I saw some similar questions but not quite the one I was looking for.
I have a excel sheet that is delivered to me:
- The columns I need are there
- the column names are on the second row
- but the order of the columns may vary (sometimes the number comes first, or the city and names are in different order)
- there are 2 column with the same name.
This is how to replicate :
data = [['name','name','number'], ['Nick','Brussels', 15], ['Tom','Paris', 14]]
df = pd.DataFrame(data)
I make my first row the column headers like this :
df=df.rename(columns=df.iloc[0]).drop(df.index[0])
What I know for sure is that the value 'Brussels' is always in the dataset, so my question is : can I change the name of the column based on a value in that column.
so in pseudo code this would be :
if the column contains the word 'Brussels' rename the column to 'city'
All my attempts so far change both column names, because it returns 'name' as the index to change when I select the column containing Brussels. I would like to have the iloc, returned...
My goal is to have this
name city number
1 Nick Brussels 15
2 Tom Paris 14
Thanks !
CodePudding user response:
You can use a boolean mask to find the right column:
df.columns = [c if not m else 'city'
for c, m in zip(df.columns, df.eq('Brussels').any())]
print(df)
# Output
name city number
1 Nick Brussels 15
2 Tom Paris 14
CodePudding user response:
This is similar to Corralien's answer in using a boolean mask, but instead of a list comprehension, it modifies the source row first and then assigns it as columns like you did.
df.iloc[0][df.eq('Brussels').any()] = 'city'
df = df.rename(columns=df.iloc[0]).drop(df.index[0])