I am making some script, wherein my requirement is to remove some special characters from column names and to make data frame as multi header. code
import pandas as pd
import numpy as np
cars = {'day':['aug','aug','sep','sep','aug'],
'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4','Hyundai Elite i20'],
'Type':['sedan,','sedan','hatchback','hatchback','hatchback'],
'Down Price': [22000,25000,27000,35000,10000]
}
df = pd.DataFrame(cars, columns = ['day','Brand', 'Type','Down Price'])
dfpivot=pd.pivot_table(df,index=['day'],columns=['Brand','Type'],values=['Down Price'],aggfunc=np.max)
dfpivot=pd.DataFrame(dfpivot.to_records())
Post this my requirement is to replace "(" & ")" from column name & to split column name with "," to make it multi header What i tried :
dfpivot.columns=dfpivot.columns.str.replace("(","").replace(")","")
dfpivot.columns = dfpivot.columns.str.split(',', expand=True)
I am getting the below error, I also tried to check similar questions posted, but it is not helping, will be grateful if someone can help to resolve.
AttributeError Traceback (most recent call last) in 10 dfpivot=pd.pivot_table(df,index=['day'],columns=['Brand','Type'],values=['Down Price'],aggfunc=np.max) 11 dfpivot=pd.DataFrame(dfpivot.to_records()) ---> 12 dfpivot.columns=dfpivot.columns.str.replace("(","").replace(")","") 13 dfpivot
AttributeError: 'Index' object has no attribute 'replace'
CodePudding user response:
Replace all except alpha numerics, white space and comma, split and expand in the columns.
dfpivot.columns=dfpivot.columns.str.replace('[^\w\,\s]','',regex=True).str.split(',', expand=True)
CodePudding user response:
Pandas replace
returns an Index
, so you need to use another str
on it:
dfpivot.columns=dfpivot.columns.str.replace("(","").str.replace(")","")
CodePudding user response:
The error is because the second replace
is missing its str
accessor:
dfpivot.columns = dfpivot.columns.str.replace("(", "").str.replace(")", "")
# ^^^
However it's simpler to use str.strip
for this:
dfpivot.columns = dfpivot.columns.str.strip("()").str.split(",", expand=True)
# day 'Down Price'
# NaN 'Audi A4' 'Ford Focus' 'Honda Civic' 'Hyundai Elite i20' 'Toyota Corolla'
# NaN 'hatchback' 'hatchback' 'sedan' 'hatchback' 'sedan'
# 0 aug NaN NaN 22000.0 10000.0 25000.0
# 1 sep 35000.0 27000.0 NaN NaN NaN
But note that this will leave single quotes in the column names. If you don't want those quotes, instead use str.replace
with regex:
dfpivot.columns = dfpivot.columns.str.replace(r"[()']", "", regex=True).str.split(",", expand=True)
# day Down Price
# NaN Audi A4 Ford Focus Honda Civic Hyundai Elite i20 Toyota Corolla
# NaN hatchback hatchback sedan hatchback sedan
# 0 aug NaN NaN 22000.0 10000.0 25000.0
# 1 sep 35000.0 27000.0 NaN NaN NaN