I have a file to process but the columns name and the order change over time. I want to rename the columns name in a manner that i will able to work even that they are a change in the file.
for exp: sometimes is
exp 1:
postcode brand
abde exp1
enter code here
exp 2
PostCode brands
abdx exp2
exp 3
brands postcode
abde exp1
The expcted output
PostCode brand
abde exp1
NB: i cannot use index because the order changes over time
CodePudding user response:
You could standardize the column names for your DataFrame as follows:
>>> df.rename(columns={c: "PostCode" if "postcode" in c.lower() else "Brand" for c in df.columns})
CodePudding user response:
There is no definitive answer to this question, everything depends on the variability of your header…
Let's imagine that case: order and plural are the only variations. You could map
a cleaning function to your column names, and sort the columns:
def clean_name(s):
# make lowercase
s = s.lower()
# remove trailing 's'
s = s.rstrip('s')
return s
df.columns = df.columns.map(clean_name)
df = df.sort_index(axis=1)
Example input:
PostCode brands
0 abde exp1
output:
brand postcode
0 exp1 abde
CodePudding user response:
I'd create a column mapper dictionary that you can add to over time:
col_map = {
"postcode": "PostCode",
"brands": "brand",
}
col_order = ["PostCode", "brand"]
renamed_df = df.columns.map(lambda x: col_map.get(x, x)) # <- Renames the cols to the dict values
ouput = renamed_df.reindex(columns=col_order ) # <- reorders the cols based on the config list
NOTE
col_map.get(x, x)
will fallback to the provided col if it is new, ie. "Brands"
If instead you want it to error so you can easily identify the issue and update the col_map
you could use df.columns.map(col_map)