Rename columns name that change over time with pandas-CodePudding

I have a file to process but the columns name and the order change over time. I want to rename the columns name in a manner that i will able to work even that they are a change in the file.

for exp: sometimes is

exp 1:

postcode brand
abde     exp1
enter code here

exp 2

PostCode brands
abdx     exp2

exp 3

brands postcode
abde     exp1

The expcted output

PostCode brand
abde     exp1

NB: i cannot use index because the order changes over time

CodePudding user response：

You could standardize the column names for your DataFrame as follows:

>>> df.rename(columns={c: "PostCode" if "postcode" in c.lower() else "Brand" for c in df.columns})

CodePudding user response：

There is no definitive answer to this question, everything depends on the variability of your header…

Let's imagine that case: order and plural are the only variations. You could map a cleaning function to your column names, and sort the columns:

def clean_name(s):
    # make lowercase
    s = s.lower()
    # remove trailing 's'
    s = s.rstrip('s')
    return s

df.columns = df.columns.map(clean_name)
df = df.sort_index(axis=1)

Example input:

  PostCode brands
0     abde   exp1

output:

  brand postcode
0  exp1     abde

CodePudding user response：

I'd create a column mapper dictionary that you can add to over time:

col_map = {
  "postcode": "PostCode",
  "brands": "brand",
}

col_order = ["PostCode", "brand"]

renamed_df = df.columns.map(lambda x: col_map.get(x, x)) # <- Renames the cols to the dict values
ouput = renamed_df.reindex(columns=col_order ) # <- reorders the cols based on the config list

NOTE col_map.get(x, x) will fallback to the provided col if it is new, ie. "Brands"

If instead you want it to error so you can easily identify the issue and update the col_map you could use df.columns.map(col_map)