I have some columns that have the same names. I would like to add a 1 to the repeating column names
Data
Date Type hi hello stat hi hello
1/1/2022 a 0 0 1 1 0
Desired
Date Type hi hello stat hi1 hello1
1/1/2022 a 0 0 1 1 0
Doing
mask = df['col2'].duplicated(keep=False)
I believe I can utilize mask, but not sure how to efficiently achieve this without calling out the actual column. I would like to call the full dataset and allow the algorithm to update the dupe.
Any suggestion is appreciated
CodePudding user response:
Use the built-in method _maybe_dedup_names()
to deduplicate column headers:
from pandas.io.parsers.base_parser import ParserBase
df.columns = ParserBase({'usecols': None})._maybe_dedup_names(df.columns)
# Date Type hi hello stat hi.1 hello.1
# 0 1/1/2022 a 0 0 1 1 0
This is what pandas uses internally to fix duplicate headers from read_csv()
.
CodePudding user response:
You need to apply the duplicated operation to the column names. And then map the duplication information to a string, which you can then add to the original column names.
df.columns = df.columns [{False:'',True:'1'}[x] for x in df.columns.duplicated()]
CodePudding user response:
We can do
s = df.columns.to_series().groupby(df.columns).cumcount().replace({0:''}).astype(str).radd('.')
df.columns = (df.columns s).str.strip('.')
df
Out[153]:
Date Type hi hello stat hi.1 hello.1
0 1/1/2022 a 0 0 1 1 0