Rename duplicate column name by order in Pandas

Time:10-23

I have a dataframe, df, where I would like to rename two duplicate columns in consecutive order:

Data

DD  Nice Nice Hello
0   1    1    2

Desired

DD  Nice1 Nice2 Hello
0   1     1     2

Doing

df.rename(columns={"Nice": "Nice1", "Nice": "Nice2"})

I am calling the rename function, but because both column names are identical, it does not produce the desired result.
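
For reference, the failure can be reproduced in isolation: a Python dict literal keeps only the last value for a repeated key, and rename then applies the surviving mapping to every column carrying that label. A small sketch, using the actual column label "Nice" and a sample frame assumed to mirror the data above:

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's data (DD column omitted)
df = pd.DataFrame([[1, 1, 2]], columns=["Nice", "Nice", "Hello"])

# A dict literal keeps only the last value for a repeated key
mapping = {"Nice": "Nice1", "Nice": "Nice2"}
print(mapping)  # {'Nice': 'Nice2'}

# rename() matches on labels, so both "Nice" columns receive the same new name
renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```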

CodePudding user response:

Here's an approach with groupby:

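import numpy as np

# Group the column labels by their own values
s = df.columns.to_series().groupby(df.columns)

# Append a 1-based running count only to labels that occur more than once
df.columns = np.where(s.transform('size') > 1,
                      df.columns + s.cumcount().add(1).astype(str),
                      df.columns)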

Output:

   DD  Nice1  Nice2  Hello
0   0      1      1      2
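
The same groupby idea can be tried end to end with a default-indexed Series of labels, which sidesteps any alignment on a duplicated index. This is a minimal sketch rather than the answer's exact code; the sample frame and the names counts and order are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with one duplicated label
df = pd.DataFrame([[0, 1, 1, 2]], columns=["DD", "Nice", "Nice", "Hello"])

cols = pd.Series(df.columns)                  # labels with a plain 0..n-1 index
counts = cols.groupby(cols).transform("size") # how often each label occurs
order = cols.groupby(cols).cumcount() + 1     # 1-based position within each label

# Suffix only the labels that occur more than once
df.columns = np.where(counts > 1, cols + order.astype(str), cols)
print(list(df.columns))
```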

CodePudding user response:

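You can look up a column's label by position and pass it to rename, e.g.:

df.rename(columns={df.columns[1]: "Nice1"}, inplace=True)

Note that rename matches on labels, so with a duplicated label every matching column is renamed.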
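
For a strictly positional rename, one option is to copy the labels into a list, overwrite entries by index, and assign the list back. A minimal sketch; the sample frame is assumed for illustration:

```python
import pandas as pd

# Hypothetical sample frame with a duplicated label
df = pd.DataFrame([[1, 1, 2]], columns=["Nice", "Nice", "Hello"])

# Copy the labels, change them by position, then assign the whole list back
new_cols = list(df.columns)
new_cols[0] = "Nice1"
new_cols[1] = "Nice2"
df.columns = new_cols
print(list(df.columns))
```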

CodePudding user response:

You can use:

cols = pd.Series(df.columns)
dup_count = cols.value_counts()
# For each duplicated label, replace every occurrence with label + 1-based position
for dup in cols[cols.duplicated()].unique():
    cols[cols[cols == dup].index.values.tolist()] = [dup + str(i) for i in range(1, dup_count[dup] + 1)]

df.columns = cols

Input:

col_1  Nice  Nice  Nice  Hello  Hello  Hello
col_2     1     2     3      4      5      6

Output:

col_1  Nice1  Nice2  Nice3  Hello1  Hello2  Hello3
col_2      1      2      3       4       5       6

Setup to generate duplicate cols:

df = pd.DataFrame(data={'col_1':['Nice', 'Nice', 'Nice', 'Hello', 'Hello', 'Hello'], 'col_2':[1,2,3,4, 5, 6]})
df = df.set_index('col_1').T
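
If a reusable helper is preferred, the same numbering can be sketched in plain Python with collections.Counter. The function name number_duplicates is hypothetical and not part of pandas; this is one possible approach, not the answer's own code:

```python
from collections import Counter

import pandas as pd


def number_duplicates(labels):
    """Return labels with a 1-based suffix added to every repeated label."""
    totals = Counter(labels)  # total occurrences of each label
    seen = Counter()          # occurrences encountered so far
    out = []
    for label in labels:
        if totals[label] > 1:
            seen[label] += 1
            out.append(f"{label}{seen[label]}")
        else:
            out.append(label)  # unique labels are left unchanged
    return out


# Hypothetical sample frame for illustration
df = pd.DataFrame([[0, 1, 1, 2]], columns=["DD", "Nice", "Nice", "Hello"])
df.columns = number_duplicates(df.columns)
print(list(df.columns))
```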

CodePudding user response:

You could use an itertools.count() counter and a list comprehension to create new column headers, then assign them to the data frame.

For example:

>>> import itertools
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3]], columns=["Nice", "Nice", "Hello"])
>>> df
   Nice  Nice  Hello
0     1     2      3
>>> count = itertools.count(1)
>>> new_cols = [f"Nice{next(count)}" if col == "Nice" else col for col in df.columns]
>>> df.columns = new_cols
>>> df
   Nice1  Nice2  Hello
0      1      2      3

(f-strings require Python 3.6 or later)

EDIT: Alternatively, per the comment below, the list comprehension can match any label that contains "Nice", in case there are unexpected spaces or other characters:

new_cols = [f"Nice{next(count)}" if "Nice" in col else col for col in df.columns]
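
Keep in mind that "Nice" in col is a substring test, so it would also match a label such as "Nicely". If stray whitespace is the only concern, comparing the stripped label is a narrower check. A minimal sketch, with the sample labels assumed for illustration:

```python
import itertools

import pandas as pd

# Hypothetical frame whose labels carry stray whitespace
df = pd.DataFrame([[1, 2, 3, 4]], columns=[" Nice", "Nice ", "Nicely", "Hello"])

count = itertools.count(1)
# Compare the stripped label exactly, so "Nicely" is left alone
df.columns = [f"Nice{next(count)}" if col.strip() == "Nice" else col for col in df.columns]
print(list(df.columns))
```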