Home > Net >  How to remove duplicate columns as rows for pandas df
How to remove duplicate columns as rows for pandas df

Time:07-28

Python newbie to permutate duplicate columns

I have a dataframe like this.

a b c c_dup1 c_dup2 c_dup3 d d_dup1 d_dup2 d_dup3
hello bye 1 2 3 4 5 6 7 8

I've only managed to do this for where only c_dup1 exist with the below logic.

data1 = {'a':'hello', 'b':'bye', 'c':1,'c_dup1':2}
df1 = pd.DataFrame(data1, index=[0])


for col in df1.columns:
    x_df = df1
    if '_dup' in col:
        match_col = re.compile(r'(.*?)_')
        col_name = match_col.findall(col)[0]
        # Drop duplicate column from original
        df1 = df1.drop(col, axis=1)
        # Drop original column from duplicate
        df_pdup = x_df.drop(col_name, axis=1)
        df_pdup = df_pdup.rename(columns={col:col_name})    
    final = pd.concat([df1, df_pdup], ignore_index = True)

Output is :

a b c
hello bye 1
hello bye 2

How do I replicate the rows to get the below output? For my initial dataframe above?

a b c d
hello bye 1 5
hello bye 2 6
hello bye 3 7
hello bye 4 8

CodePudding user response:

IIUC, you can use a manual reshaping with a MultiIndex:

cols = ['a', 'b']

out = (df
 .set_index(cols)
 .pipe(lambda d: d.set_axis(d.columns.str.split('_dup', expand=True), axis=1))
 .stack()
 .droplevel(-1).reset_index()
)

output:

       a    b  c  d
0  hello  bye  1  5
1  hello  bye  2  6
2  hello  bye  3  7
3  hello  bye  4  8

used input:

       a    b  c  c_dup1  c_dup2  c_dup3  d  d_dup1  d_dup2  d_dup3
0  hello  bye  1       2       3       4  5       6       7       8

For a programmatic way of getting a/b as the only columns that do not have an equivalent with '_dup', you can use:

import re
target = df.columns.str.extract('(.*)_dup', expand=False).dropna().unique()
# Index(['c', 'd'], dtype='object')

regex = fr"^({'|'.join(map(re.escape, target))})"
# ^(c|d)

cols = list(df.columns[~df.columns.str.contains(regex)])
# ['a', 'b']

NB. there might be limitations if there are overlapping suffixes (e.g. ABC/ABCD)

  • Related