Home > Back-end >  How do I merge columns that have similar names in a pandas dataframe?
How do I merge columns that have similar names in a pandas dataframe?

Time:03-18

I have a dataframe which has the words Due Date written differently but it all means the same. The problem is in my master data(xls file), one due date has an extra space or doesnt and i cant change that.All i can change is my final output.

Sr no Due Date    Due Date   DueDate
1     1/2/22      
2                  1/5/22    
3
4                         
5                             ASAP

I just want that column 2 and 3 all combine under column one at the same location they were

Sr No.  Due Date
1        1/2/22
2        1/5/22
3        
4
5        ASAP

CodePudding user response:

You can use filter with a regex to get similar names, then bfill and get the first. Finally join to original devoid of the found columns:

d = df.filter(regex=r'(?i)due\s*date')
df2 = (df
 .drop(columns=list(d.columns))
 .join(d.bfill(1).iloc[:,0])
 )

Output:

   Sr no Due Date
0      1   1/2/22
1      2   1/5/22
2      3     None
3      4     None
4      5     ASAP

CodePudding user response:

Try with bfill

out = df.bfill(axis = 1)[['Sr No','Due Date']]

CodePudding user response:

Possible solution is the following:

import pandas as pd

# set test data
data = {"Sr no": [1,2,3,4,5],
        "Due Date": ["1/2/22", "", "", "", ""], 
        "Due Date ": ["", "1/2/22", "", "", ""],
        " Due Date": ["", "", "", "", "ASAP"]
       }

# create pandas dataframe
df = pd.DataFrame(data)

enter image description here

# clean up column names 
df.columns = [col.strip() for col in df.columns]

# group data
df = df.groupby(df.columns, axis=1).agg(lambda x: x.apply(lambda y: ''.join([str(l) for l in y if str(l) != "nan"]), axis=1))

# reorder column
df = df[['Sr no', 'Due Date']]

df

Returns

enter image description here

  • Related