I have a dataframe which has the words Due Date written differently but it all means the same. The problem is in my master data(xls file), one due date has an extra space or doesnt and i cant change that.All i can change is my final output.
Sr no Due Date Due Date DueDate
1 1/2/22
2 1/5/22
3
4
5 ASAP
I just want that column 2 and 3 all combine under column one at the same location they were
Sr No. Due Date
1 1/2/22
2 1/5/22
3
4
5 ASAP
CodePudding user response:
You can use filter
with a regex to get similar names, then bfill
and get the first. Finally join to original devoid of the found columns:
d = df.filter(regex=r'(?i)due\s*date')
df2 = (df
.drop(columns=list(d.columns))
.join(d.bfill(1).iloc[:,0])
)
Output:
Sr no Due Date
0 1 1/2/22
1 2 1/5/22
2 3 None
3 4 None
4 5 ASAP
CodePudding user response:
Try with bfill
out = df.bfill(axis = 1)[['Sr No','Due Date']]
CodePudding user response:
Possible solution is the following:
import pandas as pd
# set test data
data = {"Sr no": [1,2,3,4,5],
"Due Date": ["1/2/22", "", "", "", ""],
"Due Date ": ["", "1/2/22", "", "", ""],
" Due Date": ["", "", "", "", "ASAP"]
}
# create pandas dataframe
df = pd.DataFrame(data)
# clean up column names
df.columns = [col.strip() for col in df.columns]
# group data
df = df.groupby(df.columns, axis=1).agg(lambda x: x.apply(lambda y: ''.join([str(l) for l in y if str(l) != "nan"]), axis=1))
# reorder column
df = df[['Sr no', 'Due Date']]
df
Returns