I would like to merge multiple rows of a dataframe into a single Cell, I already tried multiple ways to merge the required input into output format but couldn't able to succeed. below is my input format and desired out put format any help is much Appreciated!
Input Data
Desired Output Data
import pandas as pd
data = pd.DataFrame({"Text1": ["1_1" , "1_1" , "1_1" , "1_2" , "1_2" , "1-3" , "1-3" , "1-3" , "", "1-4" , "1-4" ],
"Text2": ["HI" , "HI" , "HI" , "Hello" , "Hello" , "Hiiii" , "" , "" , "How are You?" , "Hellooo" , ""],
"Text3": ["welcome", "Howdy", "greetings", "good day.", "good morning.", "welcome", "Howdy", "greetings", "" , "good day.", "good morning."],
"Text4": ["S11" , "S11" , "S11" , "S12" , "S12" , "S12" , "S13" , "S13" , "" , "S14" , "S14" ]})
df = pd.DataFrame(data, columns = ['Text1', 'Text2','Text4', 'Text4'])
print(df)
CodePudding user response:
You need to forward fill the Text1
column first to replace the blanks, then group by that column and aggregate the other column.
import pandas as pd
import numpy as np
df = pd.DataFrame({"Text1": ["HI", "HI", "HI", "Hello", "Hello", "Hiiii", "", "", "How are You?", "Hellooo", ""],
"Text2": ["welcome", "Howdy", "greetings", "good day.", "good morning.", "welcome", "Howdy", "greetings", "", "good day.", "good morning."]})
result = df.groupby(df["Text1"].replace("", np.nan).ffill()).agg({"Text2": "/".join})
CodePudding user response:
This should work
df['Text2'] = df[['Text1','Text2']].groupby(['Text1'])['Text2'].transform(lambda x: ','.join(x))
df[['Text1','Text2']].drop_duplicates()