Pandas: Multiple rows of a data frame into a single cell-CodePudding

I would like to merge multiple rows of a dataframe into a single Cell, I already tried multiple ways to merge the required input into output format but couldn't able to succeed. below is my input format and desired out put format any help is much Appreciated!

Input Data

Desired Output Data

import pandas as pd

data = pd.DataFrame({"Text1": ["1_1"    , "1_1"  , "1_1"      , "1_2"      , "1_2"          , "1-3"    , "1-3"  , "1-3"      , "", "1-4"      , "1-4"          ],
                     "Text2": ["HI"     , "HI"   , "HI"       , "Hello"    , "Hello"        , "Hiiii"  , ""     , ""         , "How are You?" , "Hellooo"   , ""],
                     "Text3": ["welcome", "Howdy", "greetings", "good day.", "good morning.", "welcome", "Howdy", "greetings", ""             , "good day.", "good morning."],
                     "Text4": ["S11"    , "S11"  , "S11"      , "S12"      , "S12"          , "S12"    , "S13"  , "S13"      , ""             , "S14"      , "S14"          ]})

df = pd.DataFrame(data, columns = ['Text1', 'Text2','Text4', 'Text4']) 

print(df)

CodePudding user response：

You need to forward fill the Text1 column first to replace the blanks, then group by that column and aggregate the other column.

import pandas as pd
import numpy as np
df = pd.DataFrame({"Text1": ["HI", "HI", "HI", "Hello", "Hello", "Hiiii", "", "", "How are You?", "Hellooo", ""],
                   "Text2": ["welcome", "Howdy", "greetings", "good day.", "good morning.", "welcome", "Howdy", "greetings", "", "good day.", "good morning."]})
result = df.groupby(df["Text1"].replace("", np.nan).ffill()).agg({"Text2": "/".join})

CodePudding user response：

This should work

df['Text2'] = df[['Text1','Text2']].groupby(['Text1'])['Text2'].transform(lambda x: ','.join(x))
df[['Text1','Text2']].drop_duplicates()