Merge specific rows in pandas Df-CodePudding

I have df after read_excel where some of values (from one column, with strings) are divided. How can i merge them back?

for example: the df i have

{'CODE': ['A', None, 'B', None, None, 'C'],
'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
'NUMBER': ['1', None, '2', None, None,'3']}

the df i want

{'CODE': ['A','B','C'],
'TEXT': ['Aa','Bbb','C'],
'NUMBER': ['1','2','3']}

I can't find the right solution. I tried to import data in different ways but it also did not help

CodePudding user response：

You can forward fill missing values or Nones for groups with aggregate join and first non None value for NUMBER column:

d = {'CODE': ['A', None, 'B', None, None, 'C'],
'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
'NUMBER': ['1', None, '2', None, None,'3']}

df = pd.DataFrame(d)


df1 = df.groupby(df['CODE'].ffill()).agg({'TEXT':''.join, 'NUMBER':'first'}).reset_index()
    
print (df1)
  CODE TEXT NUMBER
0    A   Aa      1
1    B  Bbb      2
2    C    C      3

You can generate dictionary:

cols = df.columns.difference(['CODE'])
d1 = dict.fromkeys(cols, 'first')
d1['TEXT'] = ''.join

df1 = df.groupby(df['CODE'].ffill()).agg(d1).reset_index()