Join rows with the same index and keep other rows unchanged

I have this DataFrame:

df=

ID    join        Chapter  ParaIndex      text 
 0     NaN         1         0            I am test 
 1     NaN         2         1            it is easy 
 2     1           3         2            but not so
 3     1           3         3            much easy

I want to get this
(merge the "text" values of rows that share the same value in column "join", then renumber "ID" and "ParaIndex"; everything else stays unchanged):

dfEdited=

ID    join        Chapter  ParaIndex      text 
 0     NaN         1         0            I am test 
 1     NaN         2         1            it is easy 
 2     1           3         2            but not so much easy

I tried this command:

dfedited=df.groupby(['join'])['text'].apply(lambda x: ' '.join(x.astype(str))).reset_index()

It only merges the rows that have a numeric value in column "join" and drops the rows where "join" is NaN.

So I changed it to this:

dfedited=df.groupby(['join'],dropna=False)['text'].apply(lambda x: ' '.join(x.astype(str))).reset_index()

Now it merges all rows based on "join", but it treats the rows where "join" is NaN as a single group and joins them too. I do not want those rows to be joined. Any ideas? Many thanks.

I also tried this:

dfedited=df.groupby(['join', "ParaIndex", "Chapter"], dropna=False)['text'].apply(lambda x: ' '.join(x.astype(str))).reset_index()

It looks better because it keeps all the columns, but no rows are merged.

CodePudding user response：

Please give an example of your data and code, and work through it step by step rather than writing everything in one line without testing. It's hard to help with a one-line snippet.

But the main idea is to use merge(..., on='join')

CodePudding user response：

I solved it like this:

dfEdited = df.assign(key=df['join'].ne(df['join'].shift()).cumsum()).groupby('key').agg({ "ParaIndex": 'first', "Chapter":'first','text':' '.join}).reset_index()
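For anyone who wants to try this, here is a minimal runnable sketch of the same idea on the sample data from the question. It relies on NaN never comparing equal to NaN, so every NaN row starts its own group, while consecutive rows with the same "join" value share a group. Also carrying "join" through the aggregation and renumbering "ID" and "ParaIndex" from the new row positions are my assumptions about the desired output, not part of the one-liner above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": [0, 1, 2, 3],
    "join": [np.nan, np.nan, 1, 1],
    "Chapter": [1, 2, 3, 3],
    "ParaIndex": [0, 1, 2, 3],
    "text": ["I am test", "it is easy", "but not so", "much easy"],
})

# A new group starts whenever "join" differs from the previous row.
# NaN != NaN is True, so each NaN row gets a group of its own.
key = df["join"].ne(df["join"].shift()).cumsum().rename("key")

dfEdited = (
    df.groupby(key)
      .agg({"join": "first", "Chapter": "first", "text": " ".join})
      .reset_index(drop=True)
)

# Assumption: ID and ParaIndex are simply renumbered 0..n-1
dfEdited["ID"] = dfEdited.index
dfEdited["ParaIndex"] = dfEdited.index
dfEdited = dfEdited[["ID", "join", "Chapter", "ParaIndex", "text"]]

print(dfEdited)
```

Note that this only merges rows that are next to each other. If two rows with the same "join" value are separated by other rows, they end up in different groups.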