I have to combine rows based on the last word in each row.
I have written the code below and it works as expected; however, it becomes very slow on large data (10K rows).
#split the string & take the last word
df["last_Word"] = df["Donor"].str.split().str[-1].str.lower()
df["Match_end"] = df["last_Word"].isin(align["KeyWords_end"].str.lower())
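For reference, here is a small sketch of what the two lines above compute. The sample rows and the keyword list ("AND" and "&") are hypothetical stand-ins for the real df["Donor"] and align["KeyWords_end"] data.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df and align frames
df = pd.DataFrame({"Donor": ["ABC, DEF &", "GHI", "JKL MNO and", "PQR and", "STU"]})
align = pd.DataFrame({"KeyWords_end": ["AND", "&"]})

# Last whitespace-separated word of each row, lower-cased
df["last_Word"] = df["Donor"].str.split().str[-1].str.lower()
# True where that last word is one of the (lower-cased) keywords
df["Match_end"] = df["last_Word"].isin(align["KeyWords_end"].str.lower())

print(df[["last_Word", "Match_end"]])
```

Lower-casing both sides makes the keyword match case-insensitive, so "AND" in the keyword list still matches a row ending in "and".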
# Add two new columns to the data frame
df["Cleaned"] = ""
df["Mark"] = ""
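# Align the text based on the last word & mark the rows to delete as "delete"
# (positional .iat writes avoid chained df["col"].iloc[i] = ... assignment,
# which does not update df under copy-on-write)
cleaned_col = df.columns.get_loc("Cleaned")
mark_col = df.columns.get_loc("Mark")
# stop two rows early so that i + 2 stays in range
for i in range(len(df) - 2):
    if df["Match_end"].iloc[i] and df["Match_end"].iloc[i + 1]:
        df.iat[i + 1, mark_col] = "delete"
        df.iat[i + 2, mark_col] = "delete"
        df.iat[i, cleaned_col] = " ".join(df["Donor"].iloc[i:i + 3])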
# Delete the marked rows (.copy() so later column writes act on a real frame, not a slice)
df = df[~df["Mark"].str.contains("delete")].copy()
# Update the newly created column: rows that were not combined keep their original text
df["Cleaned"] = df["Cleaned"].mask(df["Cleaned"] == "", df["Donor"])
#Drop the unwanted columns
df.drop(["Donor","Mark","last_Word","Match_end"], axis = 1, inplace = True)
#Rename the newly created column
df.rename(columns= {"Cleaned": "Donor"},inplace = True)
Answer:
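Assuming you want to combine the strings ending in "and" or "&", use a regex to identify those strings, then groupby.agg. The match is shifted down one row, so a new group starts on every row whose previous row did not end with a connector; the cumulative sum of that mask gives the group IDs: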
m = ~df['donor'].str.contains(r'(?:\band|&)\s*$').shift(fill_value=False)
df.groupby(m.cumsum(), as_index=False).agg({'donor': ' '.join})
Example output:
donor
0 ABC, DEF & GHI
1 JKL MNO and PQR and STU
Input used:
donor
0 ABC, DEF &
1 GHI
2 JKL MNO and
3 PQR and
4 STU
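The same shift/cumsum grouping also works with a keyword list instead of a hard-coded regex, which is closer to the question's align["KeyWords_end"] lookup. This is a minimal sketch, assuming the keywords are compared against the lower-cased last word of each row as in the question; the helper name combine_rows and the keyword list are hypothetical.

```python
import pandas as pd

def combine_rows(df: pd.DataFrame, keywords, col: str = "donor") -> pd.DataFrame:
    """Hypothetical helper: join each row whose last word is a keyword with the next row."""
    kw = {k.lower() for k in keywords}
    # True where the row's last word is a joining keyword (case-insensitive)
    ends_with_kw = df[col].str.split().str[-1].str.lower().isin(kw)
    # A new group starts on every row whose previous row did not end with a keyword
    new_group = ~ends_with_kw.shift(fill_value=False)
    group_id = new_group.cumsum()
    # Concatenate the rows of each group in their original order
    return df.groupby(group_id, as_index=False).agg({col: " ".join})

df = pd.DataFrame({"donor": ["ABC, DEF &", "GHI", "JKL MNO and", "PQR and", "STU"]})
out = combine_rows(df, ["and", "&"])
print(out)
```

Unlike the question's loop, which always merges exactly three rows, this chains any number of consecutive keyword-ending rows into one group, and it runs without a Python-level loop.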