I'm trying to keep only the text that starts at "Background:" in each comment, but I haven't been able to get it working. For instance, I have a comment like this:
05/2022: AB: 6/20/22 - I'm learning how to use pandas library.
Background: I'm trying to learn python.
How can I make all cells have only the background comment? It should look like this:
Background: I'm trying to learn python.
Please see my code below:
import pandas as pd
df = pd.read_excel(r"C:\Users\R\Desktop\PythonLib\data\52022.xlsx")
comments = df["Comment"]
df['new_background'] = df["Comment"].str.split('Background:').str[0]
print(df["new_background"])
CodePudding user response:
You should provide a sample of your data.
That said, you should probably do:
df['new_background'] = df["Comment"].str.replace(r'.*(?=Background:)', '', regex=True)
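One caveat worth checking: in the sample comment, "Background:" appears to start on its own line. By default "." does not match a newline, so the lookahead pattern only strips the prefix when the marker sits on the same line as the text before it. The sketch below checks this on plain strings with Python's re module, on the assumption that the column holds ordinary Python strings and pandas applies the pattern to each cell in the same way. The sample strings and variable names are made up for illustration.

```python
import re

MARKER_PATTERN = r".*(?=Background:)"  # same pattern as in the answer

# Illustrative samples modelled on the comment in the question.
one_line = "05/2022: AB: 6/20/22 - I'm learning how to use pandas library. Background: I'm trying to learn python."
two_lines = "05/2022: AB: 6/20/22 - I'm learning how to use pandas library.\nBackground: I'm trying to learn python."
no_marker = "05/2022: AB: no marker in this comment."

# Marker on the same line: everything before it is removed.
same_line_result = re.sub(MARKER_PATTERN, "", one_line)

# Marker on its own line: "." stops at the newline, so nothing is removed.
default_result = re.sub(MARKER_PATTERN, "", two_lines)

# With DOTALL, "." also matches the newline and the prefix is removed.
dotall_result = re.sub(MARKER_PATTERN, "", two_lines, flags=re.DOTALL)

# No marker at all: the text comes back unchanged.
no_marker_result = re.sub(MARKER_PATTERN, "", no_marker)

print(repr(same_line_result))
print(repr(default_result))
print(repr(dotall_result))
print(repr(no_marker_result))
```

If your cells do contain line breaks, Series.str.replace accepts a flags argument, so passing flags=re.DOTALL together with regex=True (or prefixing the pattern with (?s)) should give the DOTALL behaviour shown here.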
Or, if you want NaN in case of missing background:
df['new_background'] = df["Comment"].str.extract(r'(Background:.*)')
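As for why the attempt in the question returned the wrong text: split('Background:') cuts each comment into the part before the marker and the part after it, and .str[0] selects the part before. A minimal sketch on a plain string, using a made-up sample modelled on the comment in the question:

```python
# Illustrative sample modelled on the comment in the question.
text = "05/2022: AB: 6/20/22 - I'm learning how to use pandas library.\nBackground: I'm trying to learn python."

parts = text.split("Background:")

before = parts[0]   # the piece that index 0 selects (text before the marker)
after = parts[-1]   # the piece after the last marker

# split() drops the separator, so add it back if you want to keep it.
restored = "Background:" + after

print(repr(before))
print(repr(restored))
```

So on the column, .str[-1] instead of .str[0] would pick the text after the marker. Note that when a comment has no marker, split returns the whole text as its only piece, so this approach cannot tell "no background" apart from "background only", which is one reason to prefer the extract version.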