I have a dataframe like this:
Sentences
0 "a) Sentence 1"
1 "b) Sentence 2"
I would like to ignore "a) " and "b) " at the beginning of every row of the column Sentences.
I tried to code it: when the first three characters of a sentence are 'b) ', I take the [3:] slice of the sentence:
df.loc[df.Names[0:3] == 'b) ', "Names"] = row['Names'][3:]
But it doesn't work.
Expected output:
Sentences
0 "Sentence 1"
1 "Sentence 2"
CodePudding user response:
Using the sample below:
Sentences
0 a) Sentence 1
1 b) Sentence 2
2 This is a test sentence
3 NaN
You can use pd.Series.str.startswith to check for rows starting with "a) " or "b) ", and then assign directly:
df.loc[df['Sentences'].str.startswith(("a) ","b) "), na=False), "Sentences"] = df['Sentences'].str[3:]
print(df)
Sentences
0 Sentence 1
1 Sentence 2
2 This is a test sentence
3 NaN
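For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame by hand. The variable name mask is only illustrative, and passing a tuple to str.startswith assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

# Sample data from above, including a row without a prefix and a missing value.
df = pd.DataFrame(
    {"Sentences": ["a) Sentence 1", "b) Sentence 2", "This is a test sentence", np.nan]}
)

# True only where the string starts with one of the prefixes.
# na=False makes missing values count as "no match" instead of propagating NaN.
mask = df["Sentences"].str.startswith(("a) ", "b) "), na=False)

# The right-hand side is aligned on the index, so only the masked rows
# receive the sliced value; every other row is left untouched.
df.loc[mask, "Sentences"] = df["Sentences"].str[3:]

print(df)
```

The mask is what protects row 2: slicing every row unconditionally would cut off the first three characters of "This is a test sentence" as well.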
CodePudding user response:
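Try the str.replace method of the column, like this:
df["Sentences"] = df["Sentences"].str.replace(r"^.\) ", "", regex=True)
str.replace returns a new Series rather than modifying the column in place, so assign the result back to get the dataframe you want.
Reference for str.replace(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html#pandas.Series.str.replace
str.replace takes a string or, with regex=True, a regular expression and replaces each match with another string. Here ^ anchors the match to the start of the value and \) matches a literal closing parenthesis. For more info please see the link above.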
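A hedged sketch of the regex route on a hand-built sample. The pattern ^[a-z]\)  (with a trailing space) is an assumption about what the prefixes look like: one lowercase letter, a closing parenthesis, then a space. regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Sentences": ["a) Sentence 1", "b) Sentence 2", "This is a test sentence", np.nan]}
)

# ^      anchor at the start of the string, so later ") " occurrences are ignored
# [a-z]  exactly one lowercase letter (assumed prefix shape)
# \)     a literal closing parenthesis, followed by a single space
pattern = r"^[a-z]\) "

# str.replace returns a new Series, so the result is assigned back to the column.
df["Sentences"] = df["Sentences"].str.replace(pattern, "", regex=True)

print(df)
```

Rows that do not match the pattern, including missing values, pass through unchanged, so no separate mask is needed.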
CodePudding user response:
If every sentence starts with a letter followed by ") " and you don't want to touch additional occurrences of ")" after the first one, this will work:
df["Sentences"] = df["Sentences"].str.split(r"\) ", n=1).str[1]
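The split idea can be tried on a small hand-built frame. This sketch assumes every row really does carry a prefix, and it limits the split to the first match with n=1 so that a later ") " inside the sentence is preserved; the second sample sentence is made up to show that.

```python
import pandas as pd

df = pd.DataFrame(
    {"Sentences": ["a) Sentence 1", "b) Sentence 2 (see note) here"]}
)

# Split on the first ") " only (n=1), which yields two pieces per row:
# the prefix letter and the rest of the sentence.
pieces = df["Sentences"].str.split(r"\) ", n=1)

# Keep the second piece, i.e. everything after the first ") ".
df["Sentences"] = pieces.str[1]

print(df)
```

A row with no ") " at all would produce a single piece, and .str[1] would then give a missing value, which is why the assumption about the prefix matters here.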