Pandas: how to replace the values of every sentence starting with a specific character?

Time:10-12

I have a DataFrame like this:

    Sentences
0   "a) Sentence 1"
1   "b) Sentence 2"

I would like to remove the "a) " and "b) " at the beginning of every row of the column Sentences.

I tried to code it: when the first three characters of a sentence are 'b) ', I take the [3:] slice of the sentence:

df.loc[df.Names[0:3] == 'b) ', "Names"] = row['Names'][3:]

But it doesn't work.

Expected output:

    Sentences
0   "Sentence 1"
1   "Sentence 2"

CodePudding user response:

Using the following as a sample:

    Sentences
0   a) Sentence 1
1   b) Sentence 2
2   This is a test sentence
3   NaN

You can use pd.Series.str.startswith to check for rows starting with a) and b), and then assign directly:

df.loc[df['Sentences'].str.startswith(("a) ","b) "), na=False), "Sentences"] = df['Sentences'].str[3:]

print(df)

                 Sentences
0               Sentence 1
1               Sentence 2
2  This is a test sentence
3                      NaN
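
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same startswith idea. The sample frame is rebuilt by hand, the names prefixes and mask are only illustrative, and passing a tuple to str.startswith is assumed to be supported by the installed pandas version (it is in recent releases).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table above (assumed shape)
df = pd.DataFrame(
    {"Sentences": ["a) Sentence 1", "b) Sentence 2", "This is a test sentence", np.nan]}
)

# Prefixes to strip; each one is exactly three characters long
prefixes = ("a) ", "b) ")

# Boolean mask of rows that start with one of the prefixes; NaN counts as False
mask = df["Sentences"].str.startswith(prefixes, na=False)

# Drop the first three characters only on the matching rows
df.loc[mask, "Sentences"] = df.loc[mask, "Sentences"].str[3:]

print(df)
```

Rows that do not start with one of the prefixes, including the NaN row, are left untouched because the assignment only targets the masked rows.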

CodePudding user response:

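Try the column's str.replace method, like this:

df["Sentences"].str.replace(r"^.\) ", "", regex=True)

This returns a new Series with the prefix removed, which I think is what you want; assign it back to df["Sentences"] to keep the result. The ")" has to be escaped because it is a special character in a regular expression, and the "^" restricts the match to the start of each sentence.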

Reference str.replace(): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html?highlight=str replace#pandas.Series.str.replace

str.replace takes a regular expression or string and replaces it with another string. For more info please see the link above.
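
As a rough sketch of the regex route, assuming the markers are always a single lowercase letter followed by ") ": the pattern below is anchored with ^ and escapes the parenthesis, and regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default. The sample data and the name cleaned are made up for illustration.

```python
import pandas as pd

# Hypothetical sample; the third row has a marker in the middle, not at the start
df = pd.DataFrame(
    {"Sentences": ["a) Sentence 1", "b) Sentence 2", "See note c) below"]}
)

# ^        anchor at the start of the string
# [a-z]    one lowercase letter (an assumption about the marker format)
# \)       a literal closing parenthesis, followed by one space
cleaned = df["Sentences"].str.replace(r"^[a-z]\) ", "", regex=True)

print(cleaned.tolist())
```

Because of the anchor, a ") " that appears later in a sentence is not removed. str.replace does not modify the column in place, so the result still has to be assigned back.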

CodePudding user response:

If every sentence is going to start with a letter followed by a ')', and assuming you don't want to touch additional occurrences of ')' after the first one, this will work:

df["Sentences"] = df["Sentences"].str.split(r"\) ", n=1, expand=True)[1]
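
To illustrate the "only the first occurrence" point, here is a small sketch that splits at most once with n=1 and keeps the right-hand part. The sample sentences are invented, and it assumes every row really does start with a marker; a row without ") " would end up as a missing value in column 1.

```python
import pandas as pd

# Hypothetical sample; the second sentence contains a later ") " that must survive
df = pd.DataFrame(
    {"Sentences": ["a) Sentence 1", "b) Sentence 2 (see c) above)"]}
)

# Split on the first ") " only; expand=True gives column 0 (marker) and column 1 (rest)
parts = df["Sentences"].str.split(r"\) ", n=1, expand=True)

# Keep everything after the first marker
df["Sentences"] = parts[1]

print(df)
```

Limiting the split with n=1 is what keeps the later "c) " intact; without it, the sentence would be cut into more than two pieces.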