How to replace a character once in a text in a pandas dataframe

Time:01-17

I have a pandas dataframe "X_test" that contains an index and one text column "review". I want to replace the character 'b' with '6' exactly once per text: if the text contains 'b' several times, only one occurrence, chosen at random, should be replaced.

For example, the code below replaces every 'b' in row 2:

X_test["modified_review"] = X_test["review"]
X_test.loc[2, "modified_review"]= X_test.loc[2, "modified_review"].replace('b','6')

CodePudding user response:

You need to use a custom function.

Let's split the string, and join it back with one item being replaced randomly:

import random

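def random_sub(s, pat, repl):
    # Split on the pattern; each gap between pieces is one occurrence of pat
    l = s.split(pat)
    # One separator per gap: all of them pat except a single repl
    new = [pat]*(len(l)-2) + [repl]
    # Shuffle so the repl lands in a random gap
    random.shuffle(new)
    # Interleave pieces and separators; [:-1] drops the last element
    # (the None filler, or the unused repl if pat is absent)
    return ''.join([e for x in zip(l, new + [None]) for e in x][:-1])

X_test.loc[2, 'modified_review'] = random_sub(X_test.loc[2, 'modified_review'], pat='b', repl='6')

print(X_test)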

Output:

  modified_review
0            aaaa
1             cbc
2        aba6abab
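
For what it's worth, the same split-and-join idea can be written without the shuffle and the None padding, by picking the gap to replace directly. This is only a sketch of a variant, not the code above; the name random_sub_by_index is made up for illustration.

```python
import random

def random_sub_by_index(s, pat, repl):
    """Replace one randomly chosen occurrence of pat in s with repl."""
    parts = s.split(pat)
    n_occurrences = len(parts) - 1
    if n_occurrences == 0:
        # pat does not occur: return the string unchanged
        return s
    # Pick which gap (occurrence) receives the replacement
    k = random.randrange(n_occurrences)
    # Rejoin everything with pat, except gap k, which gets repl
    return pat.join(parts[:k + 1]) + repl + pat.join(parts[k + 1:])

result = random_sub_by_index('abababab', 'b', '6')
print(result)  # one of the four 'b's is now '6'; which one varies per run
```

Because it splits on pat, this also works for patterns longer than one character. Assuming the column holds only strings, it could be applied to every row with X_test['modified_review'] = X_test['review'].apply(random_sub_by_index, pat='b', repl='6').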

CodePudding user response:

If speed isn't relevant, you can write a function that does the replacement for a single string and apply it to the column.

This function randomly selects an index of a matching character and replaces that position in the output (i.e., it's only going to work for single characters – for longer patterns see mozway's answer or use regular expressions):

import random

def random_replace(string, char, repl):
    if char not in string:
        return string

    string = list(enumerate(string))
    to_replace = random.choice([i for i,c in string if c==char])

    return ''.join(repl if i == to_replace else c for i,c in string)

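With that in place, you can update a single row with X_test.loc[2, 'modified_review'] = random_replace(X_test.loc[2, 'modified_review'], char='b', repl='6') (.loc[2, 'modified_review'] returns a plain string, so the function is called on it directly), or update the whole column with X_test['modified_review'] = X_test['modified_review'].apply(random_replace, char='b', repl='6').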
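
For longer patterns, the regular-expression route mentioned above might look roughly like this. It is a minimal sketch: the name random_replace_pattern is hypothetical, and it assumes the pattern never matches the empty string and that repl should be inserted literally (no backreferences).

```python
import random
import re

def random_replace_pattern(string, pattern, repl):
    """Replace one randomly chosen regex match of pattern in string with repl."""
    # Collect all non-overlapping matches, left to right
    matches = list(re.finditer(pattern, string))
    if not matches:
        # Nothing to replace: return the string unchanged
        return string
    # Choose one match uniformly at random
    m = random.choice(matches)
    # Splice repl in place of the chosen match
    return string[:m.start()] + repl + string[m.end():]

print(random_replace_pattern('foo bar foo', r'foo', 'X'))
# prints either 'X bar foo' or 'foo bar X'
```

As with random_replace, this works on one string at a time, so for a column it would go through .apply, e.g. X_test['modified_review'].apply(random_replace_pattern, pattern='b', repl='6').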
