I have a pandas DataFrame "X_test" with an index and one text column, "review". I want to replace the character 'b' with '6' exactly once per text: if the text contains 'b' several times, only one randomly chosen occurrence should be replaced.
For example, the code below replaces every 'b' in row 2:
X_test["modified_review"] = X_test["review"]
X_test.loc[2, "modified_review"] = X_test.loc[2, "modified_review"].replace('b', '6')
CodePudding user response:
You need to use a custom function.
Let's split the string, and join it back with one item being replaced randomly:
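import random

def random_sub(s, pat, repl):
    l = s.split(pat)
    # one replacement plus the remaining separators, in random order
    new = [pat]*(len(l)-2) + [repl]
    random.shuffle(new)
    # interleave pieces and separators; the trailing None is dropped by [:-1]
    return ''.join([e for x in zip(l, new + [None]) for e in x][:-1])

X_test.loc[2, 'modified_review'] = random_sub(X_test.loc[2, 'modified_review'], pat='b', repl='6')
print(X_test)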
Output:
modified_review
0 aaaa
1 cbc
2 aba6abab
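The same split-and-join idea can also be written without shuffling, by picking the index of the occurrence to replace directly. This is only a sketch: the name random_sub_once and the optional rng parameter are made up for illustration and are not part of the answer above.

```python
import random

def random_sub_once(s, pat, repl, rng=random):
    """Replace one randomly chosen occurrence of pat in s with repl."""
    parts = s.split(pat)                # n occurrences -> n + 1 pieces
    if len(parts) == 1:                 # pat does not occur: nothing to replace
        return s
    k = rng.randrange(len(parts) - 1)   # index of the occurrence to replace
    # Rejoin the pieces on both sides with pat and put repl in the chosen gap.
    return pat.join(parts[:k + 1]) + repl + pat.join(parts[k + 1:])
```

Passing rng=random.Random(0) (or any seeded random.Random instance) should make the choice reproducible, which can help when testing.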
CodePudding user response:
If speed isn't relevant, you can write a function that does the replacement for a single string and apply it to the column.
This function randomly selects an index of a matching character and replaces that position in the output (i.e., it's only going to work for single characters – for longer patterns see mozway's answer or use regular expressions):
import random

def random_replace(string, char, repl):
    # nothing to do if the character does not occur
    if char not in string:
        return string
    string = list(enumerate(string))
    # pick the position of one matching character at random
    to_replace = random.choice([i for i, c in string if c == char])
    return ''.join(repl if i == to_replace else c for i, c in string)
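With that in place, you can do X_test['modified_review'] = X_test['review'].apply(random_replace, char='b', repl='6') (Series.apply passes the extra keyword arguments on to the function). For a single row, call the function directly: X_test.loc[2, 'modified_review'] = random_replace(X_test.loc[2, 'modified_review'], char='b', repl='6').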
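For patterns longer than one character, a regular-expression variant could look roughly like the sketch below. The name random_regex_replace and the rng parameter are hypothetical, and repl is inserted literally (backreferences such as \1 are not expanded).

```python
import random
import re

def random_regex_replace(text, pattern, repl, rng=random):
    """Replace one randomly chosen match of the regex pattern in text with repl."""
    matches = list(re.finditer(pattern, text))   # all non-overlapping matches
    if not matches:                              # no match: return text unchanged
        return text
    m = rng.choice(matches)                      # pick one match at random
    # Splice the literal replacement in place of the chosen match.
    return text[:m.start()] + repl + text[m.end():]
```

It can be applied to the column the same way as random_replace, passing pattern and repl as keyword arguments to apply.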