I'm trying to replace a lot of strings (only a few are shown in this example, but I actually have thousands) with other strings defined in "replaceWord".

The entries in "replaceWord" follow no regular pattern.

However, the code I wrote does not work as I expected.

After running the script, the output is as below:

     before     after
0  test1234  test1234
1  test1234  test1234
2  test1234      1349
3  test1234  test1234
4  test1234  test1234

I need the output to look like this:

   before     after
1  test1234   1349
2  test9012   te1210st
3  test5678   8579
4  april      I was born August
5  mcdonalds  i like chicken

Here is my script:

import os.path, time, re
import pandas as pd
import csv


body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"

replaceWord = [
                ["test9012","te1210st"],
                ["test5678","8579"],
                ["test1234","1349"],
                ["april","August"],
                ["mcdonalds","chicken"],

]

cols = ['before','after']
df = pd.DataFrame(index=[], columns=cols)

for word in replaceWord:
    
    body01_after = re.sub(word[0], word[1], body01_before)
    body02_after = re.sub(word[0], word[1], body02_before)
    body03_after = re.sub(word[0], word[1], body03_before)
    body04_after = re.sub(word[0], word[1], body04_before)
    body05_after = re.sub(word[0], word[1], body05_before)

    df=df.append({'before':body01_before,'after':body01_after}, ignore_index=True)
    
#df.head()
print(df)

df.to_csv('test_replace.csv')

CodePudding user response：

Use a regular expression to capture the non-digits (\D+) as the first group and the digits (\d+) as the second group. Then build the replacement text from the second group \2 followed by the first group \1:

df['after'] = df['before'].str.replace(r'(\D+)(\d+)', r'\2\1', regex = True)

df
     before     after
1  test1234  1234test
2  test9012  9012test
3  test5678  5678test

Edit

It seems that you do not have a dataset yet. You only have separate variables:

body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"

replaceWord = [
                ["test9012","te1210st"],
                ["test5678","8579"],
                ["test1234","1349"],
                ["april","August"],
                ["mcdonalds","chicken"],

]

# Gather the names of the body0N_before variables in a list
var_names = re.findall('body0\\d[^,]+', ','.join(globals().keys()))
df = pd.DataFrame(var_names, columns = ['before_1'])
# Obtain the value of each variable (eval is used only on names defined in this script)
df['before'] = df['before_1'].apply(lambda x: eval(x))

# Replacement function: return the replacement for a matched word,
# or the word itself when it has no entry in replaceWord
lookup = dict(replaceWord)
repl = lambda m: lookup.get(m[0], m[0])

# Do the replacement, one word (\w+) at a time
df['after'] = df['before'].str.replace('(\\w+)', repl, regex=True)

df
        before_1            before              after
0  body01_before          test1234               1349
1  body02_before          test9012           te1210st
2  body03_before          test5678               8579
3  body04_before  i like mcdonalds     i like chicken
4  body05_before  I was born april  I was born August
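
If the strings can be collected in a plain list instead of separate variables, the same dictionary lookup can be done without `globals()` or `eval`. This is only a sketch under that assumption: `bodies`, `replace_word`, `lookup`, `pattern` and `replace_all` are illustrative names, and the pattern here is built from the keys themselves rather than from `\w+`, so it also matches a key that appears inside a longer word.

```python
import re

replace_word = [
    ["test9012", "te1210st"],
    ["test5678", "8579"],
    ["test1234", "1349"],
    ["april", "August"],
    ["mcdonalds", "chicken"],
]
# Assumption: the input strings are kept in a list, not in body0N_before variables.
bodies = ["test1234", "test9012", "test5678", "i like mcdonalds", "I was born april"]

lookup = dict(replace_word)
# One alternation of all keys, escaped so they are treated as literal text.
# Longer keys come first so they win over a shorter key that is their prefix.
keys = sorted(lookup, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))

def replace_all(text):
    """Replace every key found in text in a single pass."""
    return pattern.sub(lambda m: lookup[m.group(0)], text)

results = [replace_all(b) for b in bodies]
print(results)
# ['1349', 'te1210st', '8579', 'i like chicken', 'I was born August']
```

Because all keys are handled in one pass, text produced by one replacement is not scanned again by a later one, which is the problem with calling `re.sub` once per pair in a loop and keeping only the last result.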

CodePudding user response：

Does this suit your purpose?

words = ["test9012", "test5678", "test1234"]
updated = []

for word in words:
    for i, char in enumerate(word):
        if 47 < ord(char) < 58: # the character codes for digits 0-9
            updated.append(f"{word[i:]}{word[:i]}")
            break

print(updated)

The code prints: ['9012test', '5678test', '1234test']
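
A slightly shorter variant of the same loop, offered as a sketch, uses `str.isdigit()` instead of comparing character codes. The function name `move_digits_first` is made up for illustration. Unlike the loop above, it returns words without any digit unchanged instead of skipping them, and `isdigit()` also accepts some non-ASCII digit characters.

```python
def move_digits_first(word):
    """Move everything from the first digit onward to the front of the word."""
    for i, char in enumerate(word):
        if char.isdigit():
            # word[i:] is the part starting at the first digit, word[:i] the prefix
            return word[i:] + word[:i]
    # No digit found: return the word as it is
    return word

words = ["test9012", "test5678", "test1234"]
updated = [move_digits_first(w) for w in words]
print(updated)  # ['9012test', '5678test', '1234test']
```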

CodePudding user response：

As I understand, you have a list of strings and a mapping dictionary in the form of: {oldString1: newString1, oldString2: newString2, ...} that you want to use to replace the original list of strings. The fastest (and maybe most Pythonic) approach I can think of is to simply save your mapping dictionary as a Python dict. For example:

mapping = {
   "test9012": "9012test",
   "test5678": "5678test",
   "test1234": "1234test",
}

If your list of strings is stored as a Python list, you can get the replaced list with the following code:

new_list = [mapping.get(old_string, old_string) for old_string in old_list]

Note: We pass old_string as the second (default) argument of mapping.get() so that the function returns old_string when it is not in the mapping dictionary. dict.get() does not accept keyword arguments, so both values must be passed positionally.

If your list of strings is stored in a Pandas Series (or a column of a Pandas DataFrame), you can quickly replace the strings with:

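new_list = old_list.map(mapping).fillna(old_list)

Note: Series.map() returns NaN for any string that is not in the mapping dictionary, so we call fillna(old_list) to put the original string back in those positions. Setting na_action='ignore' would not do this; it only skips values that are already NaN.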
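
One limitation worth checking before choosing the dictionary lookup: it compares whole strings only, so a key that appears inside a longer sentence is not replaced. The short sketch below illustrates this with two pairs taken from the question's "replaceWord"; the string "unknown" is an invented example of a value that has no entry.

```python
# Two pairs from the question's replaceWord list
mapping = {"test1234": "1349", "april": "August"}

old_list = ["test1234", "I was born april", "unknown"]

# Whole-string lookup: only exact keys are replaced
new_list = [mapping.get(old_string, old_string) for old_string in old_list]
print(new_list)  # ['1349', 'I was born april', 'unknown']
```

"I was born april" stays as it is because the full sentence is not a key, even though it contains the key "april". For inputs like that, a substring-based replacement (for example with `re.sub`) is needed instead.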

CodePudding user response：

You can use regex to match the pattern.

import re
import pandas as pd

words = ["test9012", "test5678", "test1234"]

rows = []
for word in words:
  textOnlyMatch = re.match("[a-zA-Z]*", word)
  textOnly = textOnlyMatch.group(0)  # take the entire match (the leading letters)
  numberPart = word[len(textOnly):]  # take the rest of the string (the digits)
  result = numberPart + textOnly
  rows.append({'before': word, 'after': result})

# DataFrame.append was removed in pandas 2.0, so build the frame from the rows
df = pd.DataFrame(rows, columns=['before', 'after'])

#df.head()
print(df)

df.to_csv('test_replace.csv')

By using re.match this way, you can separate the letters-only part from the digits-only part.