Replacing Pandas series substrings with elements of the same index from other lists

Time:08-05

Context

I have a pandas DataFrame with three columns:

import pandas as pd

df = pd.DataFrame({
    'c1': ['blah testing1 blah', 'hello1 world'], 
    'c2': ['testing1', 'hello1'], 
    'c3': ['testingggg', 'heeello']
})

c1 contains arbitrary strings, c2 contains words that appear in c1, and c3 contains the words that should replace the corresponding c2 words in c1. A replacement word can be longer or shorter than the word it replaces, by an arbitrary number of characters.

Ideal output

For each row, I want to find the c2 word within the c1 text and replace the first exact-match occurrence of it with the c3 word. I want the result stored in a new column, c4.

e.g.

c1: 'blah testing1 blah', 
c2: 'testing1', 
c3: 'testingggg', 
c4: 'blah testingggg blah'
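If "exact match" is meant as a whole word (so that testing1 should not match inside testing10), plain str.replace is not enough: 'testing10 testing1'.replace('testing1', 'X', 1) gives 'X0 testing1'. A minimal sketch using the standard re module, assuming every c2 value begins and ends with a word character (letter, digit, or underscore), because \b only marks a boundary next to such characters. The helper name replace_first_whole_word is hypothetical:

```python
import re

import pandas as pd


def replace_first_whole_word(text: str, old: str, new: str) -> str:
    """Replace the first whole-word occurrence of `old` in `text` with `new`.

    Hypothetical helper; assumes `old` starts and ends with word characters.
    """
    # re.escape makes `old` literal, so '.', '+', etc. are not regex operators.
    # \b anchors at word boundaries, so 'testing1' will not match inside 'testing10'.
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function replacement inserts `new` verbatim (no backslash/group expansion);
    # count=1 stops after the first match.
    return re.sub(pattern, lambda _m: new, text, count=1)


df = pd.DataFrame({
    'c1': ['blah testing1 blah', 'hello1 world'],
    'c2': ['testing1', 'hello1'],
    'c3': ['testingggg', 'heeello'],
})

# Walk the three columns row by row and build c4 from the results.
df['c4'] = [
    replace_first_whole_word(text, old, new)
    for text, old, new in zip(df['c1'], df['c2'], df['c3'])
]
print(df['c4'].tolist())  # ['blah testingggg blah', 'heeello world']
```

With this helper, replace_first_whole_word('testing10 testing1', 'testing1', 'X') skips the partial match and returns 'testing10 X'.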

What I currently have

# initialize the new column to contain the original text
df['c4'] = df['c1']

for i in range(len(df['c1'])):
    original_word = df['c2'].loc[i]
    replacement_word = df['c3'].loc[i]
    df['c4'][i] = df['c4'][i].replace(original_word, replacement_word)

But this doesn't seem to update anything. What am I doing wrong?
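A likely cause is the chained indexing in the loop's assignment (df[column][i] = ...): df[column] first returns a Series, and the item assignment goes to that Series, which, depending on the pandas version and internal layout, may be a copy rather than a view into df. Under copy-on-write (the default in pandas 3.0) such an assignment is never written back. A minimal sketch of the same loop that writes through a single .loc[row, column] call instead, assuming the target column is c4; it also passes a count of 1 to str.replace so only the first occurrence is replaced:

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['blah testing1 blah', 'hello1 world'],
    'c2': ['testing1', 'hello1'],
    'c3': ['testingggg', 'heeello'],
})

# Initialize the new column with the original text.
df['c4'] = df['c1']

# Iterate over index labels, so .loc works with any index, not just 0..n-1.
for i in df.index:
    original_word = df.loc[i, 'c2']
    replacement_word = df.loc[i, 'c3']
    # One .loc[row, column] assignment writes into df itself;
    # the third argument (count=1) limits str.replace to the first occurrence.
    df.loc[i, 'c4'] = df.loc[i, 'c4'].replace(original_word, replacement_word, 1)

print(df['c4'].tolist())  # ['blah testingggg blah', 'heeello world']
```

Row-by-row .loc writes are fine for small frames but slow on large ones; building the whole column at once and assigning it in one step avoids that.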

Answer:

Try this to replace all occurrences:

df['c4'] = df.apply(lambda row: row['c1'].replace(row['c2'], row['c3']), axis=1)

To replace only the first occurrence (or the first n occurrences, by changing the count argument):

df['c4'] = df.apply(lambda row: row['c1'].replace(row['c2'], row['c3'], 1), axis=1)

Result:

    c1                  c2          c3          c4
0   blah testing1 blah  testing1    testingggg  blah testingggg blah
1   hello1 world        hello1      heeello     heeello world
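DataFrame.apply with axis=1 constructs a Series for every row, which tends to be slow on large frames. A common alternative, offered here as a swapped-in technique rather than part of the answer above, zips the three columns and calls str.replace in a plain list comprehension. It should produce the same c4 as the count=1 version, and it accesses columns by name, so it does not depend on positional row indexing:

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['blah testing1 blah', 'hello1 world'],
    'c2': ['testing1', 'hello1'],
    'c3': ['testingggg', 'heeello'],
})

# zip walks the three columns in lockstep, yielding one (text, old, new) tuple per row.
# str.replace(old, new, 1) replaces only the first occurrence of `old` in `text`.
df['c4'] = [text.replace(old, new, 1)
            for text, old, new in zip(df['c1'], df['c2'], df['c3'])]

print(df)
```

Dropping the third argument to str.replace replaces every occurrence instead of only the first.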