Context
I have a pandas dataframe that contains 3 columns.
import pandas as pd

df = pd.DataFrame({
    'c1': ['blah testing1 blah', 'hello1 world'],
    'c2': ['testing1', 'hello1'],
    'c3': ['testingggg', 'heeello']
})
c1 contains arbitrary strings, c2 contains words found in c1, and c3 contains the words that will replace the corresponding c2 words in c1. The replacement words differ in length from the words they replace, by varying amounts.
Ideal output
For each row, I want to find the c2 word within the c1 text and replace the first exact-match occurrence of it with the c3 word, storing the result in a new column, c4.
For example:
c1: 'blah testing1 blah',
c2: 'testing1',
c3: 'testingggg',
c4: 'blah testingggg blah'
What I currently have
# initialize the new column to contain the original text
df['c4'] = df['c1']
for i in range(len(df['c1'])):
    original_word = df['c2'].loc[i]
    replacement_word = df['c3'].loc[i]
    df['c4'][i] = df['c4'][i].replace(original_word, replacement_word)
But this doesn't seem to update anything. Why?
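One likely reason the loop has no effect is chained assignment: df[col][i] = value first selects a column and then assigns into that intermediate object, and pandas does not guarantee the write reaches df (in Copy-on-Write mode it never does). A minimal sketch of the same loop, assuming the default RangeIndex from the example, writes through a single .loc call instead and passes a count of 1 so that only the first occurrence is replaced:

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['blah testing1 blah', 'hello1 world'],
    'c2': ['testing1', 'hello1'],
    'c3': ['testingggg', 'heeello'],
})

# Start the new column as a copy of the original text
df['c4'] = df['c1']

for i in df.index:
    original_word = df.loc[i, 'c2']
    replacement_word = df.loc[i, 'c3']
    # One .loc call selects row and column together, so the write lands in df itself;
    # the third argument to str.replace limits it to the first occurrence
    df.loc[i, 'c4'] = df.loc[i, 'c4'].replace(original_word, replacement_word, 1)

print(df['c4'].tolist())
```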
Answer
Try this to replace all occurrences:
df['c4'] = df.apply(lambda x: x['c1'].replace(x['c2'], x['c3']), axis=1)
To replace only the first occurrence (or the first n), pass the count as the third argument of str.replace:
df['c4'] = df.apply(lambda x: x['c1'].replace(x['c2'], x['c3'], 1), axis=1)
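str.replace matches substrings, so 'testing1' would also be replaced inside a longer word such as 'testing12'. If "exact match" means a whole word, one option (a swapped-in technique, not part of the answer above) is a regular expression with word boundaries. The sketch below assumes every c2 word starts and ends with a word character (letter, digit or underscore), since \b behaves differently around punctuation; replace_first_word is a hypothetical helper name, and the first c1 row is a hypothetical variant of the question's data:

```python
import re

import pandas as pd


def replace_first_word(text, old, new):
    # \b anchors stop 'testing1' from matching inside e.g. 'testing12';
    # re.escape makes any regex metacharacters in `old` match literally
    pattern = rf"\b{re.escape(old)}\b"
    # A function as the replacement keeps re.sub from interpreting
    # backslashes or group references such as \1 inside `new`
    return re.sub(pattern, lambda m: new, text, count=1)


# Hypothetical row 0 contains 'testing12' before the real match
df = pd.DataFrame({
    'c1': ['blah testing12 testing1 blah', 'hello1 world'],
    'c2': ['testing1', 'hello1'],
    'c3': ['testingggg', 'heeello'],
})

df['c4'] = df.apply(
    lambda row: replace_first_word(row['c1'], row['c2'], row['c3']), axis=1
)
print(df['c4'].tolist())
```

With plain str.replace(..., 1), the first row would instead become 'blah testingggg2 testing1 blah'.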
Result:
                   c1        c2          c3                    c4
0  blah testing1 blah  testing1  testingggg  blah testingggg blah
1        hello1 world    hello1     heeello         heeello world
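apply(axis=1) builds a Series for every row, which is usually slow on large frames. A minimal sketch of an alternative that produces the same c4 column zips the three columns together in a list comprehension; it assumes all three columns hold plain Python strings (no NaN):

```python
import pandas as pd

df = pd.DataFrame({
    'c1': ['blah testing1 blah', 'hello1 world'],
    'c2': ['testing1', 'hello1'],
    'c3': ['testingggg', 'heeello'],
})

# zip walks the three columns in parallel without creating a Series per row;
# the count of 1 replaces only the first occurrence in each string
df['c4'] = [
    text.replace(old, new, 1)
    for text, old, new in zip(df['c1'], df['c2'], df['c3'])
]
print(df)
```

The same comprehension can call a word-boundary helper instead of str.replace if whole-word matching is needed.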