I have two datasets:
DF1 contains a column of short words.
DF2 has a column of longer words, some of which start with a short word from DF1.
I would like to create a new column in DF2 and, every time a DF1 word appears at the beginning of a DF2 word, place that DF1 word in the new column on that row.
The idea is that, once this is done, I can merge the two datasets on the matching word.
How would I do this?
DF1
ref |
---|
ABC |
DEF |
GHI |
DF2
word |
---|
ABC123 |
DEF456 |
GHI789 |
DF2 - Desired output
word | new column |
---|---|
ABC123 | ABC |
DEF456 | DEF |
GHI789 | GHI |
CodePudding user response:
You can build a regex pattern from the ref column and then extract it from the word column:
# Anchor with ^ so a ref only matches at the start of word
pattern = fr"^({'|'.join(df1['ref'])})"
df2['new column'] = df2['word'].str.extract(pattern)
print(df2)
# Output
     word new column
0  ABC123        ABC
1  DEF456        DEF
2  GHI789        GHI
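For reference, here is a self-contained sketch of the same idea carried through to the merge step. It makes a few assumptions that are not in the question: the ref values may contain regex metacharacters (so they are escaped), one ref may be a prefix of another (so longer ones are tried first), and DF2 may hold rows with no matching prefix (the extra XYZ000 row is made up for illustration).

```python
import re

import pandas as pd

df1 = pd.DataFrame({"ref": ["ABC", "DEF", "GHI"]})
# "XYZ000" is a hypothetical row added to show what happens without a match.
df2 = pd.DataFrame({"word": ["ABC123", "DEF456", "GHI789", "XYZ000"]})

# Escape each ref so it is matched literally, and try longer refs first
# so that a longer ref wins over a shorter one sharing the same prefix.
refs = sorted(df1["ref"].astype(str), key=len, reverse=True)
pattern = "^(" + "|".join(re.escape(r) for r in refs) + ")"

# expand=False returns a Series; rows with no matching prefix get NaN.
df2["new column"] = df2["word"].str.extract(pattern, expand=False)

# A left merge keeps every row of df2, matched or not.
merged = df2.merge(df1, left_on="new column", right_on="ref", how="left")
print(merged)
```

Rows whose word does not start with any ref end up with NaN in both new column and ref, so they are easy to filter out afterwards.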
If this is too simple for your data, you will have to use fuzzy matching. You can check this Question & Answer
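As a rough illustration of what fuzzy matching could look like here, the sketch below uses the standard library's difflib rather than any particular fuzzy-matching package. The helper name closest_prefix, the 0.6 cutoff, and the misspelled sample word are all assumptions made for the example, not part of the linked answer.

```python
from difflib import SequenceMatcher


def closest_prefix(word, refs, cutoff=0.6):
    """Return the ref most similar to the start of word, or None."""
    best_ref, best_score = None, 0.0
    for ref in refs:
        # Compare the ref only against the leading characters of word.
        score = SequenceMatcher(None, word[:len(ref)], ref).ratio()
        if score >= cutoff and score > best_score:
            best_ref, best_score = ref, score
    return best_ref


refs = ["ABC", "DEF", "GHI"]
print(closest_prefix("ABX123", refs))  # close to "ABC" despite the typo
print(closest_prefix("XYZ000", refs))  # nothing similar enough
```

With pandas this could be applied as df2['word'].apply(closest_prefix, refs=list(df1['ref'])). The cutoff would need tuning on real data, since very short refs make false matches more likely.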