I have two data frames, A and B.
df'A':
Col 1 |
---|
57253,00987(4567) |
asdf(78985>00987) |
Col 1 of df'A' is of string type.
df'B':
Col 1 | Col 2 |
---|---|
57253 | TRUE |
78985 | NEGATIVE |
00987 | LAUGHS |
Some of the numbers in Col 1 of df'A' are present in Col 1 of df'B'.
I want to replace each of those numbers in df'A' with the string in Col 2 of df'B' from the row whose Col 1 holds the same number.
So the expected result (updated df'A') would be:
Col 1 |
---|
TRUE,LAUGHS(4567) |
asdf(NEGATIVE>LAUGHS) |
Using "for i in dfA.index:", I have tried:
- the .replace() and .update() methods, but I don't think I can achieve the result with those;
- regex operations (the numbers that need replacing are always 5 digits) for over half a day, but I'm not getting the desired result (I'm new to this). Still, I'm glad I'm learning regex.
Answer:
I hope this works for you. I created a custom function that is applied to every value of df_A and checks it against the values of the second dataframe.
```python
import pandas as pd

df_A = pd.DataFrame([
    {'Col 1': '57253,00987(4567)'},
    {'Col 1': 'asdf(78985>00987)'}
])
df_B = pd.DataFrame([
    {'Col 1': '57253', 'Col 2': 'TRUE'},
    {'Col 1': '78985', 'Col 2': 'NEGATIVE'},
    {'Col 1': '00987', 'Col 2': 'LAUGHS'},
])

def updateVal(v):
    # Replace every number from df_B's Col 1 that occurs in v with its Col 2 value
    for col_1, col_2 in df_B.values:
        if col_1 in v:
            v = v.replace(col_1, col_2)
    return v

df_A['Col 1'] = df_A['Col 1'].apply(updateVal)
print(df_A)
```
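One caveat with plain substring replacement such as `v.replace(col_1, col_2)`: a key can also match inside a longer run of digits. The sketch below uses a hypothetical string (not from the question) to contrast it with a word-boundary regex, which only matches a key that stands on its own:

```python
import re

lookup = {'57253': 'TRUE'}
value = '1572534,57253'  # hypothetical: 57253 appears inside a 7-digit number and on its own

# Plain substring replacement also rewrites the digits inside 1572534
plain = value
for key, word in lookup.items():
    plain = plain.replace(key, word)

# \b...\b only matches 57253 when it is not part of a longer number/word
bounded = value
for key, word in lookup.items():
    bounded = re.sub(r'\b' + re.escape(key) + r'\b', word, bounded)

print(plain)    # 1TRUE4,TRUE
print(bounded)  # 1572534,TRUE
```

With the question's data this never happens, so the simple loop is fine there; the boundary version only matters if longer numbers can occur.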
Answer:
One way to achieve this would be to combine regular expressions with the .apply() method to modify the values in the Col 1 column of dfA.
```python
import re
import pandas as pd

dfA = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)']})
dfB = pd.DataFrame({'Col 1': [57253, 78985, 987], 'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS']})

def replace_numbers(s):
    # Find every standalone 5-digit number in the string
    numbers = re.findall(r'\b\d{5}\b', s)
    # Replace each number with its corresponding value from dfB
    # (int() drops leading zeros, so '00987' matches 987)
    for number in numbers:
        match = dfB.loc[dfB['Col 1'] == int(number), 'Col 2']
        if not match.empty:  # leave numbers that are not in dfB unchanged
            s = s.replace(number, match.iloc[0])
    return s

# Apply the replace_numbers function to each value in the 'Col 1' column of dfA
dfA['Col 1'] = dfA['Col 1'].apply(replace_numbers)
print(dfA)
```
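The loop above calls str.replace once per number found. As a sketch of a single-pass variant (a swap from the answer's approach, not part of it), `re.sub` accepts a function as the replacement, so each 5-digit match can be looked up in a dict built from dfB, and numbers missing from dfB are returned unchanged. The names `lookup` and `five_digits` are illustrative, and this assumes dfB's keys are stored as strings, as in the question:

```python
import re

import pandas as pd

dfA = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)']})
# Keys kept as strings so that leading zeros ('00987') survive
dfB = pd.DataFrame({'Col 1': ['57253', '78985', '00987'],
                    'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS']})

lookup = dict(zip(dfB['Col 1'], dfB['Col 2']))
five_digits = re.compile(r'\b\d{5}\b')

def replace_numbers(s):
    # The callback receives each match; unknown numbers are returned as-is
    return five_digits.sub(lambda m: lookup.get(m.group(0), m.group(0)), s)

dfA['Col 1'] = dfA['Col 1'].apply(replace_numbers)
print(dfA['Col 1'].tolist())
# ['TRUE,LAUGHS(4567)', 'asdf(NEGATIVE>LAUGHS)']
```

Because each match is replaced exactly once, a replacement value can never be matched again by a later key.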
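Alternative solution, with no need for regex:
```python
import pandas as pd

# Create sample data frames (keys kept as strings so '00987' keeps its leading zeros)
dfA = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)']})
dfB = pd.DataFrame({'Col 1': ['57253', '78985', '00987'], 'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS']})
# Create a dictionary mapping numbers to their corresponding values in dfB
number_map = dfB.set_index('Col 1')['Col 2'].to_dict()
# Series.replace(number_map) would only swap cells equal to a key, so
# substitute each number inside the strings with a literal (non-regex) str.replace
for number, value in number_map.items():
    dfA['Col 1'] = dfA['Col 1'].str.replace(number, value, regex=False)
# Print the modified dfA
print(dfA)
```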
Output:
```
                   Col 1
0      TRUE,LAUGHS(4567)
1  asdf(NEGATIVE>LAUGHS)
```
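A note on Series.replace with a dict: by default it only swaps cells that are exactly equal to a key, so it leaves strings like '57253,00987(4567)' untouched. Passing regex=True makes pandas treat each key as a pattern and substitute it inside the strings. A minimal sketch contrasting the two, assuming string keys as in the question (the keys here are plain digits, so they behave as literal patterns):

```python
import pandas as pd

dfA = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)']})
dfB = pd.DataFrame({'Col 1': ['57253', '78985', '00987'],
                    'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS']})
number_map = dfB.set_index('Col 1')['Col 2'].to_dict()

# Without regex=True, replace() only swaps cells that equal a key exactly
whole_cell = dfA['Col 1'].replace(number_map)

# With regex=True, each key is treated as a pattern and substituted inside the strings
substring = dfA['Col 1'].replace(number_map, regex=True)

print(whole_cell.tolist())  # ['57253,00987(4567)', 'asdf(78985>00987)']
print(substring.tolist())   # ['TRUE,LAUGHS(4567)', 'asdf(NEGATIVE>LAUGHS)']
```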