Replacing number values in a pandas data frame column with strings values from another data frame co

Time:12-22

I have two data frames, A and B.

df'A':

Col 1
57253,00987(4567)
asdf(78985>00987)

Col 1 of df'A' is of string type.

df'B':

Col 1 Col 2
57253 TRUE
78985 NEGATIVE
00987 LAUGHS

Some of the numbers in Col 1 of df'A' are present in Col 1 of df'B'.

I want to replace those numbers in df'A' with the string from Col 2 of df'B', taken from the row whose Col 1 holds the same number.

So the expected result (the updated df'A') would be:

Col 1
TRUE,LAUGHS(4567)
asdf(NEGATIVE>LAUGHS)

Looping with "for i in dfA.index:", I have tried:

First, the .replace() and .update() methods, but I don't think I can achieve the result with those.

I have also tried regex operations for over half a day (the numbers that need replacing are always 5 digits), but I'm not getting the desired result (I'm new to this). Still, I'm glad I'm learning regex.

CodePudding user response:

I hope this works for you. I created a custom function that is applied to every value of df_A and checks it against the values in the second dataframe.

import pandas as pd
df_A = pd.DataFrame([
    {'Col 1': '57253,00987(4567)'},
    {'Col 1': 'asdf(78985>00987)'}
])
df_B = pd.DataFrame([
    {'Col 1': '57253', 'Col 2': 'TRUE'},
    {'Col 1': '78985', 'Col 2': 'NEGATIVE'},
    {'Col 1': '00987', 'Col 2': 'LAUGHS'},
])
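def updateVal(v):
    # Try every (number, label) pair from df_B against the cell value
    for col_1, col_2 in df_B.values:
        if col_1 in v:
            v = v.replace(col_1, col_2)
    return v

# Apply the replacement to each cell of df_A's 'Col 1'
df_A['Col 1'] = df_A['Col 1'].apply(updateVal)
df_A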
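
The loop above rescans df_B for every cell and replaces any matching substring, one key at a time. As a rough sketch of one alternative (not the answerer's code), the lookup can be built once and applied in a single regex pass, so text that has already been replaced is never scanned again. The names lookup, pattern and replace_codes are made up for this illustration, and the sketch assumes the numbers in df_B are stored as strings.

```python
import re
import pandas as pd

df_A = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)']})
df_B = pd.DataFrame({
    'Col 1': ['57253', '78985', '00987'],
    'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS'],
})

# Build the number -> label lookup once instead of scanning df_B per cell.
lookup = dict(zip(df_B['Col 1'], df_B['Col 2']))

# One alternation of all keys, escaped so they are matched literally.
# Longer keys go first so a shorter key cannot shadow a longer one.
keys = sorted(lookup, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys))

def replace_codes(text):
    # Hypothetical helper: every match is swapped for its label in one pass.
    return pattern.sub(lambda m: lookup[m.group(0)], text)

result = df_A['Col 1'].apply(replace_codes)
print(result.tolist())
```

Because the pattern only contains keys that exist in df_B, numbers such as 4567 are left untouched.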

CodePudding user response:

One way to achieve this would be to use a combination of regular expressions and the .apply() method to modify the values in the Col 1 column of dfA.

import re
import pandas as pd


dfA = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)']})
dfB = pd.DataFrame({'Col 1': [57253, 78985, 987], 'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS']})

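def replace_numbers(s):
    # Find every standalone 5-digit number in the string
    numbers = re.findall(r'\b\d{5}\b', s)
    # Replace each number with its corresponding value from dfB
    for number in numbers:
        # dfB stores the numbers as integers, so '00987' is looked up as 987
        matches = dfB.loc[dfB['Col 1'] == int(number), 'Col 2']
        if not matches.empty:  # leave numbers that are not in dfB unchanged
            s = s.replace(number, matches.iloc[0])
    return s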

# Apply the replace_numbers function to each value in the 'Col 1' column of dfA
dfA['Col 1'] = dfA['Col 1'].apply(replace_numbers)

print(dfA)
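
A variant of the same idea, offered only as a sketch: Series.str.replace accepts a callable as the replacement when regex=True, so the 5-digit pattern can be applied to the whole column without .apply(). Here number_map and lookup_match are illustrative names, the third row is added just to show an unknown number, and the keys are assumed to be strings so that the leading zeros in 00987 are preserved.

```python
import pandas as pd

dfA = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)', 'x(11111)']})
# Assumption: numbers are stored as strings, unlike the integer version above.
dfB = pd.DataFrame({
    'Col 1': ['57253', '78985', '00987'],
    'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS'],
})

number_map = dict(zip(dfB['Col 1'], dfB['Col 2']))

def lookup_match(match):
    # Hypothetical helper: return the label for a matched number,
    # or the number itself when dfB has no entry for it.
    code = match.group(0)
    return number_map.get(code, code)

# The callable is invoked once per standalone 5-digit match.
replaced = dfA['Col 1'].str.replace(r'\b\d{5}\b', lookup_match, regex=True)
print(replaced.tolist())
```

Using dict.get with the matched text as the default means a 5-digit number missing from dfB, such as 11111, passes through unchanged instead of raising an error.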

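Alternative solution, with no need for regex:

# Create sample data frames; the numbers in dfB are kept as strings
# so that the leading zeros in '00987' are preserved
dfA = pd.DataFrame({'Col 1': ['57253,00987(4567)', 'asdf(78985>00987)']})
dfB = pd.DataFrame({'Col 1': ['57253', '78985', '00987'], 'Col 2': ['TRUE', 'NEGATIVE', 'LAUGHS']})

# Create a dictionary mapping numbers to their corresponding values in dfB
number_map = dfB.set_index('Col 1')['Col 2'].to_dict()

# dfA['Col 1'].replace(number_map) would only match whole cell values,
# so replace each number as a literal substring instead
for number, label in number_map.items():
    dfA['Col 1'] = dfA['Col 1'].str.replace(number, label, regex=False)

# Print the modified dfA
print(dfA)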

Output:

                   Col 1
0      TRUE,LAUGHS(4567)
1  asdf(NEGATIVE>LAUGHS)