Pandas DataFrame: replace column values in one df with column values from another df in case of only-CodePudding

I'm trying to replace column values in dataframe_1 with values from another column in the dataframe_2. For example:

df1 = pd.DataFrame({'col1': ['animal', 'animal', 'animal'], 'col2': [0, 1, 2]}).set_index('col2')
df2 = pd.DataFrame({'col1': ['cat', 'dog', 'cat'], 'col2': [0, 2, 1]}).set_index('col2')

df1:

        col1
col2        
0     animal
1     animal
2     animal

df2:

      col1
col2     
0     cat
2     dog
1     cat

In this case after this command:

df1['col1'] = df2['col1']

I got this:

     col1
col2     
0     cat
1     cat
2     dog

It's work fine. But what should I do, if I want this changes for the certain string character in my column?

df1 = pd.DataFrame({'col1': ['animal id_0', 'animal id_1', 'animal id_2'], 'col2': [0, 1, 2]}).set_index('col2')
df2 = pd.DataFrame({'col1': ['cat', 'dog', 'cat'], 'col2': [0, 2, 1]}).set_index('col2')

I believe I can get something like this:

            col1
col2          
0     animal cat
1     animal cat
2     animal dog

With this line (and with attempt to do something like mapping, lol):

df1['col1'] = df1['col1'].str.replace(r'\d', df2['col1'], regex=True)

But it seems that it isn't as easy as it seems at first glance. Maybe you have any ideas?

CodePudding user response：

You can merge the two dataframes by index and then using .apply to replace _ with correct value:

x = df1.merge(df2, left_index=True, right_index=True)
x["col1_x"] = x.apply(lambda x: x["col1_x"].replace("_", x["col1_y"]), axis=1)
x = x.rename(columns={"col1_x": "col1"})

print(x[["col1"]])

Prints:

            col1
col2            
0     animal cat
1     animal cat
2     animal dog

CodePudding user response：

It looks like there are two components to your problem:

you have integer keys in df1 that map to specific animal strings in df2 (e.g. 0 = 'cat', 1 = 'cat', 2 = 'dog', etc)
those integer keys are embedded in a string and need to be extracted somehow (you need to pull the 1 out of 'animal 1')

There is a one liner solution, though the below assumes you only ever have a single digit key to capture from each string in df['col1']:

>>>df1['col1'].str.replace('(\d)',lambda x: df2['col1'].to_dict()[int(x.group(0))])
col2
0    animal cat
1    animal dog
2    animal cat
Name: col1, dtype: object

There's three things going on here:

df['col1'].str.replace('(\d)',...) is regex capturing those digits and passing them as a usable argument to the replacement side of the function (see pd.Series.str.replace for more info)

df2['col1'].to_dict() is turning the implicit mapping defined in df2 into an actual map, in the form of a dict with integer keys taken from the index of df2. You could also write something similar with the .loc accessor.

>>> df2['col1'].to_dict()
{0: 'cat', 2: 'dog', 1: 'cat'}

Lastly, lambda x: df2['col1'].to_dict()[int(x.group(0))] turns that map into a function that takes your capture group x, pulls the first (and only, in this case) captured group from it, casts it from str to int and then passes it through our map. Pandas' str.replace function allows callables like this as a valid repl when used with a regex capture pattern.