I'm trying to replace column values in dataframe_1 with values from another column in the dataframe_2. For example:
df1 = pd.DataFrame({'col1': ['animal', 'animal', 'animal'], 'col2': [0, 1, 2]}).set_index('col2')
df2 = pd.DataFrame({'col1': ['cat', 'dog', 'cat'], 'col2': [0, 2, 1]}).set_index('col2')
df1:
col1
col2
0 animal
1 animal
2 animal
df2:
col1
col2
0 cat
2 dog
1 cat
In this case after this command:
df1['col1'] = df2['col1']
I got this:
col1
col2
0 cat
1 cat
2 dog
It's work fine. But what should I do, if I want this changes for the certain string character in my column?
df1 = pd.DataFrame({'col1': ['animal id_0', 'animal id_1', 'animal id_2'], 'col2': [0, 1, 2]}).set_index('col2')
df2 = pd.DataFrame({'col1': ['cat', 'dog', 'cat'], 'col2': [0, 2, 1]}).set_index('col2')
I believe I can get something like this:
col1
col2
0 animal cat
1 animal cat
2 animal dog
With this line (and with attempt to do something like mapping, lol):
df1['col1'] = df1['col1'].str.replace(r'\d', df2['col1'], regex=True)
But it seems that it isn't as easy as it seems at first glance. Maybe you have any ideas?
CodePudding user response:
You can merge the two dataframes by index and then using .apply
to replace _
with correct value:
x = df1.merge(df2, left_index=True, right_index=True)
x["col1_x"] = x.apply(lambda x: x["col1_x"].replace("_", x["col1_y"]), axis=1)
x = x.rename(columns={"col1_x": "col1"})
print(x[["col1"]])
Prints:
col1
col2
0 animal cat
1 animal cat
2 animal dog
CodePudding user response:
It looks like there are two components to your problem:
you have integer keys in
df1
that map to specific animal strings indf2
(e.g.0 = 'cat', 1 = 'cat', 2 = 'dog'
, etc)those integer keys are embedded in a string and need to be extracted somehow (you need to pull the
1
out of'animal 1'
)
There is a one liner solution, though the below assumes you only ever have a single digit key to capture from each string in df['col1']
:
>>>df1['col1'].str.replace('(\d)',lambda x: df2['col1'].to_dict()[int(x.group(0))])
col2
0 animal cat
1 animal dog
2 animal cat
Name: col1, dtype: object
There's three things going on here:
df['col1'].str.replace('(\d)',...)
is regex capturing those digits and passing them as a usable argument to the replacement side of the function (see pd.Series.str.replace for more info)
df2['col1'].to_dict()
is turning the implicit mapping defined in df2
into an actual map, in the form of a dict
with integer keys taken from the index of df2
. You could also write something similar with the .loc
accessor.
>>> df2['col1'].to_dict()
{0: 'cat', 2: 'dog', 1: 'cat'}
Lastly, lambda x: df2['col1'].to_dict()[int(x.group(0))]
turns that map into a function that takes your capture group x
, pulls the first (and only, in this case) captured group from it, casts it from str
to int
and then passes it through our map. Pandas' str.replace function allows callables like this as a valid repl
when used with a regex capture pattern.