Hi I am trying to replace values in a df1 column A with values from df2 column B, by matching them with df2 column A. Basically if the string of row x in df1['a'] is equal to a string of row y in df2['a'] I want to replace the value of df1['a'] with df2['b']. I have tried a couple things but for some reason this isn't working properly. I also wants to replace values that aren't in df2['a'] with None.
my sample data is:
df1 = pd.DataFrame({'a': ['a','b','a','d','e','f','g', 'h', 'i'],
'b': ['alpha', 'alpha', 'alpha', 'beta', 'beta', 'charlie', 'charlie', "alpha", "beta"],
'c': ['elephant', "zebra",'elephant', "zebra",'elephant', "zebra",'elephant','elephant', "zebra"]})
df2 = pd.DataFrame({'a': ['a','b','c','d','e','f','g'],
'b': ['alpha', 'alpha', 'alpha', 'beta', 'beta', 'charlie', 'charlie'],
'c': ['elephant', "zebra",'elephant', "zebra",'elephant', "zebra",'elephant']})
df1['UID'] = df1['a'] df1['b'] df1['c']
df2['UID'] = df2['a'] df2['b'] df2['c']
df1['a'].loc[df1['UID'].isin(df2['UID'])] = df2['c']
animals = ['elephant','zebra']
df1.loc[~df1['a'].isin(animals), "a"] = "None"
This works in my sample data but isn't working in my actual data set which is much larger. Any ideas on how to do something similar to this?
CodePudding user response:
I think the explanation is not quite correct. Based on your code attempt, I suspect that what you mean is:
For each row
i
ofdf1
that matches (for all fields(a, b, c)
) a rowj
ofdf2
, then replacedf1.loc[i, 'a']
bydf2.loc[j, 'c']
.
If that is the correct interpretation of your question, then:
First, it is safer to use a tuple
of the row values as UID
for the row, instead of the string concatenation: imagine a row '_', 'foo', 'bar'
and another '_', 'fooba', 'r'
-- they are most certainly distinct. The second advantage of tuple
is that it works with other types, not just strings. Thus:
df1['UID'] = df1[['a', 'b', 'c']].apply(tuple, axis=1)
df2['UID'] = df2[['a', 'b', 'c']].apply(tuple, axis=1)
Then, the expected result can be obtained by merging on UID
:
df = df1.assign(
a=df1.merge(
df2[['UID', 'c']], on='UID', how='left',
suffixes=['', '_y'])['c_y'].fillna('None')
)
>>> df
a b c UID
0 elephant alpha elephant (a, alpha, elephant)
1 zebra alpha zebra (b, alpha, zebra)
2 elephant alpha elephant (a, alpha, elephant)
3 zebra beta zebra (d, beta, zebra)
4 elephant beta elephant (e, beta, elephant)
5 zebra charlie zebra (f, charlie, zebra)
6 elephant charlie elephant (g, charlie, elephant)
7 None alpha elephant (h, alpha, elephant)
8 None beta zebra (i, beta, zebra)