I am trying to find a quicker way than a nested for loop to replace the values in column a of one table with the values in column b of another table.
for x in range(len(a["a"])):
    for y in range(len(b["a"])):
        if a["a"][x] == b["a"][y]:
            a["a"] = a["a"].replace([a["a"][x]], b["b"][y])
This currently works but is very slow. Is there any way to do the same thing faster?
Sample Data:
a = pd.DataFrame({'a': ['a','b','c','d','e','f','g', 'h', 'i']})
b = pd.DataFrame({'a': ['a','b','c','d','e','f','g'], 'b': ['alpha', 'alpha', 'alpha', 'beta', 'beta', 'charlie', 'charlie']})
Basically, I am trying to replace each value in a["a"] with the matching value in b["b"] wherever a["a"] == b["a"].
CodePudding user response:
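You cannot use the pandas where function directly because your two dataframes have different numbers of rows. But the code below will work (I renamed your dataframes df1 and df2 for clarity). Note that the assignment aligns on the index, so it relies on matching values sitting in the same row positions in both dataframes, as they do in your sample data.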
df1.loc[df1['a'].isin(df2['a']), 'a'] = df2['b']
which for your sample data results in
         a
0    alpha
1    alpha
2    alpha
3     beta
4     beta
5  charlie
6  charlie
7        h
8        i
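If the rows of the two dataframes are not guaranteed to line up by position, a lookup-based variant may be safer. The following is only a sketch, not part of the original answer: it assumes the values in df2['a'] are unique, and the name lookup is introduced here purely for illustration. It uses Series.map to translate each value and fillna to keep values that have no match.

```python
import pandas as pd

# Sample data from the question (df1/df2 follow the renaming used above).
df1 = pd.DataFrame({'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']})
df2 = pd.DataFrame({
    'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'b': ['alpha', 'alpha', 'alpha', 'beta', 'beta', 'charlie', 'charlie'],
})

# Hypothetical helper name: a Series whose index holds the old values
# and whose values hold the replacements.
# Assumption: df2['a'] contains no duplicates.
lookup = df2.set_index('a')['b']

# map() yields NaN for values that have no entry in the lookup,
# so fillna() restores the original value for those rows.
df1['a'] = df1['a'].map(lookup).fillna(df1['a'])

print(df1)
```

Because the match is made on the values themselves rather than on row position, this should give the same result even if df1 is shuffled or contains repeated values.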