(I recently asked this question on r/learnpython (here), but didn't get any feedback, so am re-posting it verbatim here. Hope that is okay!)
Suppose I have a DataFrame Y that looks like this:
index | x1 | x2 | A |
---|---|---|---|
0 | 1 | 2 | 5 |
1 | 3 | 7 | 1 |
And then I have a parent DataFrame X like this:
index | x1 | x2 | A |
---|---|---|---|
0 | 1 | 2 | 0 |
1 | 1 | 3 | 0 |
2 | 3 | 4 | 0 |
3 | 3 | 7 | 0 |
Further, suppose that any ['x1','x2']
combination that exists is unique (so any combination exists in either X or Y only 0 or 1 times, and if it exists in Y, then that particular ['x1','x2']
combination (although the 'A'
value may be different) exists in X as well).
For all ['x1','x2']
combinations in Y, I would like to find the corresponding indices in X. So here, those indices would be a list [0,3]
that I want.
My goal is, for all such rows in X with such an index (I'll call it j
here), to set
X['A'].loc[j] = Y['A'].iloc[j]
Currently I have this:
for i in range(len(X)):
v1 = X['x1'].iloc[i]
v2 = X['x2'].iloc[i]
extract = Y.query("x1 == %d" % v1).query("x2 == %d" % v2)
if len(extract_ori) != 0:
X['count'].loc[i] = extract['count'].iloc[0]
This does what I want, except it is pretty slow, and it seems like there should be a faster way to do this. Wondering what this might be, if it exists!
So the resulting DataFrame X should look like
index | x1 | x2 | A |
---|---|---|---|
0 | 1 | 2 | 5 |
1 | 1 | 3 | 0 |
2 | 3 | 4 | 0 |
3 | 3 | 7 | 1 |
CodePudding user response:
One option is to use MultiIndex.map
:
cols = ['x1','x2']
X['A'] = X.set_index(cols).index.map(Y.set_index(cols)['A']).fillna(0).astype(int)
Another option is left-merge on two columns:
cols = ['x1','x2']
X = X[cols].merge(Y[cols ['A']], on=cols, how='left').fillna(0)
Output:
index x1 x2 A
0 0 1 2 5
1 1 1 3 0
2 2 3 4 0
3 3 3 7 1
CodePudding user response:
You could set_index
the x1 and x2 column and update the dataframe on the A
column.
X1 = X.set_index(['x1', 'x2'])
Y1 = Y.set_index(['x1', 'x2'])
X1['A'].update(Y1['A']) # works inplace
X1.reset_index(inplace=True)
print(X1)
x1 x2 index A
0 1 2 0 5
1 1 3 1 0
2 3 4 2 0
3 3 7 3 1