Home > Mobile >  (Pandas, Python) Selecting indices of a parent DF based on shared column values with a child DF
(Pandas, Python) Selecting indices of a parent DF based on shared column values with a child DF

Time:05-07

(I recently asked this question on r/learnpython (here), but didn't get any feedback, so am re-posting it verbatim here. Hope that is okay!)

Suppose I have a DataFrame Y that looks like this:

index x1 x2 A
0 1 2 5
1 3 7 1

And then I have a parent DataFrame X like this:

index x1 x2 A
0 1 2 0
1 1 3 0
2 3 4 0
3 3 7 0

Further, suppose that any ['x1','x2'] combination that exists is unique (so any combination exists in either X or Y only 0 or 1 times, and if it exists in Y, then that particular ['x1','x2'] combination (although the 'A' value may be different) exists in X as well).

For all ['x1','x2'] combinations in Y, I would like to find the corresponding indices in X. So here, those indices would be a list [0,3] that I want.

My goal is, for all such rows in X with such an index (I'll call it j here), to set

X['A'].loc[j] = Y['A'].iloc[j]

Currently I have this:

for i in range(len(X)):
    v1 = X['x1'].iloc[i]
    v2 = X['x2'].iloc[i]
    extract = Y.query("x1 == %d" % v1).query("x2 == %d" % v2)
    if len(extract_ori) != 0:
        X['count'].loc[i] = extract['count'].iloc[0]

This does what I want, except it is pretty slow, and it seems like there should be a faster way to do this. Wondering what this might be, if it exists!

So the resulting DataFrame X should look like

index x1 x2 A
0 1 2 5
1 1 3 0
2 3 4 0
3 3 7 1

CodePudding user response:

One option is to use MultiIndex.map:

cols = ['x1','x2']
X['A'] = X.set_index(cols).index.map(Y.set_index(cols)['A']).fillna(0).astype(int)

Another option is left-merge on two columns:

cols = ['x1','x2']
X = X[cols].merge(Y[cols ['A']], on=cols, how='left').fillna(0)

Output:

   index  x1  x2  A
0      0   1   2  5
1      1   1   3  0
2      2   3   4  0
3      3   3   7  1

CodePudding user response:

You could set_index the x1 and x2 column and update the dataframe on the A column.

X1 = X.set_index(['x1', 'x2'])
Y1 = Y.set_index(['x1', 'x2'])

X1['A'].update(Y1['A']) # works inplace
X1.reset_index(inplace=True)
print(X1)

    x1  x2  index   A
0    1   2      0   5
1    1   3      1   0
2    3   4      2   0
3    3   7      3   1
  • Related