Home > front end >  Change field in row if other row has same value in different field
Change field in row if other row has same value in different field

Time:03-25

I'm looking for a Python-based solution to a common task I often run into.
(I know, there must be cool SQL-magic for this, but... not now.)

The situation is:
Given a table, we want to change the field f2 if oid is the same.
(Assumed that f2 always contains bar or is empty if oid is 42)

id f1 f2 oid
1 foo bar 42
2 foo 49
3 baz 42

Becomes:

id f1 f2 oid
1 foo bar 42
2 foo 49
3 baz bar 42

I'm currently using itertools.combinations to avoid repeated comparison.
This seems more efficient than a double for-loop.
But for very large datasets, it still becomes unhandy... considering the Big-O-Notation.

My question is: Can the problem be solved more elegantly via pandas or numpy?

CodePudding user response:

Assumed that f2 always contains bar or is empty if oid is 42

Maybe we could use groupby transform first as well:

df['f2'] = df.replace('', np.nan).groupby('oid')['f2'].transform('first').fillna('')

Output:

   id   f1   f2  oid
0   1  foo  bar   42
1   2  foo        49
2   3  baz  bar   42

CodePudding user response:

The only solution I can think of would be to use groupby ffill:

df['f2'] = df.replace('', np.nan).groupby('oid')['f2'].ffill()

Output:

>>> df
   id   f1   f2  oid
0   1  foo  bar   42
1   2  foo  NaN   49
2   3  baz  bar   42
  • Related