I'm looking for a Python-based solution to a common task I often run into.
(I know, there must be cool SQL-magic for this, but... not now.)
The situation is:
Given a table, we want to change the field f2
if oid
is the same.
(Assumed that f2
always contains bar
or is empty if oid
is 42
)
id | f1 | f2 | oid |
---|---|---|---|
1 | foo | bar | 42 |
2 | foo | 49 | |
3 | baz | 42 |
Becomes:
id | f1 | f2 | oid |
---|---|---|---|
1 | foo | bar | 42 |
2 | foo | 49 | |
3 | baz | bar | 42 |
I'm currently using itertools.combinations
to avoid repeated comparison.
This seems more efficient than a double for-loop.
But for very large datasets, it still becomes unhandy... considering the Big-O-Notation.
My question is: Can the problem be solved more elegantly via pandas or numpy?
CodePudding user response:
Assumed that f2 always contains bar or is empty if oid is 42
Maybe we could use groupby
transform first
as well:
df['f2'] = df.replace('', np.nan).groupby('oid')['f2'].transform('first').fillna('')
Output:
id f1 f2 oid
0 1 foo bar 42
1 2 foo 49
2 3 baz bar 42
CodePudding user response:
The only solution I can think of would be to use groupby
ffill
:
df['f2'] = df.replace('', np.nan).groupby('oid')['f2'].ffill()
Output:
>>> df
id f1 f2 oid
0 1 foo bar 42
1 2 foo NaN 49
2 3 baz bar 42