I have the below dataframe:
PARAM1 | PARAM2 | VALUE |
---|---|---|
A | X | TUE, WED |
A | Y | NO |
B | X | MON, WED |
B | Y | YES |
I would like a pythonic way of obtaining the distinct values of param1 that satisfy EITHER of these conditions:
- Their corresponding param2 = 'X' contains the string 'MON'
- Their corresponding param2 = 'Y' is equal to 'YES'.
In the example above, the output would be just B, because.
PARAM1 | PARAM2 | VALUE | EXPLANATION |
---|---|---|---|
A | X | TUE, WED | X parameter does not contain 'MON', so does not count for A. |
A | Y | NO | Y parameter is not equal to 'YES', so does not count for A. |
B | X | MON, WED | X parameter contains 'MON', so it counts for B. |
B | Y | YES | Y parameter is equal to 'YES', so it counts for B. |
Since A has not met either of the criteria for param2 X and Y, it's not in the output. B has fulfilled both (would have been enough with just one), so it's in the output.
In Oracle I would do it this way, but not sure how to proceed in python:
SELECT DISTINCT
param1
FROM
(
-- Fetch the X entries having a 'MON' in value
SELECT
param1
FROM
aux
WHERE
param2 = 'X'
AND REGEXP_LIKE ( value,
'MON' )
UNION ALL
-- Fetch the Y entries having value equal to 'YES'
SELECT
param1
FROM
aux
WHERE
param2 = 'Y'
AND param3 = 'YES'
);
CodePudding user response:
First, we form a boolean mask based on the condition, then select the corresponding rows from the dataframe:
cond = ((df['PARAM2'] == 'X') & df['VALUE'].str.contains('MON')) | \
((df['PARAM2'] == 'Y') & df['VALUE'].str.contains('YES'))
df = df[cond]
print(df)
Prints:
PARAM1 PARAM2 VALUE
2 B X MON, WED
3 B Y YES