I have a pandas DataFrame in which each column represents a quarter, with the most recent quarters on the right. Not all the information arrives at the same time, so some columns may have missing values.
I would like to add another column at the end of the DataFrame, called Criteria:
- If the 3 most recent quarters with available data each have a value greater than 10, then Criteria = Y (that is, NaN values are ignored when applying the condition)
- Otherwise, Criteria = N
In the sample below, B, D and E would be equal to Y, and the rest would be marked as N.
But I can't find a way to apply the criteria dynamically so that the NaN values are ignored.
import pandas as pd
import numpy as np

# np.nan marks quarters with no data yet (the np.NaN alias was removed in NumPy 2.0)
list_of_tuples = [
    (11, 34, 78, 5, -11, -56),
    (12, np.nan, 98, 7, np.nan, 18),
    (21, 56, -78, -23, 64, 28),
    (56, -98, 35, 63, 27, np.nan),
    (13, 34, -11, 11, 56, 10),
    (12, 41, 12, 41, -78, -18),
]
df = pd.DataFrame(
    list_of_tuples,
    index=['A', 'B', 'C', 'D', 'E', 'F'],
    columns=['2020Q4', '2021Q1', '2021Q2', '2021Q3', '2021Q4', '2022Q1'],
)
print(df)
Any help would be appreciated, thanks
CodePudding user response:
IIUC, you can stack and use a custom groupby function:
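df['Criteria'] = np.where(
    df.stack().dropna()    # long format; explicit dropna() so NaN is gone whatever stack()'s default is
      .groupby(level=0)    # one group per original row (A..F)
      .apply(lambda s: s.tail(3).ge(10).all()),  # last 3 available values all >= 10
    'Y', 'N')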
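NB. B is "N" because its 2021Q3 value (7) is < 10. Also note that ge(10) tests "greater than or equal to 10", which is what makes E (latest value exactly 10) come out as "Y"; use gt(10) instead if the comparison should be strict.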
output:
   2020Q4  2021Q1  2021Q2  2021Q3  2021Q4  2022Q1 Criteria
A      11    34.0      78       5   -11.0   -56.0        N
B      12     NaN      98       7     NaN    18.0        N
C      21    56.0     -78     -23    64.0    28.0        N
D      56   -98.0      35      63    27.0     NaN        Y
E      13    34.0     -11      11    56.0    10.0        Y
F      12    41.0      12      41   -78.0   -18.0        N
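
If the stack/groupby route feels opaque, the same rule can be written row by row. This is only a minimal sketch, not the approach used above: the helper name last_n_all_at_least and its parameters n and threshold are made up for illustration, and it assumes that a row with fewer than 3 available values should be marked N (the question does not say what should happen in that case).

```python
import numpy as np
import pandas as pd


def last_n_all_at_least(row: pd.Series, n: int = 3, threshold: float = 10) -> bool:
    """Hypothetical helper: True if the n most recent non-NaN values are all >= threshold."""
    recent = row.dropna().tail(n)  # columns are ordered oldest -> newest
    # Assumption: a row with fewer than n available values does not qualify.
    return len(recent) == n and bool((recent >= threshold).all())


df = pd.DataFrame(
    [
        (11, 34, 78, 5, -11, -56),
        (12, np.nan, 98, 7, np.nan, 18),
        (21, 56, -78, -23, 64, 28),
        (56, -98, 35, 63, 27, np.nan),
        (13, 34, -11, 11, 56, 10),
        (12, 41, 12, 41, -78, -18),
    ],
    index=['A', 'B', 'C', 'D', 'E', 'F'],
    columns=['2020Q4', '2021Q1', '2021Q2', '2021Q3', '2021Q4', '2022Q1'],
)

# Evaluate each row independently, then map True/False to 'Y'/'N'.
flags = df.apply(last_n_all_at_least, axis=1)
df['Criteria'] = np.where(flags, 'Y', 'N')
print(df)
```

Because each row is evaluated on its own, this version does not depend on the index being sorted or on every row having at least one value, at the cost of being slower than a vectorised approach on large frames.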