need help with pandas.
I have a function running through all the rows in a dataframe, where I have to check through the several 'raw data' columns and output a series of True/False "MCond" columns to verify different combination of conditions.
Then last step is to process all these True/False "Mcond" columns to output a final True/False "Result" column, then return the df.
Expected outcome: All other MCond columns have to be true, except Mcond27 and MCond31, where either can be true for the final "Result" column to be true.
e.g. If Mcond1 is False, then function1 returns False.
If Mcond1 to 5 is True, and Mcond27 is True while Mcond31 is False, Function1 returns True.
If Mcond1 to 5 is True, but both Mcond27 and Mcond31 is False, Function1 returns False.
If all Mcond is True, Function1 returns True.
How do I process this for Result? referring to the last line of my code before i return the df, I'm currently using a simple .all to capture all columns starting with "MCond" to check if they are all true.
Please advise!
def function1(df, tf):
df.loc[:, ('Mcond 1')] = df.A > df.A.shift(1)
df.loc[:, ('Mcond 2')] = df.B < df.B.shift(1)
df.loc[:, ('Mcond 3')] = df.C > df.D
df.loc[:, ('Mcond 4')] = df.E > df.E.shift(1)
df.loc[:, ('Mcond 5')] = df.F > df.F.shift(1)
if tf == 'A1':
df.loc[:, ('Mcond 27')] = df.consec5 == 1
elif tf == 'B1':
df.loc[:, ('Mcond 27')] = df.consec3 == 1
elif tf == 'C1':
df.loc[:, ('Mcond 27')] = True
else:
df.loc[:, ('Mcond 27')] = True
df.loc[:, ('Mcond 31')] = df.consec2.rolling(3).sum() >= 1
df.loc[:, ('Result')] = df.loc[:, df.columns.str.startswith('Mcond')].all(axis=1)
return df
CodePudding user response:
So, given the following dataframe:
import pandas as pd
df = pd.DataFrame(
{
"Mcond1": ["FALSE", "TRUE", "TRUE", "TRUE", "TRUE"],
"Mcond2": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
"Mcond3": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
"Mcond4": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
"Mcond5": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
"Mcond27": ["FALSE", "TRUE", "FALSE", "FALSE", "TRUE"],
"Mcond31": ["TRUE", "FALSE", "TRUE", "FALSE", "TRUE"],
}
)
You could try like this:
df["Result"] = (
(df["Mcond1"] != "FALSE")
& (df["Mcond2"] != "FALSE")
& (df["Mcond3"] != "FALSE")
& (df["Mcond4"] != "FALSE")
& (df["Mcond5"] != "FALSE")
& ((df["Mcond27"] != "FALSE") | (df["Mcond31"] != "FALSE"))
)
df["Result"] = [str(item).upper() for item in df["Result"].values]
print(df)
# Outputs
Mcond1 Mcond2 Mcond3 Mcond4 Mcond5 Mcond27 Mcond31 Result
0 FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE
1 TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
2 TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
3 TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
4 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
CodePudding user response:
Assuming that you will have more than 5 condition columns, and some columns are exceptions:
import pandas as pd
df = pd.read_csv("data.csv")
exceptions = ['Mcond27', 'Mcond31']
for i, row in df.iterrows():
conds = []
conds_e = []
for j, col in enumerate(row.keys()):
if any(_ in col for _ in exceptions):
if col != 'Result':
conds_e.append(row[j])
else:
conds.append(row[j])
if all(conds) and any(conds_e):
df.loc[i, 'Result'] = True
else:
df.loc[i, 'Result'] = False
Output:
Mcond1 Mcond2 Mcond3 Mcond4 Mcond5 Mcond27 Mcond31 Result
0 False True True True True True True False
1 True True True True True True False True
2 True True True True True False True True
3 True True True True True False False False
4 True True True True True True True True
In terms of time, it tooks me 0.13 seconds to process a 300x100 DataFrame:
import pandas as pd
import time
import random
def get_results(df, exc):
for i, row in df.iterrows():
conds = []
conds_e = []
for j, col in enumerate(row.keys()):
if any(_ in col for _ in exc):
if col != 'Result':
conds_e.append(row[j])
else:
conds.append(row[j])
if all(conds) and any(conds_e):
df.loc[i, 'Result'] = True
else:
df.loc[i, 'Result'] = False
return df
if __name__ == '__main__':
df = pd.DataFrame()
for i in range(0, 100):
values = []
for j in range(0, 300):
values.append(bool(random.getrandbits(1)))
df['Mcond' str(i 1)] = values
exc = ['Mcond27', 'Mcond31']
start = time.time()
df = get_results(df, exc)
end = time.time() - start
Output:
Mcond1 Mcond2 Mcond3 Mcond4 ... Mcond98 Mcond99 Mcond100 Result
0 False True False True ... True False True False
1 True False True True ... True False False False
2 True False False True ... True True True False
3 False True False False ... True False False False
4 True True False True ... False False True False
.. ... ... ... ... ... ... ... ... ...
295 False False False True ... False False False False
296 False True False False ... False True True False
297 True False False True ... False False True False
298 False True False True ... True True True False
299 True False True False ... True True False False
[300 rows x 101 columns]
Time: 0.13100171089172363