Home > Blockchain >  For python pandas, how to return either 2 conditions be True while the rest of Columns must be True?
For python pandas, how to return either 2 conditions be True while the rest of Columns must be True?

Time:11-28

need help with pandas.

I have a function running through all the rows in a dataframe, where I have to check through the several 'raw data' columns and output a series of True/False "MCond" columns to verify different combination of conditions.

Then last step is to process all these True/False "Mcond" columns to output a final True/False "Result" column, then return the df.

Expected outcome: All other MCond columns have to be true, except Mcond27 and MCond31, where either can be true for the final "Result" column to be true.

e.g. If Mcond1 is False, then function1 returns False.

If Mcond1 to 5 is True, and Mcond27 is True while Mcond31 is False, Function1 returns True.

If Mcond1 to 5 is True, but both Mcond27 and Mcond31 is False, Function1 returns False.

If all Mcond is True, Function1 returns True.

enter image description here

How do I process this for Result? referring to the last line of my code before i return the df, I'm currently using a simple .all to capture all columns starting with "MCond" to check if they are all true.

Please advise!

def function1(df, tf):
    
    df.loc[:, ('Mcond 1')] = df.A > df.A.shift(1)
    df.loc[:, ('Mcond 2')] = df.B < df.B.shift(1)
    df.loc[:, ('Mcond 3')] = df.C > df.D
    df.loc[:, ('Mcond 4')] = df.E > df.E.shift(1)
    df.loc[:, ('Mcond 5')] = df.F > df.F.shift(1)
    if tf == 'A1':
        df.loc[:, ('Mcond 27')] = df.consec5 == 1
    elif tf == 'B1':
        df.loc[:, ('Mcond 27')] = df.consec3 == 1
    elif tf == 'C1':
        df.loc[:, ('Mcond 27')] = True
    else:
        df.loc[:, ('Mcond 27')] = True
    df.loc[:, ('Mcond 31')] = df.consec2.rolling(3).sum() >= 1
    
    df.loc[:, ('Result')] = df.loc[:, df.columns.str.startswith('Mcond')].all(axis=1)

    return df

CodePudding user response:

So, given the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "Mcond1": ["FALSE", "TRUE", "TRUE", "TRUE", "TRUE"],
        "Mcond2": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
        "Mcond3": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
        "Mcond4": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
        "Mcond5": ["TRUE", "TRUE", "TRUE", "TRUE", "TRUE"],
        "Mcond27": ["FALSE", "TRUE", "FALSE", "FALSE", "TRUE"],
        "Mcond31": ["TRUE", "FALSE", "TRUE", "FALSE", "TRUE"],
    }
)

You could try like this:

df["Result"] = (
    (df["Mcond1"] != "FALSE")
    & (df["Mcond2"] != "FALSE")
    & (df["Mcond3"] != "FALSE")
    & (df["Mcond4"] != "FALSE")
    & (df["Mcond5"] != "FALSE")
    & ((df["Mcond27"] != "FALSE") | (df["Mcond31"] != "FALSE"))
)

df["Result"] = [str(item).upper() for item in df["Result"].values]

print(df)
# Outputs
  Mcond1 Mcond2 Mcond3 Mcond4 Mcond5 Mcond27 Mcond31 Result
0  FALSE   TRUE   TRUE   TRUE   TRUE   FALSE    TRUE  FALSE
1   TRUE   TRUE   TRUE   TRUE   TRUE    TRUE   FALSE   TRUE
2   TRUE   TRUE   TRUE   TRUE   TRUE   FALSE    TRUE   TRUE
3   TRUE   TRUE   TRUE   TRUE   TRUE   FALSE   FALSE  FALSE
4   TRUE   TRUE   TRUE   TRUE   TRUE    TRUE    TRUE   TRUE

CodePudding user response:

Assuming that you will have more than 5 condition columns, and some columns are exceptions:

import pandas as pd

df = pd.read_csv("data.csv")
exceptions = ['Mcond27', 'Mcond31']

for i, row in df.iterrows():
    conds = []
    conds_e = []
    for j, col in enumerate(row.keys()):
        if any(_ in col for _ in exceptions):
            if col != 'Result':
                conds_e.append(row[j])
        else:
            conds.append(row[j])
            
    if all(conds) and any(conds_e):
        df.loc[i, 'Result'] = True
    else:
        df.loc[i, 'Result'] = False

Output:

    Mcond1  Mcond2  Mcond3  Mcond4  Mcond5  Mcond27  Mcond31 Result
0   False    True    True    True    True     True     True  False
1    True    True    True    True    True     True    False   True
2    True    True    True    True    True    False     True   True
3    True    True    True    True    True    False    False  False
4    True    True    True    True    True     True     True   True

In terms of time, it tooks me 0.13 seconds to process a 300x100 DataFrame:

import pandas as pd
import time
import random

def get_results(df, exc):
    for i, row in df.iterrows():
        conds = []
        conds_e = []
        for j, col in enumerate(row.keys()):
            if any(_ in col for _ in exc):
                if col != 'Result':
                    conds_e.append(row[j])
            else:
                conds.append(row[j])
                
        if all(conds) and any(conds_e):
            df.loc[i, 'Result'] = True
        else:
            df.loc[i, 'Result'] = False
    return df
    
if __name__ == '__main__':
    
    df = pd.DataFrame()
    for i in range(0, 100):
        values = []
        for j in range(0, 300):
            values.append(bool(random.getrandbits(1)))
        df['Mcond'   str(i   1)] = values
    
    exc = ['Mcond27', 'Mcond31']
    
    start = time.time()
    df = get_results(df, exc)
    end = time.time() - start

Output:

      Mcond1  Mcond2  Mcond3  Mcond4  ...  Mcond98  Mcond99  Mcond100  Result
0     False    True   False    True  ...     True    False      True   False
1      True   False    True    True  ...     True    False     False   False
2      True   False   False    True  ...     True     True      True   False
3     False    True   False   False  ...     True    False     False   False
4      True    True   False    True  ...    False    False      True   False
..      ...     ...     ...     ...  ...      ...      ...       ...     ...
295   False   False   False    True  ...    False    False     False   False
296   False    True   False   False  ...    False     True      True   False
297    True   False   False    True  ...    False    False      True   False
298   False    True   False    True  ...     True     True      True   False
299    True   False    True   False  ...     True     True     False   False

[300 rows x 101 columns]

Time:  0.13100171089172363
  • Related