Home > Enterprise >  Use fields of one dataframe as conditions to fill a field of another dataframe
Use fields of one dataframe as conditions to fill a field of another dataframe

Time:01-06

I have 2 dataframes, the first is a small dataframe (df1) with information to use to fill a field (named Flag) of the second dataframe (df2). I need to write a function that uses each row of df1 as parameters to fill each row of df2 with a certain value (Y or N).

df1 =

type q25 q75
A 13 98
B 381 500
C 34 103

df2 =

field1 field2 ... TYPE delta Flag
field1 field2 ... A 379 Y
field1 field2 ... C 90 N
field1 field2 ... A 50 N
field1 field2 ... B 2000 Y

I tried this code, but unfortunately it doesn't work because it overwrites the lines in df2 related to the cases of df1 with type equal to A or B, and it works for the last record of df2 (type equal to C)

def filling(x, row):
    IQR = row['q75'] - row['q25']
    if (x['Flag'] != 'Y'):
        if row['type'] == x['TYPE'] and (x['delta'] < row['q25'] - 1.5*IQR or x['delta'] > row['q75']   1.5*IQR):  
            return 'Y'
        else:
            return 'N'

for index, row in df2.iterrows():
    df1['Flag'] = df1.apply(lambda x : filling(x, row), axis=1) 

How could I fix it?

CodePudding user response:

If I got your question correctly, you missed two parenthesis:

def filling(x, row):
    IQR = row['q75'] - row['q25']
    if (x['Flag'] != 'Y'):
        if (row['type'] == x['TYPE'] and (x['delta'] < row['q25'] - 1.5*IQR) or (x['delta'] > row['q75']   1.5*IQR)): #here, you just needed to divide the expression
            return 'Y'
        else:
            return 'N'

for index, row in df2.iterrows():
    df1['Flag'] = df1.apply(lambda x : filling(x, row), axis=1) 

    field1  field2  TYPE    delta   Flag
0   field1  field2  A   379     Y
1   field1  field2  C   90      N
2   field1  field2  A   50      N
3   field1  field2  B   2000    Y

CodePudding user response:

From what I understand you would like to have a flag column which tell you whether the particular row is an outlier or not. Here is a vectorized and concise way to achieve that:

# Merge the dataframes on type column
s = df2.merge(df1, left_on='TYPE', right_on='type', how='left')

# calculate IQR and condition to check for outlier
s['IQR'] = s['q75'] - s['q25']
is_outlier = ~s['delta'].between(s['q25'] - 1.5 * s['IQR'], s['q25']   1.5 * s['IQR'])

# Use np.where to select Y/N based on the outlier condition
s['Flag'] = np.where(s['Flag'].ne('Y') & is_outlier, 'Y', s['Flag'])

# drop the columns from df1
s = s.drop(columns=df1.columns)

Result

print(s)

   field1  field2  ... TYPE  delta Flag  IQR
0  field1  field2  ...    A    379    Y   85
1  field1  field2  ...    C     90    N   69
2  field1  field2  ...    A     50    N   85
3  field1  field2  ...    B   2000    Y  119
  • Related