Home > Mobile >  Count entries of mutliple dataframe columns outside values from second dataframe rows
Count entries of mutliple dataframe columns outside values from second dataframe rows

Time:01-19

I'm having two dataframes dfSpecs and dfData. dfSpecs contains the spec limits for parameter columns in dfData. (LSL = lower spec limit, USL = upper spec limit) The parameter rows in dfSpecs are not necessarily in the same order as the parameter columns in dfData.

dfSpecs = df = pd.DataFrame(
  {'param': ['parameter1', 'parameter2', 'parameter3'], 
  'LSL': [0, 10, 200], 
  'USL': [1, 20, 300]}
)
print(dfSpecs)
Output:
   param       LSL  USL
0  parameter1    0    1
1  parameter2   10   20
2  parameter3  200  300
dfData = df = pd.DataFrame(
  {'id': ['foo', 'foo', 'bar', 'bar'], 
  'x': [1, 1, 1, 1], 
  'y': [1, 2, 1, 2],
  'parameter1': [0.5, 0.6, 1.8, 0.4],
  'parameter2': [12, 14, 21, 15],
  'parameter3': [199, 301, 256, 234]}
)
print(dfData)
Output:
    id  x  y  parameter1  parameter2  parameter3
0  foo  1  1         0.5          12         199
1  foo  1  2         0.6          14         301
2  bar  1  1         1.8          21         256
3  bar  1  2         0.4          15         234

Now I want to get a new dataframe containing the count of rows outside the spec limits for parameter and id.

The expected output looks as follows:

    id       param  fail_count
0  foo  parameter1           0
1  foo  parameter2           0
2  foo  parameter3           2
3  bar  parameter1           1
4  bar  parameter2           1
5  bar  parameter3           0

The following code works, but is very slow (up to several dozens of minutes, depending on the dataframe size):

fails=[]
for id, df in dfData.groupby('id'):
    for i, spec in dfSpecs.iterrows():
        fail_count = len(df[spec['param']].loc[(df[spec['param']]<spec['LSL']) | (df[spec['param']]>spec['USL'])])
        fails.append(id, [spec['param'], fail_count])
dfFails = pd.DataFrame(fails, columns=['id','parameter','fail_count'])

Is there a simpler, faster way how to achieve this? I suppose there is better ways in pandas.

Edit: rephrasing and adding example output

CodePudding user response:

Thank you for the response, based on your edits I would suggest the following:

# Create an empty dataframe with the columns 'id', 'param', and 'fail_count'
dfResults = pd.DataFrame(columns=['id', 'param', 'fail_count'])

# Get a list of unique ids in the 'dfData' dataframe
unique_ids = dfData['id'].unique()

# Iterate through the unique ids
for id in unique_ids:
    # Filter the 'dfData' dataframe for the current id
    df_id = dfData[dfData['id'] == id]
    
    # Get a list of unique parameters in the filtered dataframe
    unique_params = df_id.columns.difference(['id', 'x', 'y'])
    
    # Iterate through the unique parameters
    for param in unique_params:
        # Get the LSL and USL values for the current parameter
        LSL = dfSpecs[dfSpecs['param'] == param]['LSL'].values[0]
        USL = dfSpecs[dfSpecs['param'] == param]['USL'].values[0]
        
        # Calculate the number of failures for the current parameter and id
        fail_count = df_id[(df_id[param] < LSL) | (df_id[param] > USL)].shape[0]
        
        # Append the current id, parameter, and fail_count to the 'dfResults' dataframe
        dfResults = dfResults.append({'id': id, 'param': param, 'fail_count': fail_count}, ignore_index=True)

CodePudding user response:

First you can melt the dfData dataframe and then use dfSpecs as a look up for left, right to use with pd.Series.between, in a transform and assign that as a new column fail_count with assign and then groupby.sum() with id, param to get count.

specs = dfSpecs.set_index("param")
t = dfData.melt(
    value_vars=["parameter1", "parameter2", "parameter3"],
    id_vars=["id"],
    var_name="param"
)
out = t.assign(
    fail_count=t.groupby(["param"])["value"].transform(
        lambda x: ~x.between(specs.loc[x.name, "LSL"], specs.loc[x.name, "USL"])
    )
).groupby(["id", "param"])["fail_count"].sum().reset_index()

print(out)

    id       param  fail_count
0  bar  parameter1           1
1  bar  parameter2           1
2  bar  parameter3           0
3  foo  parameter1           0
4  foo  parameter2           0
5  foo  parameter3           2
  • Related