Derive Python Dataframe based on condition on Multiple columns-CodePudding

i am New to Python, any lead how to proceed would be very much helpful. thank you in advance I have a dataframe like this as input

sec pos J   NJ  SJ  Key           order 
S   5               202301201413    3
    5   1   0   1   202301201413    4
            0       202301201413    5
            0   1   202301201413    6
E   5   1   1   1   202301201413    7
    N       0       202301201413    8
S   3       1   1   202301201413    11
                1   202301201413    12
E   3   1   0   1   202301201413    13
S   4       1   1   202301201425    1
E   4   1   1   0   202301201425    2
S   2   0   1   1   202301201425    4
E   2   0           202301201425    5
S   5   1   1   1   202301201425    7
        0           202301201425    8
E   5   1   1       202301201425    9
    N   1   1   1   202301201436    1
S   2       1       202301201436    2
            1   1   202301201436    3
E   2   1   1   1   202301201436    4
S   3       1       202301201436    5
E   3   1   1       202301201436    6

description about the above table

Sec column has 'S' - start and 'E' - end of testing
Pos coumnn has position for which testing is done
J, NJ, SJ columns have output for different parameter
Key column has ID for which testing is done.

0 is Fault in above table and 1 is Good

I need to transform this data some thing like below table

Key             J   NJ  SJ  N   2   3   4   5
202301201413    1   0   1   2   -   2   -   2
202301201425    0   1   0   -   1   -   3   1
202301201436    1   1   1   0   0   0   -   -

if there is any 0 for the key in J, NJ, SJ in input table it has to reflected here in output table column J, NJ, SJ

Column [5~10] i.e {N,1,2,3,4,5} are pos values in input table which need to be mapped with values based on below condition

NO 0 and atleast 1 is available      0

 J is 0                              1
 NJ  is 0                            2
 SJ  is 0                            3
 J NJ  is 0                          4
 NJ SJ  is 0                         5
 SJ J  is 0                          6
 ALL are 0                           7
 ALL Null                            8

CodePudding user response：

You could do something like the following:

def format_result(ser):
    if np.all(np.isnan(ser)):
        return np.nan

    if np.any(ser == 0):
        return 0
    else:
        return 1

def level_result(ser):
    if np.all(np.isnan(ser)):
        return 8
    elif np.all(ser != 0):
        return 0
    
    J, NJ, SJ = ser[["J", "NJ", "SJ"]] == 0
    idx = J   2* NJ   4* SJ
    idx = {3:4, 4:3, 5:6, 6:5}.get(idx, idx)

    return idx

other_columns = np.sort(df.pos.dropna().unique())

(
    df
    .groupby(["Key", "pos"])
    .agg({
        "J": format_result,
        "NJ": format_result,
        "SJ": format_result,
    })
    .assign(result=lambda df: df.apply(level_result, axis=1))
    .reset_index()
    .set_index(["Key", "J", "NJ", "SJ", "pos"])
    .unstack(level=-1)
    .droplevel(0, axis=1)
    .reset_index()
    .groupby("Key").agg({
        "J": format_result,
        "NJ": format_result,
        "SJ": format_result,
        **{k: "max" for k in other_columns}
    })
)

|          Key |   J |   NJ |   SJ |   2 |   3 |   4 |   5 |   N |
|-------------:|----:|-----:|-----:|----:|----:|----:|----:|----:|
| 202301201413 |   1 |    0 |    1 | nan |   2 | nan |   2 |   2 |
| 202301201425 |   0 |    1 |    0 |   1 | nan |   3 |   0 | nan |
| 202301201436 |   1 |    1 |    1 |   0 |   0 | nan | nan |   0 |

It's admittedly not the most performant, but it's late where I am located and I can't think of a shorter solution just now.

Also note that your column with index [202301201425, 5] is different compared to your example. For this group, the only values are [1, nan] in which case you would expect 0 according to your formula; however your example has a 1 there.