i am New to Python, any lead how to proceed would be very much helpful. thank you in advance I have a dataframe like this as input
sec pos J NJ SJ Key order
S 5 202301201413 3
5 1 0 1 202301201413 4
0 202301201413 5
0 1 202301201413 6
E 5 1 1 1 202301201413 7
N 0 202301201413 8
S 3 1 1 202301201413 11
1 202301201413 12
E 3 1 0 1 202301201413 13
S 4 1 1 202301201425 1
E 4 1 1 0 202301201425 2
S 2 0 1 1 202301201425 4
E 2 0 202301201425 5
S 5 1 1 1 202301201425 7
0 202301201425 8
E 5 1 1 202301201425 9
N 1 1 1 202301201436 1
S 2 1 202301201436 2
1 1 202301201436 3
E 2 1 1 1 202301201436 4
S 3 1 202301201436 5
E 3 1 1 202301201436 6
description about the above table
- Sec column has 'S' - start and 'E' - end of testing
- Pos coumnn has position for which testing is done
- J, NJ, SJ columns have output for different parameter
- Key column has ID for which testing is done.
0 is Fault in above table and 1 is Good
I need to transform this data some thing like below table
Key J NJ SJ N 2 3 4 5
202301201413 1 0 1 2 - 2 - 2
202301201425 0 1 0 - 1 - 3 1
202301201436 1 1 1 0 0 0 - -
if there is any 0 for the key in J, NJ, SJ in input table it has to reflected here in output table column J, NJ, SJ
Column [5~10] i.e {N,1,2,3,4,5} are pos values in input table which need to be mapped with values based on below condition
NO 0 and atleast 1 is available 0 J is 0 1 NJ is 0 2 SJ is 0 3 J NJ is 0 4 NJ SJ is 0 5 SJ J is 0 6 ALL are 0 7 ALL Null 8
CodePudding user response:
You could do something like the following:
def format_result(ser):
if np.all(np.isnan(ser)):
return np.nan
if np.any(ser == 0):
return 0
else:
return 1
def level_result(ser):
if np.all(np.isnan(ser)):
return 8
elif np.all(ser != 0):
return 0
J, NJ, SJ = ser[["J", "NJ", "SJ"]] == 0
idx = J 2* NJ 4* SJ
idx = {3:4, 4:3, 5:6, 6:5}.get(idx, idx)
return idx
other_columns = np.sort(df.pos.dropna().unique())
(
df
.groupby(["Key", "pos"])
.agg({
"J": format_result,
"NJ": format_result,
"SJ": format_result,
})
.assign(result=lambda df: df.apply(level_result, axis=1))
.reset_index()
.set_index(["Key", "J", "NJ", "SJ", "pos"])
.unstack(level=-1)
.droplevel(0, axis=1)
.reset_index()
.groupby("Key").agg({
"J": format_result,
"NJ": format_result,
"SJ": format_result,
**{k: "max" for k in other_columns}
})
)
| Key | J | NJ | SJ | 2 | 3 | 4 | 5 | N |
|-------------:|----:|-----:|-----:|----:|----:|----:|----:|----:|
| 202301201413 | 1 | 0 | 1 | nan | 2 | nan | 2 | 2 |
| 202301201425 | 0 | 1 | 0 | 1 | nan | 3 | 0 | nan |
| 202301201436 | 1 | 1 | 1 | 0 | 0 | nan | nan | 0 |
It's admittedly not the most performant, but it's late where I am located and I can't think of a shorter solution just now.
Also note that your column with index [202301201425, 5]
is different compared to your example. For this group, the only values are [1, nan]
in which case you would expect 0 according to your formula; however your example has a 1
there.