After this discussion, I have the following dataframe:
data = {'Item':['1', '2', '3', '4', '5'],
'Len':[142, 11, 50, 60, 12],
'Hei':[55, 65, 130, 14, 69],
'C':[68, -18, 65, 16, 17],
'Thick':[60, 0, -150, 170, 130],
'Vol':[230, 200, -500, 10, 160]
'Fail':[['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""}
df = pd.DataFrame(data)
representing different items and the corresponding values related to some of their parameters (Le, Hei, C, ...). In the column Fail are reported the parameters that are failed, e. g. item 1 fails for parameters Len and Thick, item 3 fails for parameters B, Thick and Vol, while item 4 shows no failure. For each item I need a new column where it is reported the failed parameter together with its value, in the following format: failed parameter = value. So, for the first item I should get Len=142 and Thick=60. So far, I have exploded the Fail column into multiple columns:
failed_param = df['Fail'].apply(pd.Series)
failed_param = failed_param.rename(columns = lambda x : 'Failed_param_' str(x 1 ))
df2_list = failed_param.columns.values.tolist()
df2 = pd.concat([df[:], failed_param[:]], axis=1)
Then, if I do the following:
for name in df2_list:
df2.loc[df2[f"{name}"] == "D", "new"] = "D" "=" df2["D"].map(str)
I can get what I need but for only one parameter (D in this case). How can I obtain the same for all the parameters all at once?
CodePudding user response:
As mentioned in the question, you need to insert a new column (e.g., FailParams
) that contains a list of strings. Each string represents the items' failures (e.g., Len=142,Thick=60
). A quick solution can be:
import pandas as pd
data = {
'Item' : ['1', '2', '3', '4', '5'],
'Len' : [142, 11, 50, 60, 12],
'Hei' : [55, 65, 130, 14, 69],
'C' : [68, -18, 65, 16, 17],
'Thick': [60, 0, -150, 170, 130],
'Vol' : [230, 200, -500, 10, 160],
'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}
# Convert the dictionary into a DataFrame.
df = pd.DataFrame(data)
# The first solution: using list comprehension.
column = [
",".join( # Add commas between the list items.
# Find the target items and their values.
[el "=" str(df.loc[int(L[0]) - 1, el]) for el in L[1]]
)
if (len(L[1]) > 0) else "" # If the Fail inner is empty, return an empty string.
for L in zip(df['Item'].values, df['Fail'].values) # Loop on the Fail items.
]
# Insert the new column.
df['FailParams'] = column
# Print the DF after insertion.
print(df)
The previous solution is added using list comprehension. Another solution using loops can be:
# The second solution: using loops.
records = []
for L in zip(df['Item'].values, df['Fail'].values):
if (len(L[1]) <= 0):
record = ""
else:
record = ",".join([el "=" str(df.loc[int(L[0]) - 1, el]) for el in L[1]])
records.append(record)
print(records)
# Insert the new column.
df['FailParams'] = records
# Print the DF after insertion.
print(df)
A sample output should be:
Item Len Hei C Thick Vol Fail FailParams
0 1 142 55 68 60 230 [Len, Thick] Len=142,Thick=60
1 2 11 65 -18 0 200 [Thick] Thick=0
2 3 50 130 65 -150 -500 [Hei, Thick, Vol] Hei=130,Thick=-150,Vol=-500
3 4 60 14 16 170 10 [Vol] Vol=10
4 5 12 69 17 130 160 []
CodePudding user response:
It might be a good idea to build an intermediate representation first, something like this (I am assuming the empty cell in the Fail
column is an empty list []
so as to match the datatype of the other values):
# create a Boolean mask to filter failed values
m = df.apply(lambda row: row.index.isin(row.Fail),
axis=1,
result_type='broadcast')
>>> df[m]
Item Len Hei C Thick Vol Fail
0 NaN 142.0 NaN NaN 60.0 NaN NaN
1 NaN NaN NaN NaN 0.0 NaN NaN
2 NaN NaN 130.0 NaN -150.0 -500.0 NaN
3 NaN NaN NaN NaN NaN 10.0 NaN
4 NaN NaN NaN NaN NaN NaN NaN
This allows you to actually do something with the failed values, too.
With that in place, generating the value list could be done by something similar to Hossam Magdy Balaha's answer, perhaps with a little function:
def join_params(row):
row = row.dropna().to_dict()
return ', '.join(f'{k}={v}' for k,v in row.items())
>>> df[m].apply(join_params, axis=1)
0 Len=142.0, Thick=60.0
1 Thick=0.0
2 Hei=130.0, Thick=-150.0, Vol=-500.0
3 Vol=10.0
4
dtype: object