Home > Blockchain >  Check if a column value is in a list and report to a new column
Check if a column value is in a list and report to a new column

Time:05-17

After this discussion, I have the following dataframe:

data = {'Item':['1', '2', '3', '4', '5'], 
'Len':[142, 11, 50, 60, 12], 
'Hei':[55, 65, 130, 14, 69],
'C':[68, -18, 65, 16, 17],
'Thick':[60, 0, -150, 170, 130],
'Vol':[230, 200, -500, 10, 160]
'Fail':[['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], ""}

df = pd.DataFrame(data)

representing different items and the corresponding values related to some of their parameters (Le, Hei, C, ...). In the column Fail are reported the parameters that are failed, e. g. item 1 fails for parameters Len and Thick, item 3 fails for parameters B, Thick and Vol, while item 4 shows no failure. For each item I need a new column where it is reported the failed parameter together with its value, in the following format: failed parameter = value. So, for the first item I should get Len=142 and Thick=60. So far, I have exploded the Fail column into multiple columns:

failed_param = df['Fail'].apply(pd.Series)
failed_param = failed_param.rename(columns = lambda x : 'Failed_param_'   str(x  1 ))
df2_list = failed_param.columns.values.tolist()
df2 = pd.concat([df[:], failed_param[:]], axis=1)

Then, if I do the following:

for name in df2_list:
    df2.loc[df2[f"{name}"] == "D", "new"] = "D"  "="   df2["D"].map(str)

I can get what I need but for only one parameter (D in this case). How can I obtain the same for all the parameters all at once?

CodePudding user response:

As mentioned in the question, you need to insert a new column (e.g., FailParams) that contains a list of strings. Each string represents the items' failures (e.g., Len=142,Thick=60). A quick solution can be:

import pandas as pd

data = {
  'Item' : ['1', '2', '3', '4', '5'],
  'Len'  : [142, 11, 50, 60, 12],
  'Hei'  : [55, 65, 130, 14, 69],
  'C'    : [68, -18, 65, 16, 17],
  'Thick': [60, 0, -150, 170, 130],
  'Vol'  : [230, 200, -500, 10, 160],
  'Fail' : [['Len', 'Thick'], ['Thick'], ['Hei', 'Thick', 'Vol'], ['Vol'], []]
}

# Convert the dictionary into a DataFrame.
df = pd.DataFrame(data)

# The first solution: using list comprehension.
column = [
  ",".join(  # Add commas between the list items.
    # Find the target items and their values.
    [el   "="   str(df.loc[int(L[0]) - 1, el]) for el in L[1]]
  )
  if (len(L[1]) > 0) else ""  # If the Fail inner is empty, return an empty string.
  for L in zip(df['Item'].values, df['Fail'].values)  # Loop on the Fail items.
]

# Insert the new column.
df['FailParams'] = column

# Print the DF after insertion.
print(df)

The previous solution is added using list comprehension. Another solution using loops can be:

# The second solution: using loops.
records = []
for L in zip(df['Item'].values, df['Fail'].values):
  if (len(L[1]) <= 0):
    record = ""
  else:
    record = ",".join([el   "="   str(df.loc[int(L[0]) - 1, el]) for el in L[1]])
  records.append(record)
print(records)

# Insert the new column.
df['FailParams'] = records

# Print the DF after insertion.
print(df)

A sample output should be:

  Item  Len  Hei   C  Thick  Vol               Fail                   FailParams
0    1  142   55  68     60  230       [Len, Thick]             Len=142,Thick=60
1    2   11   65 -18      0  200            [Thick]                      Thick=0
2    3   50  130  65   -150 -500  [Hei, Thick, Vol]  Hei=130,Thick=-150,Vol=-500
3    4   60   14  16    170   10              [Vol]                       Vol=10
4    5   12   69  17    130  160                 []

CodePudding user response:

It might be a good idea to build an intermediate representation first, something like this (I am assuming the empty cell in the Fail column is an empty list [] so as to match the datatype of the other values):

# create a Boolean mask to filter failed values
m = df.apply(lambda row: row.index.isin(row.Fail), 
             axis=1, 
             result_type='broadcast')

>>> df[m]
  Item    Len    Hei   C  Thick    Vol Fail
0  NaN  142.0    NaN NaN   60.0    NaN  NaN
1  NaN    NaN    NaN NaN    0.0    NaN  NaN
2  NaN    NaN  130.0 NaN -150.0 -500.0  NaN
3  NaN    NaN    NaN NaN    NaN   10.0  NaN
4  NaN    NaN    NaN NaN    NaN    NaN  NaN

This allows you to actually do something with the failed values, too.

With that in place, generating the value list could be done by something similar to Hossam Magdy Balaha's answer, perhaps with a little function:

def join_params(row):
    row = row.dropna().to_dict()
    return ', '.join(f'{k}={v}' for k,v in row.items())

>>> df[m].apply(join_params, axis=1)
0                  Len=142.0, Thick=60.0
1                              Thick=0.0
2    Hei=130.0, Thick=-150.0, Vol=-500.0
3                               Vol=10.0
4                                       
dtype: object
  • Related