How to get the minimum value from a nested-list column in pandas? Why doesn't numpy.min() work?

I have a small snippet of code that I need to modify, and I can't work out why np.mean() works where np.min() doesn't when a pandas column is composed of nested lists. Could someone clarify?

This snippet here works perfectly:

import pandas as pd
import numpy as np


def transformation(custom_df):
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       lambda row: np.mean([dic[v] for v in row if dic.get(v)])),
                                   custom_df['values'])
    return custom_df


customers = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6], [3], [], [3, 5], [6], [5]]
vn = [1, 1, 0, 2, 1, 1]
df2 = pd.DataFrame({'customers': customers, 'values': values, 'neighbors': neighbors, 'valid_neighbors': vn})


   customers  values neighbors  valid_neighbors
0          1     NaN       [6]                1
1          2     NaN       [3]                1
2          3    10.0        []                0
3          4     NaN    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1

df2 = transformation(df2)

The result:

   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.5    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1

However, if I change np.mean() to np.min() in the transformation() function, it raises a ValueError, which makes me wonder why the same doesn't happen with np.mean():

ValueError: zero-size array to reduction operation minimum which has no identity

I would like to know which condition I'm not meeting and what I can do to get the expected result, which would be:

   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.0    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1

CodePudding user response：

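There is an empty list in your neighbors column. np.min raises an error on an empty list, whereas np.mean returns nan (with a RuntimeWarning). The lambda is applied to every row, including the rows that np.where later discards, so the empty list in row 2 is still evaluated.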

import numpy as np

print(np.mean([])) 
# Output
# nan

print(np.min([])) 
# Throws error
# ValueError: zero-size array to reduction operation minimum which has no identity
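
To illustrate the difference, here is a minimal sketch of two ways to make the minimum tolerate empty input. safe_min is a hypothetical helper name, not a NumPy function, and returning NaN as the default is an assumption about what you want for rows with no neighbors. The initial argument of np.min is a real NumPy parameter that supplies a starting value for the reduction.

```python
import warnings

import numpy as np


def safe_min(values, default=np.nan):
    """Hypothetical helper: minimum of values, or default if values is empty."""
    arr = np.asarray(values, dtype=float)
    return arr.min() if arr.size else default


# np.mean divides a sum of 0 by a count of 0, so it warns and returns nan.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    empty_mean = np.mean([])

# Passing initial gives the reduction a starting value, so empty input is allowed.
empty_min = np.min([], initial=np.inf)

print(empty_mean, empty_min, safe_min([]), safe_min([3, 1, 2]))
```

Note that with initial=np.inf an empty list yields inf rather than nan, so the explicit emptiness check may be the clearer choice if inf would be misleading downstream.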

CodePudding user response：

Use the following code to get the result:

df3 = df2.set_index('customers')
df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].mean()))

Output (mean):

0   12.00
1   10.00
2   10.00
3   10.50
4   11.00
5   12.00
Name: values, dtype: float64

You can change mean to min:

df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].min()))

Output (min):

0   12.00
1   10.00
2   10.00
3   10.00
4   11.00
5   12.00
Name: values, dtype: float64

Assign the result back to the values column to get the desired output.
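
For completeness, a sketch of the whole thing with the assignment included, using the sample data from the question. This is a variant rather than the exact code above: it assumes reindex in place of loc so that an id missing from customers yields NaN instead of a KeyError, and lookup and neighbor_min are names made up for illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'customers': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6], [3], [], [3, 5], [6], [5]],
    'valid_neighbors': [1, 1, 0, 2, 1, 1],
})

# Series of values indexed by customer id, built before any filling happens
lookup = df2.set_index('customers')['values']

# reindex gives NaN for unknown ids and an empty Series for [];
# Series.min() skips NaN and returns NaN when nothing is left
neighbor_min = df2['neighbors'].apply(lambda ids: lookup.reindex(ids).min())

# fillna only touches rows whose value is currently missing
df2['values'] = df2['values'].fillna(neighbor_min)
print(df2)
```

Unlike the original transformation(), this ignores the valid_neighbors column and relies only on which values are missing, which gives the same result for this data.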

CodePudding user response：

It would be better to update your transformation function so that it handles the empty list in the neighbors column. Here's a workaround that may work:

def transformation(custom_df):
    # Map each customer id to its current value
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    # np.min raises on an empty sequence, so fall back to 0 for rows with no neighbors
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       lambda row: np.min([dic[v] for v in row if dic.get(v)]) if len(row) else 0),
                                   custom_df['values'])
    return custom_df
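
One caveat with the workaround: len(row) checks the original list, not the filtered one, so a non-empty row whose neighbors are all filtered out would still hit the ValueError. Also, dic.get(v) treats NaN as truthy and 0 as falsy. Below is a slightly more defensive sketch, assuming you want NaN (rather than 0) when no neighbor has a known value. The names transformation_min and neighbor_min are made up for illustration.

```python
import numpy as np
import pandas as pd


def transformation_min(custom_df):
    # Map each customer id to its current value
    lookup = dict(zip(custom_df['customers'], custom_df['values']))

    def neighbor_min(row):
        # Keep only neighbors that exist and have a non-NaN value
        known = [lookup[v] for v in row if pd.notna(lookup.get(v))]
        # Check the filtered list, since that is what the minimum is taken over
        return min(known) if known else np.nan

    fill = custom_df['neighbors'].apply(neighbor_min)
    needs_fill = custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1)

    out = custom_df.copy()  # leave the caller's DataFrame untouched
    out['values'] = out['values'].where(~needs_fill, fill)
    return out


df2 = pd.DataFrame({
    'customers': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6], [3], [], [3, 5], [6], [5]],
    'valid_neighbors': [1, 1, 0, 2, 1, 1],
})
result = transformation_min(df2)
print(result)
```

Returning a copy is a design choice here. The original function modifies the DataFrame in place, so drop the copy() if you prefer that behavior.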