I have a small snippet of code that I need to modify, and I can't work out why np.mean() works where np.min() doesn't when a pandas column contains lists. Could someone clarify?
This snippet here works perfectly:
import pandas as pd
import numpy as np
def transformation(custom_df):
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       lambda row: np.mean([dic[v] for v in row if dic.get(v)])),
                                   custom_df['values'])
    return custom_df
customers = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6], [3], [], [3, 5], [6], [5]]
vn = [1, 1, 0, 2, 1, 1]
df2 = pd.DataFrame({'customers': customers, 'values': values, 'neighbors': neighbors, 'valid_neighbors': vn})
   customers  values neighbors  valid_neighbors
0          1     NaN       [6]                1
1          2     NaN       [3]                1
2          3    10.0        []                0
3          4     NaN    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1
df2 = transformation(df2)
The result:
   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.5    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1
However, if I change np.mean() to np.min() in transformation(), it raises a ValueError, which makes me wonder why the same doesn't happen with np.mean():
ValueError: zero-size array to reduction operation minimum which has no identity
I would like to know which condition I'm not meeting and what I can do to get the expected result, which would be:
   customers  values neighbors  valid_neighbors
0          1    12.0       [6]                1
1          2    10.0       [3]                1
2          3    10.0        []                0
3          4    10.0    [3, 5]                2
4          5    11.0       [6]                1
5          6    12.0       [5]                1
CodePudding user response:
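There is an empty list in your neighbors column. np.min raises an error on an empty list, whereas np.mean returns nan (with a RuntimeWarning). Because np.where receives both of its branches already computed, the apply runs on every row, including the one with the empty list, even though that row's result is never selected.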
import numpy as np
print(np.mean([]))
# Output
# nan
print(np.min([]))
# Throws error
# ValueError: zero-size array to reduction operation minimum which has no identity
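A minimal sketch of the difference, plus one way to guard against it. The helper name safe_min is hypothetical (it is not part of NumPy), and returning nan for an empty input is an assumption chosen to mirror what np.mean does:

```python
import warnings
import numpy as np

def safe_min(values):
    """Hypothetical helper: smallest element, or NaN when `values` is empty."""
    return np.min(values) if len(values) else np.nan

# np.mean on an empty list emits a RuntimeWarning and returns nan.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    empty_mean = np.mean([])

# np.min on an empty list raises, because a minimum has no identity element.
try:
    np.min([])
    min_raised = False
except ValueError:
    min_raised = True

print(empty_mean, min_raised, safe_min([]), safe_min([10.0, 11.0]))
```

Swapping np.min for safe_min inside the lambda should let the apply get past the empty row; the value it produces there is discarded by np.where anyway.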
CodePudding user response:
Use the following code to get the result:
df3 = df2.set_index('customers')
df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].mean()))
output(mean):
0 12.00
1 10.00
2 10.00
3 10.50
4 11.00
5 12.00
Name: values, dtype: float64
You can change mean to min:
df2['values'].fillna(df2['neighbors'].apply(lambda x: df3.loc[x, 'values'].min()))
output(min):
0 12.00
1 10.00
2 10.00
3 10.00
4 11.00
5 12.00
Name: values, dtype: float64
fillna returns a new Series, so assign the desired result back to the values column.
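Put together, the fillna approach might look like the sketch below. It rebuilds the question's DataFrame so that it runs on its own; the names lookup and neighbor_min are illustrative only. Note that, unlike the original function, this version does not consult valid_neighbors and relies on the empty neighbor list yielding NaN:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'customers': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6], [3], [], [3, 5], [6], [5]],
    'valid_neighbors': [1, 1, 0, 2, 1, 1],
})

# Series of values indexed by customer id, used to look up neighbors.
lookup = df2.set_index('customers')['values']

# Minimum neighbor value per row; an empty neighbor list gives NaN.
neighbor_min = df2['neighbors'].apply(lambda ids: lookup.loc[ids].min())

# Only missing values are replaced; existing values are left alone.
df2['values'] = df2['values'].fillna(neighbor_min)
print(df2)
```

Series.min skips NaN by default, so a neighbor whose own value is missing does not affect the result.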
CodePudding user response:
It's better to update your transformation function so that it handles the empty list in the neighbors column.
Here's a workaround that may work:
def transformation(custom_df):
    dic = dict(zip(custom_df['customers'], custom_df['values']))
    custom_df['values'] = np.where(custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1),
                                   custom_df['neighbors'].apply(
                                       lambda row: np.min([dic[v] for v in row if dic.get(v)]) if len(row) else 0),
                                   custom_df['values'])
    return custom_df
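An alternative sketch, under a few assumptions: it computes the minimum only for the rows that actually need filling, so empty lists on other rows are never touched, and it uses Python's built-in min with default=np.nan for the case where no neighbor has a known value. The function name fill_with_neighbor_min is hypothetical. It also replaces the dic.get(v) truthiness check, which would drop a legitimate value of 0 and would not drop NaN, with an explicit NaN test:

```python
import numpy as np
import pandas as pd

def fill_with_neighbor_min(custom_df):
    """Hypothetical variant of transformation() that fills with the neighbor minimum."""
    lookup = dict(zip(custom_df['customers'], custom_df['values']))
    needs_fill = custom_df['values'].isna() & (custom_df['valid_neighbors'] >= 1)

    def neighbor_min(row):
        # Keep only neighbors that exist and have a non-missing value.
        known = [lookup[v] for v in row if v in lookup and not pd.isna(lookup[v])]
        return min(known, default=np.nan)

    out = custom_df.copy()
    out.loc[needs_fill, 'values'] = out.loc[needs_fill, 'neighbors'].apply(neighbor_min)
    return out

df2 = pd.DataFrame({
    'customers': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6], [3], [], [3, 5], [6], [5]],
    'valid_neighbors': [1, 1, 0, 2, 1, 1],
})
result = fill_with_neighbor_min(df2)
print(result)
```

Working on a copy leaves the input DataFrame unchanged, which is a design choice of this sketch rather than something the original function does.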