How to drop values from lists inside columns from a Pandas DataFrame

Time:09-22

Although it is not good coding practice, I've run into a special kind of problem in which I need to go through a column of lists and remove particular values. I suppose one solution would be to melt the 'neighbors' column, but I believe the code I have is close to the objective. I've prepared a reproducible example for better understanding:

import pandas as pd
import numpy as np


def removing_nan_neighboors(custom_df):
    nan_list = list(custom_df[custom_df['values'].notna()]['customer'])
    print(nan_list)
    custom_df['neighbors'] = [x for x in custom_df['neighbors'] if x not in nan_list]
    return custom_df


customer = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]]
df = pd.DataFrame({'customer': customer, 'values': values, 'neighbors': neighbors})
df = removing_nan_neighboors(df)

print(df)

   customer values neighbors
0        1     NaN    [6, 2]
1        2     NaN    [1, 3]
2        3    10.0    [2, 4]
3        4     NaN    [3, 5]
4        5    11.0    [4, 6]
5        6    12.0    [5, 1]

The objective is to remove from each neighbors list the customer numbers whose values are NaN:

   customer values neighbors
0        1     NaN    [6]
1        2     NaN    [3]
2        3    10.0    []
3        4     NaN    [3, 5]
4        5    11.0    [6]
5        6    12.0    [5]

But I haven't gotten that far, because my function doesn't work as intended yet. Any help is appreciated.

CodePudding user response:

Try:

df["cust_1"] = np.where(
    np.isnan(np.roll(df["values"], 1)),
    np.nan,
    np.roll(df["customer"], 1),
)

df["cust_2"] = np.where(
    np.isnan(np.roll(df["values"], -1)),
    np.nan,
    np.roll(df["customer"], -1),
)

df["neighbors"] = df[["cust_1", "cust_2"]].agg(
    lambda x: list(x[x.notna()].astype(int)), axis=1
)
df = df.drop(columns=["cust_1", "cust_2"])

print(df)

Prints:

   customer  values neighbors
0         1     NaN       [6]
1         2     NaN       [3]
2         3    10.0        []
3         4     NaN    [3, 5]
4         5    11.0       [6]
5         6    12.0       [5]
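
Note that the np.roll version rebuilds neighbors from row positions, so it relies on each customer's neighbors being the previous and next rows (wrapping around), as in the example data. If that cannot be assumed, a minimal sketch along the lines of the "melting" idea from the question is to explode the lists, filter the elements, and regroup them. The names nan_customers, exploded, kept and regrouped are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "customer": [1, 2, 3, 4, 5, 6],
    "values": [np.nan, np.nan, 10, np.nan, 11, 12],
    "neighbors": [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

# Customers whose value is NaN; these must disappear from every list.
nan_customers = df.loc[df["values"].isna(), "customer"].tolist()

# One row per list element; the original row label is repeated in the index.
exploded = df["neighbors"].explode()
kept = exploded[~exploded.isin(nan_customers)]

# Regroup by the original row label. Rows that lost every element are absent
# after the groupby, so reindex and put an empty list back for them.
regrouped = kept.groupby(level=0).agg(list).reindex(df.index)
df["neighbors"] = [v if isinstance(v, list) else [] for v in regrouped]

print(df)
```

This reads the actual contents of the neighbors column, so it should not depend on the row order.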

CodePudding user response:

If I understood your objective correctly, you want to remove from every neighbors list the numbers of those customers whose values entry is NaN. In other words, you want the expected output shown at the end of your question.

I did that with a nested list comprehension:

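# Customers whose value is NaN; computed once instead of once per element.
nan_customers = set(df.loc[df['values'].isna(), 'customer'])
df['neighbors_new'] = [[n for n in neighbor if n not in nan_customers]
                       for neighbor in df['neighbors']]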
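
For comparison with the function in the question, here is a hedged sketch of how it might be adapted. As far as I can tell it fails for two reasons: notna() picks the customers that do have a value, and the comprehension tests whole lists against the customer numbers instead of the elements inside each list. The name remove_nan_neighbors is hypothetical, and returning a new DataFrame via assign() is a design choice, not something the question asks for:

```python
import numpy as np
import pandas as pd


def remove_nan_neighbors(custom_df):
    # Fix 1: isna() (not notna()) selects the customers to drop.
    nan_customers = set(custom_df.loc[custom_df["values"].isna(), "customer"])
    # Fix 2: filter the elements inside each list, not the lists themselves.
    cleaned = [[n for n in row if n not in nan_customers]
               for row in custom_df["neighbors"]]
    # assign() returns a new DataFrame and leaves the input untouched.
    return custom_df.assign(neighbors=cleaned)


df = pd.DataFrame({
    "customer": [1, 2, 3, 4, 5, 6],
    "values": [np.nan, np.nan, 10, np.nan, 11, 12],
    "neighbors": [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})
result = remove_nan_neighbors(df)
print(result)
```

Using a set keeps each membership test cheap, which may matter when there are many customers.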