Python: Error when appending new item to a list in a pandas dataframe

Time:09-23

I have a pandas dataframe with three columns: user_id (str), list_of_purchases (list) and a binary column named b.

I would like to create a fourth column named final_list that follows the rules below:

  • When b = 1, final_list should be list_of_purchases with the item "Success" appended. For example, if list_of_purchases = ['item_1', 'item_2', 'item_3'], then final_list should be ['item_1', 'item_2', 'item_3', 'Success'].
  • When b = 0, the appended item should be "Null" instead of "Success", so final_list should be ['item_1', 'item_2', 'item_3', 'Null'].

I tried the following code but got an error:

df['final_list'] = np.where(
    df['b'] == 0,
    df['list_of_purchases'] + ['Null'],
    df['list_of_purchases'] + ['Success'])

TypeError: Cannot broadcast np.ndarray with operand of type <class 'list'>

I figured out how to do it with a for loop that checks every row in column b, but it is really inefficient and takes a long time.

Thanks in advance for the help!

CodePudding user response:

# create a function that builds the new list for a single row:
def lista(row):
    suffix = ['Null'] if row['b'] == 0 else ['Success']
    return row['list_of_purchases'] + suffix

#use the function on every row of df:
df['final_list'] = df.apply(lista, axis=1)

From what I understand, pandas DataFrames are not designed to store lists as their values, so a truly efficient (vectorized) solution is not available.
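
Since there is no vectorized way to append to list-valued cells, one alternative worth trying is a plain list comprehension over the two columns. This is only a sketch: the toy data below is made up for illustration, and the column names are taken from the question. It is often faster than apply(axis=1) because it avoids building a Series for every row, but that is worth timing on your own data.

```python
import pandas as pd

# Toy data, assumed for illustration; column names follow the question.
df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "list_of_purchases": [["item_1", "item_2"], ["item_3"]],
    "b": [1, 0],
})

# Build each new list in plain Python. `purchases + [...]` creates a new
# list, so the original list_of_purchases column is left unchanged.
df["final_list"] = [
    purchases + ["Success" if flag == 1 else "Null"]
    for purchases, flag in zip(df["list_of_purchases"], df["b"])
]

print(df)
```

The result is the same kind of column the apply version produces: each cell holds a new Python list.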

CodePudding user response:

Feels like a nice use for a lambda, rather than defining a function, though both approaches work.

import pandas as pd
import numpy as np

data = [[1, [1,2,3]],
        [0, [4,5,6]]]
df = pd.DataFrame(data, columns=["b", "list_of_purchases"])

df["Output"] = df.apply(
    lambda row: row["list_of_purchases"] + ["Success" if row["b"] else "Null"],
    axis=1)

print(df)

Produces:

   b list_of_purchases              Output
0  1         [1, 2, 3]  [1, 2, 3, Success]
1  0         [4, 5, 6]     [4, 5, 6, Null]

"Benefits" of using a lambda are (here) basically just that it avoids defining a function which you might not ever reuse.

If the function/logic is to be reused elsewhere then defining a function (and not using a lambda) might be exactly the right approach. If it's only used for this, and not reused, I'd probably go with the lambda.
