Create a column containing columns' names-CodePudding

I have a df of orders where for each product there's an indicator whether it was bought or not (0/1). I want to add a new column called total_by column that will concatenate only the bought products per each order. I tried different apply techniques, but didn't succeed to get it right. Basically, I need to strip the "was_" and "_bought?" before adding to total buy and it should be done by condition of value=1 per each product. In the example below, I have 3 products, in reality there are many more, so I can't just write it explicitly

order_id  was_pencil_bought?  was_notebook_bought?  was_bag_bought   total_buy
     1            1                    0                   1        'pencil','bag'
     2            0                    0                   1        'bag'
     3            1                    1                   1        'pencil','notebook','bag'

CodePudding user response：

This should work:

selected_cols = ["was_pencil_bought", "was_notebook_bought", "was_bag_bought"]
def func(data):
    l = []
    for col in selected_cols:
        if data[col] == 1:
            l.append(col)
    return l
df["total_buy"] = df.apply(func, axis=1)

CodePudding user response：

This should work:

import pandas as pd

df = pd.DataFrame([[1,1,0,1],[2,0,0,1],[3,1,1,1]], columns = ["order_id","was_pencil_bought?", "was_notebook_bought?", "was_bag_bought?"])


df['total_buy'] = ""

for index, row in df.iterrows():
    for product in ['pencil', 'notebook', 'bag']:
        if row[f'was_{product}_bought?']:
            df.loc[index,'total_buy']  = product   ","

    # remove the ','
    df.loc[index,'total_buy'] = df.loc[index,'total_buy'][:-1]
    
print(df)

CodePudding user response：

This might help...

from io import StringIO
import time
import pandas as pd
import numpy as np

df = pd.DataFrame([(1, 1, 0, 1), (2, 0, 0, 1), (3, 1, 1, 1)], columns=['order_id', 'was_pencil_bought?', 'was_notebook_bought?', 'was_bag_bought?'])
print(df)

# Create a products list by trimming out "was_" and "bought?"
products = np.array([
        p[len("was_"):-len("_bought?")]  # using @constantstranger's trick
        for p in df.columns if p != "order_id"
])
print("products:", products)

df['total_buy'] = (
    df.loc[:, df.columns != "order_id"]
    # select the product names from products list where row value is 1
    .apply(lambda row: ", ".join(products[row == 1]), axis=1)
)
print(df)

Output

products: ['pencil' 'notebook' 'bag']

   order_id  was_pencil_bought?  ...  was_bag_bought?              total_buy
0         1                   1  ...                1            pencil, bag
1         2                   0  ...                1                    bag
2         3                   1  ...                1  pencil, notebook, bag

CodePudding user response：

Here's an answer to your question:

import pandas as pd
def addColToRecords(df):
    rows = df.to_dict('records')
    for d in rows:
        d['total_buy'] = [k[len('was_'):-len('_bought')] for k in d if k != 'order_id' and d[k] > 0]
    df = pd.DataFrame(rows)
    return df
    
records = [
    {'order_id': 1, 'was_pencil_bought': 1, 'was_notebook_bought': 0, 'was_bag_bought': 1},
    {'order_id': 2, 'was_pencil_bought': 0, 'was_notebook_bought': 0, 'was_bag_bought': 1},
    {'order_id': 3, 'was_pencil_bought': 1, 'was_notebook_bought': 1, 'was_bag_bought': 1}
]
df = pd.DataFrame.from_records(records)
df2 = addColToRecords(df)
print(df2)

Output:

   order_id  was_pencil_bought  was_notebook_bought  was_bag_bought                total_buy
0         1                  1                    0               1            [pencil, bag]
1         2                  0                    0               1                    [bag]
2         3                  1                    1               1  [pencil, notebook, bag]