I have a df of orders where for each product there's an indicator whether it was bought or not (0/1). I want to add a new column called total_by column that will concatenate only the bought products per each order. I tried different apply techniques, but didn't succeed to get it right. Basically, I need to strip the "was_" and "_bought?" before adding to total buy and it should be done by condition of value=1 per each product. In the example below, I have 3 products, in reality there are many more, so I can't just write it explicitly
order_id was_pencil_bought? was_notebook_bought? was_bag_bought total_buy
1 1 0 1 'pencil','bag'
2 0 0 1 'bag'
3 1 1 1 'pencil','notebook','bag'
CodePudding user response:
This should work:
selected_cols = ["was_pencil_bought", "was_notebook_bought", "was_bag_bought"]
def func(data):
l = []
for col in selected_cols:
if data[col] == 1:
l.append(col)
return l
df["total_buy"] = df.apply(func, axis=1)
CodePudding user response:
This should work:
import pandas as pd
df = pd.DataFrame([[1,1,0,1],[2,0,0,1],[3,1,1,1]], columns = ["order_id","was_pencil_bought?", "was_notebook_bought?", "was_bag_bought?"])
df['total_buy'] = ""
for index, row in df.iterrows():
for product in ['pencil', 'notebook', 'bag']:
if row[f'was_{product}_bought?']:
df.loc[index,'total_buy'] = product ","
# remove the ','
df.loc[index,'total_buy'] = df.loc[index,'total_buy'][:-1]
print(df)
CodePudding user response:
This might help...
from io import StringIO
import time
import pandas as pd
import numpy as np
df = pd.DataFrame([(1, 1, 0, 1), (2, 0, 0, 1), (3, 1, 1, 1)], columns=['order_id', 'was_pencil_bought?', 'was_notebook_bought?', 'was_bag_bought?'])
print(df)
# Create a products list by trimming out "was_" and "bought?"
products = np.array([
p[len("was_"):-len("_bought?")] # using @constantstranger's trick
for p in df.columns if p != "order_id"
])
print("products:", products)
df['total_buy'] = (
df.loc[:, df.columns != "order_id"]
# select the product names from products list where row value is 1
.apply(lambda row: ", ".join(products[row == 1]), axis=1)
)
print(df)
Output
products: ['pencil' 'notebook' 'bag']
order_id was_pencil_bought? ... was_bag_bought? total_buy
0 1 1 ... 1 pencil, bag
1 2 0 ... 1 bag
2 3 1 ... 1 pencil, notebook, bag
CodePudding user response:
Here's an answer to your question:
import pandas as pd
def addColToRecords(df):
rows = df.to_dict('records')
for d in rows:
d['total_buy'] = [k[len('was_'):-len('_bought')] for k in d if k != 'order_id' and d[k] > 0]
df = pd.DataFrame(rows)
return df
records = [
{'order_id': 1, 'was_pencil_bought': 1, 'was_notebook_bought': 0, 'was_bag_bought': 1},
{'order_id': 2, 'was_pencil_bought': 0, 'was_notebook_bought': 0, 'was_bag_bought': 1},
{'order_id': 3, 'was_pencil_bought': 1, 'was_notebook_bought': 1, 'was_bag_bought': 1}
]
df = pd.DataFrame.from_records(records)
df2 = addColToRecords(df)
print(df2)
Output:
order_id was_pencil_bought was_notebook_bought was_bag_bought total_buy
0 1 1 0 1 [pencil, bag]
1 2 0 0 1 [bag]
2 3 1 1 1 [pencil, notebook, bag]