I loaded the CSV file:
import pandas as pd
# Open the first dataset
train = pd.read_csv("order_products__train.csv", index_col="order_id")
The data looks like:
          product_id
order_id
1                  1
1                  2
1                  3
1                  4
2                  1
2                  2
2                  3
2                  4
2                  5
2                  6
What I want is a data frame that looks like this:
order_id product_id
1 1,2,3,4
2 1,2,3,4,5,6
From that, I want to generate a nested list like:
[[1,2,3,4],[1,2,3,4,5,6]]
Could anyone help?
CodePudding user response:
You can use .groupby() to do that:
train = train.groupby(['order_id'])['product_id'].apply(list)
That gives you the expected output:
order_id
1 [1, 2, 3, 4]
2 [1, 2, 3, 4, 5, 6]
Finally, you can convert this to a DataFrame or directly to a list to get what you want:
train = train.to_frame() # To pd.DataFrame
# Or
train = train.to_list() # To nested lists [[1,2,3,4],[1,2,3,4,5,6]]
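To try this without the CSV, here is a minimal sketch that rebuilds the sample rows from the question in memory. The names sample, grouped, nested and joined are made up for illustration, and the comma-joined variant is only a guess at the string format shown in the question's desired table.

```python
import pandas as pd

# Hypothetical in-memory stand-in for order_products__train.csv,
# with order_id as the index (as with index_col="order_id").
sample = pd.DataFrame(
    {
        "order_id": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "product_id": [1, 2, 3, 4, 1, 2, 3, 4, 5, 6],
    }
).set_index("order_id")

# One list of product ids per order; groupby accepts the index level name.
grouped = sample.groupby("order_id")["product_id"].apply(list)

# Nested list form, one inner list per order.
nested = grouped.to_list()

# Comma-joined string form, one row per order, order_id back as a column.
joined = (
    sample["product_id"]
    .astype(str)
    .groupby(level="order_id")
    .agg(",".join)
    .reset_index()
)

print(nested)
print(joined)
```

The list form is usually easier to work with afterwards; the joined strings are mainly useful for display or export.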
CodePudding user response:
There must be better ways but I guess you can simply do the following:
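list_product = []
# If order_id was loaded as the index (index_col="order_id"), make it a column first
train = train.reset_index()
for i in train["order_id"].unique():
    # Select the rows of this order and collect its product ids
    tmp = train[train["order_id"] == i]
    list_product.append(tmp["product_id"].to_list())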
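The loop above filters the whole frame once per order, which can get slow with many orders. As one possible alternative, here is a rough sketch that collects the lists in a single pass with a plain dict. It assumes order_id is a regular column (not the index), and sample and orders are hypothetical names used only for this example.

```python
import pandas as pd

# Hypothetical sample data matching the rows shown in the question.
sample = pd.DataFrame(
    {
        "order_id": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "product_id": [1, 2, 3, 4, 1, 2, 3, 4, 5, 6],
    }
)

# Walk the rows once, appending each product id to its order's list.
orders = {}
for order_id, product_id in zip(sample["order_id"], sample["product_id"]):
    orders.setdefault(order_id, []).append(product_id)

# Dicts keep insertion order, so orders appear in order of first occurrence.
list_product = list(orders.values())
print(list_product)
```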