After importing a CSV file, I get this input:
>>> df.head()
id customer products
0 0 0 1 0 1 0
1 1 22 0 5 0 4 5 3 2 1 1
2 2 57 5 2 4
...
I would like to change it so that it could match the SQL database schema:
order_id, customer_id, product_id, product_qty, primary key(order_id, customer_id, product_id))
Wondering if using pandas
and to_sql
(testing in SQLite) this could be achieved, the desired output would be:
id customer product_id product_qty
0 0 0 2
0 0 1 2
1 22 0 2
1 22 1 2
1 22 2 1
1 22 3 1
1 22 4 1
1 22 5 2
2 57 2 1
2 57 4 1
2 57 5 1
CodePudding user response:
In [72]: df = pd.DataFrame({'id': [0,1,2], 'customer': [0,22,57], 'products': ['1 0 1 0', '0 5 0 4 5 3 2 1 1', '5 2 4']})
In [73]: df.assign(product_id=df['products'].str.split()).explode('product_id').drop('products', axis='columns').groupby(['id', 'customer', 'pr
...: oduct_id']).size().reset_index(name='product_qty')
Out[73]:
id customer product_id product_qty
0 0 0 0 2
1 0 0 1 2
2 1 22 0 2
3 1 22 1 2
4 1 22 2 1
5 1 22 3 1
6 1 22 4 1
7 1 22 5 2
8 2 57 2 1
9 2 57 4 1
10 2 57 5 1
CodePudding user response:
- Series.str.split to convert strings into list
- DataFrame.explode to unnest list variables to rows
- DataFrame.groupby to count the number of occurrences.
See the example in the below:
import re
from io import StringIO
import pandas as pd
data = """
id customer products
0 0 1 0 1 0
1 22 0 5 0 4 5 3 2 1 1
2 57 5 2 4
""".strip()
data = re.sub(r"\s{2,}", ",", data)
df = pd.read_csv(StringIO(data))
df["product_id"] = df.products.str.split()
df = df.explode("product_id", ignore_index=False)
result = df.groupby(["id", "customer", "product_id"], as_index=False).agg(
product_qty=("product_id", "size"))
result
id customer product_id product_qty
0 0 0 0 2
1 0 0 1 2
2 1 22 0 2
3 1 22 1 2
4 1 22 2 1
5 1 22 3 1
6 1 22 4 1
7 1 22 5 2
8 2 57 2 1
9 2 57 4 1
10 2 57 5 1
CodePudding user response:
Check with
out = df.assign(products=df.products.str.split()).explode('products').value_counts(list(df)).reset_index(name='product_qty')
id customer products product_qty
0 0 0 0 2
1 0 0 1 2
2 1 22 0 2
3 1 22 1 2
4 1 22 5 2
5 1 22 2 1
6 1 22 3 1
7 1 22 4 1
8 2 57 2 1
9 2 57 4 1
10 2 57 5 1