Home > Mobile >  pandas how to create a one to many relation (add quantity column)
pandas how to create a one to many relation (add quantity column)

Time:10-31

After importing a CSV file, I get this input:

>>> df.head()
   id  customer           products
0   0         0            1 0 1 0
1   1        22  0 5 0 4 5 3 2 1 1
2   2        57              5 2 4
...

I would like to change it so that it could match the SQL database schema:

order_id, customer_id, product_id, product_qty, primary key(order_id, customer_id, product_id))

Wondering if using pandas and to_sql (testing in SQLite) this could be achieved, the desired output would be:

id customer product_id product_qty
0   0       0          2     
0   0       1          2
1   22      0          2
1   22      1          2
1   22      2          1 
1   22      3          1
1   22      4          1
1   22      5          2
2   57      2          1
2   57      4          1
2   57      5          1

CodePudding user response:

In [72]: df = pd.DataFrame({'id': [0,1,2], 'customer': [0,22,57], 'products': ['1 0 1 0', '0 5 0 4 5 3 2 1 1', '5 2 4']})

In [73]: df.assign(product_id=df['products'].str.split()).explode('product_id').drop('products', axis='columns').groupby(['id', 'customer', 'pr
    ...: oduct_id']).size().reset_index(name='product_qty')
Out[73]:
    id  customer product_id  product_qty
0    0         0          0            2
1    0         0          1            2
2    1        22          0            2
3    1        22          1            2
4    1        22          2            1
5    1        22          3            1
6    1        22          4            1
7    1        22          5            2
8    2        57          2            1
9    2        57          4            1
10   2        57          5            1

CodePudding user response:

See the example in the below:

import re
from io import StringIO
import pandas as pd

data = """
id  customer           products
0         0            1 0 1 0
1        22  0 5 0 4 5 3 2 1 1
2        57              5 2 4
""".strip()
data = re.sub(r"\s{2,}", ",", data)
df = pd.read_csv(StringIO(data))

df["product_id"] = df.products.str.split()
df = df.explode("product_id", ignore_index=False)
result = df.groupby(["id", "customer", "product_id"], as_index=False).agg(
  product_qty=("product_id", "size"))
result
    id  customer product_id  product_qty
0    0         0          0            2
1    0         0          1            2
2    1        22          0            2
3    1        22          1            2
4    1        22          2            1
5    1        22          3            1
6    1        22          4            1
7    1        22          5            2
8    2        57          2            1
9    2        57          4            1
10   2        57          5            1

CodePudding user response:

Check with

out = df.assign(products=df.products.str.split()).explode('products').value_counts(list(df)).reset_index(name='product_qty')
    id  customer products  product_qty
0    0         0        0            2
1    0         0        1            2
2    1        22        0            2
3    1        22        1            2
4    1        22        5            2
5    1        22        2            1
6    1        22        3            1
7    1        22        4            1
8    2        57        2            1
9    2        57        4            1
10   2        57        5            1
  • Related