Home > Blockchain >  From differents columns create a new one with a json format
From differents columns create a new one with a json format

Time:08-18

I'm a stuck with pandas. i have two df like that :

index    | seller |  sales  | is_active
:-----:  |--------|---------|-----------
0        | smith  |   Yes   |   Yes
1        | john   |   No    |   Yes
2        | alan   |   Yes   |   No

and an other one :

  index    | seller | product | EAN         |  URL           | PRICE      |
:-----:    |--------|---------|-------------|----------------|:----------:|
0          | smith  | book    |  ANUDH17e89 | www.ecvdgv.com |  13.45    
1          | smith  | dvd     |  NVGS5w621  | www.awfcj.com  |  23.76
2          | smith  | cd      |  NCYbh658   | www.bstx.com   |  9.99
3          | john   | sofa    |  codkv32876 | www.....       |  348
4          | john   | umbrella|  chudbic132 | www.....       |  38 
5          | john   | bag     |  coGTTf276  | www.....       |  54
6          | alan   | tv      |  BYU1890H   | www.....       |  239
7          | alan   | cable   |  ndhnjh0988 | www.....       |  5
8          | alan   | fridge  |  BTFS$42561 | www.....       |  158

And i would like to do a left join on the first df and create a column as a json wit the differents informations in a new column as a json. Ssomething like that :

index    | seller |  sales  | is_active | New_column 
:-----:  |--------|---------|-----------|-----------
0        | smith  |   Yes   |   Yes     | {product : book, 
                                               EAN : ANUDH17e89, 
                                               URL : www.ecvdgv.com, 
                                             Price : 13,45} , 
                                          {product :dvd,
                                               EAN : NVGS5w621,
                                               URL : www.awfcj.com, 
                                             Price : 23,76}, 
                                           etc..

and the same for each seller

Hope is clear

Thanks or your help !

CodePudding user response:

Try:

import json

df2["New_column"] = df2.apply(lambda x: json.dumps(x.to_dict()), axis=1)

out = df1.merge(
    df2[["seller", "New_column"]]
    .groupby("seller")
    .agg(", ".join)
    .reset_index(),
    on="seller",
)
print(out)

Prints:

  seller sales is_active                                                                                                                                                                                                                                                                                              New_column
0  smith   Yes       Yes  {"seller": "smith", "product": "book", "EAN": "ANUDH17e89", "URL": "www.ecvdgv.com", "PRICE": 13.45}, {"seller": "smith", "product": "dvd", "EAN": "NVGS5w621", "URL": "www.awfcj.com", "PRICE": 23.76}, {"seller": "smith", "product": "cd", "EAN": "NCYbh658", "URL": "www.bstx.com", "PRICE": 9.99}
1   john    No       Yes             {"seller": "john", "product": "sofa", "EAN": "codkv32876", "URL": "www.....", "PRICE": 348.0}, {"seller": "john", "product": "umbrella", "EAN": "chudbic132", "URL": "www.....", "PRICE": 38.0}, {"seller": "john", "product": "bag", "EAN": "coGTTf276", "URL": "www.....", "PRICE": 54.0}
2   alan   Yes        No                {"seller": "alan", "product": "tv", "EAN": "BYU1890H", "URL": "www.....", "PRICE": 239.0}, {"seller": "alan", "product": "cable", "EAN": "ndhnjh0988", "URL": "www.....", "PRICE": 5.0}, {"seller": "alan", "product": "fridge", "EAN": "BTFS$42561", "URL": "www.....", "PRICE": 158.0}
  • Related