Home > Net >  How create a column with list of jsons if duplicated rows on other column?
How create a column with list of jsons if duplicated rows on other column?

Time:03-05

I have a Pandas dataframe looking like this:

buyer_id    car      color   year
john        ferrari  yellow  2022
eric        ferrari  red     2022
john        mercedes black   1990
victoria    audi     yellow  2017

I would like to create a new column (list of jsons in each row.

Create a column 'identical' with a list in each row:

  • One element in the list if only one buyer is found in 'buyer_id':

    [{'car':..., 'color':..., 'year': ...}]

  • If same buyer on several rows in 'buyer_id'

    [ {'car':'ferrari', 'color': 'yellow ', 'year': 2022}, {'car':'mercedes', 'color': 'black', 'year': 1990} ]

Expected output:

    buyer_id   car      color   year  identical
    john       ferrari  yellow  2022  [{'car':'ferrari', 'color': 'yellow ', 'year': 2022},{'car':'mercedes', 'color': 'black', 'year': 1990}]
    eric       ferrari  red     2022  [{'car':'ferrari', 'color': 'red', 'year': 2022}]
    john       mercedes black   1990  [[{'car':'ferrari', 'color': 'yellow ', 'year': 2022},{'car':'mercedes', 'color': 'black', 'year': 1990}]
    victoria   audi     yellow  2017  [{'car':'audi', 'color': 'yellow', 'year': 2017}]

I don't know how to do this with Pandas and if it is possible.

CodePudding user response:

You could use GroupBy.apply and to_json with the orient="records" parameter:

s = (df.groupby('buyer_id')
       .apply(lambda g: g.drop('buyer_id', axis=1)
                         .to_json(orient='records'))
    )
df2 = df.merge(s.rename('identical'), left_on='buyer_id', right_index=True)

or in place:

s = (df.set_index('buyer_id')
       .groupby(level='buyer_id')
       .apply(lambda g: g.to_json(orient='records'))
    )
df['identical'] = df['buyer_id'].map(s)

output:

   buyer_id       car   color  year                                                                                        identical
0      john   ferrari  yellow  2022  [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
1      eric   ferrari     red  2022                                                    [{"car":"ferrari","color":"red","year":2022}]
2      john  mercedes   black  1990  [{"car":"ferrari","color":"yellow","year":2022},{"car":"mercedes","color":"black","year":1990}]
3  victoria      audi  yellow  2017                                                    [{"car":"audi","color":"yellow","year":2017}]

CodePudding user response:

Try:

to_dict = lambda x: x.to_dict('records')
df['identical'] = df['buyer_id'].map(df.set_index('buyer_id') \
                                       .groupby('buyer_id').apply(to_dict))
print(df)

# Output
   buyer_id       car   color  year                                                                                                   identical
0      john   ferrari  yellow  2022  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
1      eric   ferrari     red  2022                                                          [{'car': 'ferrari', 'color': 'red', 'year': 2022}]
2      john  mercedes   black  1990  [{'car': 'ferrari', 'color': 'yellow', 'year': 2022}, {'car': 'mercedes', 'color': 'black', 'year': 1990}]
3  victoria      audi  yellow  2017                                                          [{'car': 'audi', 'color': 'yellow', 'year': 2017}]

To export your column as JSON, you can use:

>>> df['identical'].to_json(orient='records', indent=2)
[
  [
    {
      "car":"ferrari",
      "color":"yellow",
      "year":2022
    },
    {
      "car":"mercedes",
      "color":"black",
      "year":1990
    }
  ],
  [
    {
      "car":"ferrari",
      "color":"red",
      "year":2022
    }
  ],
  [
    {
      "car":"ferrari",
      "color":"yellow",
      "year":2022
    },
    {
      "car":"mercedes",
      "color":"black",
      "year":1990
    }
  ],
  [
    {
      "car":"audi",
      "color":"yellow",
      "year":2017
    }
  ]
]
  • Related