Convert a column in DataFrame to list and another column to set


I would like to group the rows of a DataFrame that share the same Name, collecting one column into a set and another into a list. The given DataFrame, df:

   Name     id other           list
0   ben  00005   abc  [1000, A, 90]
1  alex  00006    gf  [3000, B, 80]
2  linn  00007   jgj   [600, C, 55]
3  luke  00009    gg  [5000, D, 88]
4  alex  00001    gf  [7000, R, 98]
5   ben  00002   abc  [9000, S, 28]
6   ben  00003   abc  [5000, T, 48]

The desired output, df1:

   Name                     id other                                           list
0   ben  {00005, 00002, 00003}   abc  [[1000, A, 90], [9000, S, 28], [5000, T, 48]]
1  alex         {00006, 00001}    gf                 [[3000, B, 80], [7000, R, 98]]
2  linn                {00007}   jgj                                   [600, C, 55]
3  luke                {00009}    gg                                  [5000, D, 88]

CodePudding user response:

You can use .groupby() with .agg():

df.groupby("Name", as_index=False).agg({"id": set, "other": "first", "list": list})

This outputs:

   Name                     id other                                           list
0  alex         {00001, 00006}    gf                 [[3000, B, 80], [7000, R, 98]]
1   ben  {00005, 00003, 00002}   abc  [[1000, A, 90], [9000, S, 28], [5000, T, 48]]
2  linn                {00007}   jgj                                 [[600, C, 55]]
3  luke                {00009}    gg                                [[5000, D, 88]]
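
The output above is sorted by Name, whereas the desired df1 lists the names in order of first appearance (ben, alex, linn, luke). A minimal sketch of one way to keep that order, assuming the ids are stored as strings (so the leading zeros survive) and the list column holds Python lists; the sample frame is rebuilt from the question:

```python
import pandas as pd

# Sample data rebuilt from the question; ids are assumed to be strings.
df = pd.DataFrame({
    "Name": ["ben", "alex", "linn", "luke", "alex", "ben", "ben"],
    "id": ["00005", "00006", "00007", "00009", "00001", "00002", "00003"],
    "other": ["abc", "gf", "jgj", "gg", "gf", "abc", "abc"],
    "list": [[1000, "A", 90], [3000, "B", 80], [600, "C", 55], [5000, "D", 88],
             [7000, "R", 98], [9000, "S", 28], [5000, "T", 48]],
})

# sort=False keeps the groups in order of first appearance
# instead of sorting them by the group key.
df1 = df.groupby("Name", as_index=False, sort=False).agg(
    {"id": set, "other": "first", "list": list}
)
print(df1)
```

Note that single-row groups still come back as a one-element list of lists (for example [[600, C, 55]] for linn), not as the bare list shown in the desired output.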

CodePudding user response:

You can use the groupby() method to group the dataframe by the 'Name' column and then use the agg() function to convert the 'id' column to a set and the 'list' column to a list of lists. Here's an example of how you can achieve this:

import pandas as pd

# Example dataframe
df = pd.DataFrame({'Name': ['ben', 'alex', 'linn', 'luke', 'alex', 'ben', 'ben'],
                   'id': [5, 6, 7, 9, 1, 2, 3],
                   'other': ['abc', 'gf', 'jgj', 'gg', 'gf', 'abc', 'abc'],
                   'list': [[1000, 'A', 90], [3000, 'B', 80], [600, 'C', 55],
                            [5000, 'D', 88], [7000, 'R', 98], [9000, 'S', 28],
                            [5000, 'T', 48]]})

# Group dataframe by 'Name' column
grouped_df = df.groupby('Name', as_index=False)

# Use agg() function to convert 'id' column to set and 'list' column to list of lists
df1 = grouped_df.agg({'id': set, 'other': 'first', 'list': list})


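This gives the output below. Note that groupby() sorts the result by Name, and because the ids are integers in this example, the leading zeros from the question are not kept:

   Name         id other                                           list
0  alex     {1, 6}    gf                 [[3000, B, 80], [7000, R, 98]]
1   ben  {2, 3, 5}   abc  [[1000, A, 90], [9000, S, 28], [5000, T, 48]]
2  linn        {7}   jgj                                 [[600, C, 55]]
3  luke        {9}    gg                                [[5000, D, 88]]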
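
The same aggregation can also be written with named aggregation, which lets the output columns be renamed in one step. This is only a sketch: the output names ids, lists, and ids_sorted are illustrative and do not come from the question, and the ids are assumed to be strings so that the leading zeros are kept. Since sets are unordered, a sorted copy is added for cases where a reproducible order matters.

```python
import pandas as pd

# Sample data rebuilt from the question; ids are assumed to be strings.
df = pd.DataFrame({
    "Name": ["ben", "alex", "linn", "luke", "alex", "ben", "ben"],
    "id": ["00005", "00006", "00007", "00009", "00001", "00002", "00003"],
    "other": ["abc", "gf", "jgj", "gg", "gf", "abc", "abc"],
    "list": [[1000, "A", 90], [3000, "B", 80], [600, "C", 55], [5000, "D", 88],
             [7000, "R", 98], [9000, "S", 28], [5000, "T", 48]],
})

# Named aggregation: output_name=(input_column, aggregation function).
# The output names here are hypothetical, chosen for illustration.
df2 = df.groupby("Name", as_index=False).agg(
    ids=("id", set),
    other=("other", "first"),
    lists=("list", list),
)

# Sets have no defined order; keep a sorted list alongside for stable display.
df2["ids_sorted"] = df2["ids"].apply(sorted)
print(df2)
```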