Get a list element from a Pandas DataFrame column

Time:09-21

I have a Pandas DataFrame with a list column as follows.

name    list-of-cars
Sasha   [Citroen, Peugeot, Audi, Renault]
Eliott  [Peugeot, Mercedes, Renault]

where Citroen, Peugeot, etc. are objects of my class Car:

class Car:

    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year = year

How can I get one attribute from each object in the list? For example, how would I get the year of every car in list-of-cars for a given name?

Answer:

Make the attribute "public" by using self.year instead of self.__year, or add a property.
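To see why the original class does not work with c.year, here is a minimal sketch of Python's name mangling: an attribute written as self.__year inside the class body is stored on the instance as _Car__year. The sample car (Citroen, 123, 2010) is made-up data for illustration.

```python
class Car:
    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year = year

# Made-up sample car, only for illustration
car = Car("Citroen", 123, 2010)

# Inside the class body, self.__year is rewritten to self._Car__year,
# so the instance has no attribute called "year" (or "__year").
print(hasattr(car, "year"))    # False
print(hasattr(car, "__year"))  # False
print(car._Car__year)          # 2010, reachable only via the mangled name
print(sorted(vars(car)))       # ['_Car__name', '_Car__speed', '_Car__year']
```

Reading _Car__year directly works but defeats the point of the private name, which is why a public attribute or a property is the cleaner fix.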

Then use a loop or a list comprehension:

df['years'] = [[c.year for c in l] for l in df['list-of-cars']]

output:

     name                       list-of-cars                     years
0   Sasha  [Citroen, Peugeot, Audi, Renault]  [2000, 2001, 2002, 2003]
1  Eliott       [Peugeot, Mercedes, Renault]        [2004, 2005, 2006]
Class with a property (and a __repr__ for a readable display in the DataFrame):
class Car:
    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year  = year

    @property
    def year(self):  
        return self.__year
        
    def __repr__(self):
        return f'{self.__name}'
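The question asks for the years for one name. A minimal sketch, assuming the property-based Car class above, a unique "name" column, and made-up speeds and years: select the row with a boolean mask, take the list stored in that cell, and read .year from each car. years_for is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

class Car:
    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year = year

    @property
    def year(self):
        # read-only access to the private year
        return self.__year

    def __repr__(self):
        return self.__name

# Hypothetical sample data: speeds and years are made up
citroen = Car("Citroen", 180, 2000)
peugeot = Car("Peugeot", 190, 2001)
audi = Car("Audi", 220, 2002)
renault = Car("Renault", 170, 2003)
mercedes = Car("Mercedes", 230, 2005)

df = pd.DataFrame({
    "name": ["Sasha", "Eliott"],
    "list-of-cars": [[citroen, peugeot, audi, renault], [peugeot, mercedes, renault]],
})

def years_for(df, owner):
    # Pick the list stored in the row whose name matches owner
    # (assumes the name exists and appears only once)
    cars = df.loc[df["name"] == owner, "list-of-cars"].iloc[0]
    return [c.year for c in cars]

print(years_for(df, "Sasha"))   # [2000, 2001, 2002, 2003]
print(years_for(df, "Eliott"))  # [2001, 2005, 2003]
```

If a name can appear in several rows, drop the .iloc[0] and loop over all matching lists instead.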

Answer:

I'm not sure it's a good idea to store lists of objects as values in a DataFrame.

Anyway, you can extract an attribute by looping over the rows with iterrows(), collecting the attribute of each car in a row into an inner list, appending that to a list of lists, and then adding the result to the df as a new column.

For example, let's say you need the years:

import pandas as pd

class Car:
    def __init__(self, name, speed, year):
        self.name = name
        self.speed = speed
        self.year  = year

Citroen=Car("Citroen", "123", 2010)
Peugeot=Car("Peugeot", "123", 2021)
Audi=Car("Audi", "123", 2017)
Renault=Car("Renault", "123", 2005)

df=pd.DataFrame({
    "Name": ["Sasha", "Eliott"],
    "List_of_cars": [[Citroen, Peugeot], [Audi, Renault]]
})

years_column=[]
for index, row in df.iterrows():
    inner_list=[]
    for car in row["List_of_cars"]:
        inner_list.append(car.year)
    years_column.append(inner_list)

df["years"]=years_column

The df would look like this:

     Name                                       List_of_cars         years
0   Sasha  [<__main__.Car object at 0x0000021D7EC573D0>, ...  [2010, 2021]
1  Eliott  [<__main__.Car object at 0x0000021D6DFAED30>, ...  [2017, 2005]

Then you can query the table as usual.
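As one example of querying, here is a sketch that rebuilds the df from this answer and uses DataFrame.explode to give each year its own row, after which ordinary filters and groupby work. The names exploded, recent and newest are illustrative only.

```python
import pandas as pd

class Car:
    def __init__(self, name, speed, year):
        self.name = name
        self.speed = speed
        self.year = year

# Same sample cars as in this answer
citroen = Car("Citroen", "123", 2010)
peugeot = Car("Peugeot", "123", 2021)
audi = Car("Audi", "123", 2017)
renault = Car("Renault", "123", 2005)

df = pd.DataFrame({
    "Name": ["Sasha", "Eliott"],
    "List_of_cars": [[citroen, peugeot], [audi, renault]],
})
df["years"] = [[car.year for car in cars] for cars in df["List_of_cars"]]

# explode() puts each list element on its own row, repeating the Name value
exploded = df[["Name", "years"]].explode("years")
# Exploded values come back as object dtype, so cast to int for comparisons
exploded["years"] = exploded["years"].astype(int)

# Owners with at least one car from after 2015
recent = exploded.loc[exploded["years"] > 2015, "Name"].unique().tolist()
print(recent)  # ['Sasha', 'Eliott']

# Newest car year per owner (groupby sorts the owner names)
newest = exploded.groupby("Name")["years"].max().to_dict()
print(newest)
```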

That said, why not store the information you need in tabular form (i.e. as regular DataFrame columns) instead of as custom objects?
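A sketch of that tabular alternative, assuming one row per (owner, car) and reusing the sample values from this answer (with speed stored as a number here): each attribute becomes a regular column, so the years for one name are a plain boolean filter, and the per-owner lists can be rebuilt with groupby if still needed.

```python
import pandas as pd

# Hypothetical tidy layout: one row per (owner, car), attributes as columns
cars = pd.DataFrame({
    "Name": ["Sasha", "Sasha", "Eliott", "Eliott"],
    "Car": ["Citroen", "Peugeot", "Audi", "Renault"],
    "Speed": [123, 123, 123, 123],
    "Year": [2010, 2021, 2017, 2005],
})

# Years of all cars for one name: a boolean filter, no loop over objects
sasha_years = cars.loc[cars["Name"] == "Sasha", "Year"].tolist()
print(sasha_years)  # [2010, 2021]

# Rebuild the per-owner lists if that shape is still wanted
# (sort=False keeps owners in order of first appearance)
years_per_owner = {
    name: group["Year"].tolist()
    for name, group in cars.groupby("Name", sort=False)
}
print(years_per_owner)  # {'Sasha': [2010, 2021], 'Eliott': [2017, 2005]}
```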
