I have a Pandas DataFrame with a list column as follows.
| name | list-of-cars |
|---|---|
| Sasha | [Citroen, Peugeot, Audi, Renault] |
| Eliott | [Peugeot, Mercedes, Renault] |
where Citroen, Peugeot, etc. are instances of my class Car:
class Car:
    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year = year
How can I get one attribute from each object in the list? For example, how do I get the year of every element of list-of-cars for a given name?
CodePudding user response:
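Make the attribute public in your class by using self.year instead of self.__year (a double leading underscore triggers name mangling, so the attribute is stored as _Car__year and c.__year fails outside the class). Or add a property.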
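To see why the double-underscore attribute is not reachable from outside the class, here is a minimal sketch. The sample values passed to Car are made up for illustration.

```python
class Car:
    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year = year  # stored on the instance as _Car__year


car = Car("Audi", 200, 2017)  # hypothetical sample values

# Outside the class body the name is not mangled, so this lookup fails.
try:
    car.__year
except AttributeError:
    mangled_lookup_failed = True
else:
    mangled_lookup_failed = False

print(mangled_lookup_failed)      # True
print("_Car__year" in vars(car))  # True
```

Reading car._Car__year directly would work, but a public attribute or a property is the cleaner fix.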
Then use a loop or a list comprehension:
df['years'] = [[c.year for c in cars] for cars in df['list-of-cars']]
Output:
     name                       list-of-cars                     years
0   Sasha  [Citroen, Peugeot, Audi, Renault]  [2000, 2001, 2002, 2003]
1  Eliott       [Peugeot, Mercedes, Renault]        [2004, 2005, 2006]
Class with a property (and a __repr__ for readable display in the DataFrame):
class Car:
    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year = year

    @property
    def year(self):
        return self.__year

    def __repr__(self):
        return f'{self.__name}'
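Putting the pieces together, here is a self-contained sketch that also answers the "for one name" part of the question. The car speeds and years and the variable names (citroen, sasha_years, and so on) are assumptions made up for the example.

```python
import pandas as pd


class Car:
    def __init__(self, name, speed, year):
        self.__name = name
        self.__speed = speed
        self.__year = year

    @property
    def year(self):
        return self.__year

    def __repr__(self):
        return f'{self.__name}'


# Hypothetical sample data; speeds and years are invented.
citroen = Car("Citroen", 150, 2000)
peugeot = Car("Peugeot", 160, 2001)
audi = Car("Audi", 220, 2002)

df = pd.DataFrame({
    "name": ["Sasha", "Eliott"],
    "list-of-cars": [[citroen, peugeot, audi], [peugeot, audi]],
})

# One list of years per row.
df["years"] = [[c.year for c in cars] for cars in df["list-of-cars"]]

# Years for a single name: filter the rows, then take the first match.
sasha_years = df.loc[df["name"] == "Sasha", "years"].iloc[0]
print(sasha_years)  # [2000, 2001, 2002]
```

If names can repeat, .iloc[0] only returns the first matching row, so you may prefer to keep the filtered Series instead.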
CodePudding user response:
I'm not sure it's a good idea to store lists of objects as values in a DataFrame.
Anyway, you can extract an attribute by looping over the rows with iterrows(), collecting the attribute values of each row into an inner list, and appending that to a list of lists. Then add the result to the df as a new column.
For example, let's say you need the years:
import pandas as pd


class Car:
    def __init__(self, name, speed, year):
        self.name = name
        self.speed = speed
        self.year = year


Citroen = Car("Citroen", "123", 2010)
Peugeot = Car("Peugeot", "123", 2021)
Audi = Car("Audi", "123", 2017)
Renault = Car("Renault", "123", 2005)

df = pd.DataFrame({
    "Name": ["Sasha", "Eliott"],
    "List_of_cars": [[Citroen, Peugeot], [Audi, Renault]]
})

years_column = []
for index, row in df.iterrows():
    inner_list = []
    for car in row["List_of_cars"]:
        inner_list.append(car.year)
    years_column.append(inner_list)

df["years"] = years_column
The resulting df looks like this:
Name List_of_cars years
0 Sasha [<__main__.Car object at 0x0000021D7EC573D0>, ... [2010, 2021]
1 Eliott [<__main__.Car object at 0x0000021D6DFAED30>, ... [2017, 2005]
Then you can query the table as usual.
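As a shorter alternative to the iterrows() loop, the same column can be built with Series.apply. This is only a sketch on the same sample data; the names newest and recent_owners, and the example query itself, are made up for illustration.

```python
import pandas as pd


class Car:
    def __init__(self, name, speed, year):
        self.name = name
        self.speed = speed
        self.year = year


df = pd.DataFrame({
    "Name": ["Sasha", "Eliott"],
    "List_of_cars": [
        [Car("Citroen", "123", 2010), Car("Peugeot", "123", 2021)],
        [Car("Audi", "123", 2017), Car("Renault", "123", 2005)],
    ],
})

# Same result as the iterrows() loop, computed per cell of the column.
df["years"] = df["List_of_cars"].apply(lambda cars: [car.year for car in cars])

# Example query: names whose newest car is from 2020 or later.
newest = df["years"].apply(max)
recent_owners = df.loc[newest >= 2020, "Name"].tolist()
print(recent_owners)  # ['Sasha']
```

Both versions still run a Python-level loop over the objects, so neither is vectorized; apply is just more compact.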
The question is, however, why not just store the information you need in tabular form (i.e. as columns of the DataFrame) instead of in custom objects?
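One possible tabular layout, sketched under the assumption that one row per (owner, car) pair suits your data. The column names owner, car, speed and year are hypothetical.

```python
import pandas as pd

# Hypothetical long-format layout: one row per (owner, car) pair.
cars = pd.DataFrame({
    "owner": ["Sasha", "Sasha", "Eliott", "Eliott"],
    "car": ["Citroen", "Peugeot", "Audi", "Renault"],
    "speed": ["123", "123", "123", "123"],
    "year": [2010, 2021, 2017, 2005],
})

# Years of all cars for one owner, using an ordinary boolean filter.
sasha_years = cars.loc[cars["owner"] == "Sasha", "year"].tolist()
print(sasha_years)  # [2010, 2021]

# Per-owner lists, if the nested shape is still needed somewhere.
years_by_owner = cars.groupby("owner")["year"].agg(list)
print(years_by_owner["Eliott"])  # [2017, 2005]
```

With this shape, filtering, sorting and aggregating by year work with regular pandas operations, without reaching into Python objects.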