Return a list of the values of a column of a pandas dataset

I have a pandas dataset and I want a list of the values in a column, without repetitions. For example, if you have the following dataset:

import pandas as pd
ds = pd.DataFrame({'col1':['a', 'b', 'c', 'd'], 'col2':[2021, 2022, 2022, 2021]})

you should get, for the column col2, the list [2021, 2022]. I'm using this command:

ds['col2'].value_counts().index.to_list()

It works and its output is [2021, 2022]. Is there a better command?

CodePudding user response：

You can use unique() to get the unique values and then convert the result to a list.

So this works:

import pandas as pd
ds = pd.DataFrame({'col1':['a', 'b', 'c', 'd'], 'col2':[2021, 2022, 2022, 2021]})

ds['col2'].unique().tolist()

returns this:

[2021, 2022]
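
As a small sketch of how unique() orders its result, here is the same idea on a column whose first value is 2022 (the reordered data is an assumption made for illustration, not the asker's dataset). unique() returns values in order of first appearance, and drop_duplicates() is shown as an alternative that stays a pandas Series until tolist() is called:

```python
import pandas as pd

# Illustrative data: same columns as the question, but col2 starts with 2022
ds = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'col2': [2022, 2021, 2022, 2021]})

# unique() returns the distinct values in order of first appearance;
# tolist() converts the NumPy array into a plain Python list
unique_values = ds['col2'].unique().tolist()
print(unique_values)  # [2022, 2021]

# drop_duplicates() keeps the first occurrence of each value and returns a Series
same_values = ds['col2'].drop_duplicates().tolist()
print(same_values)  # [2022, 2021]
```

If a sorted result is wanted instead, wrapping the list in sorted() should do it.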

CodePudding user response：

Use

set(ds['col2'])

Granted, that is not a list, but since you want unique values, a set might be the better data type. You can always convert it to a list with an extra step: list(set(ds['col2'])).

Note that a set does not preserve the order of the elements: set([2, 1]) is equal to set([1, 2]).
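
Because a set has no defined order, one way to get a predictable list out of it is to sort it. The following is a minimal sketch, assuming the values are mutually comparable (as the years here are); the reordered col2 data is made up for illustration:

```python
import pandas as pd

# Illustrative data: same columns as the question, but col2 starts with 2022
ds = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'col2': [2022, 2021, 2022, 2021]})

# A set removes duplicates but makes no promise about iteration order
values = set(ds['col2'])

# sorted() returns a new list in ascending order, whatever order the set iterates in
sorted_values = sorted(values)
print(sorted_values)  # [2021, 2022]

# Membership tests are where a set is most useful
print(2021 in values)  # True
```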

CodePudding user response：

import pandas as pd

ds = pd.DataFrame({'col1':['a', 'b', 'c', 'd'], 'col2':[2021, 2022, 2022, 2021]})

# dict.fromkeys() builds a dictionary whose keys are the column values, so duplicates collapse while first-seen order is kept; list() then returns those keys as a list.
print(list(dict.fromkeys(ds['col2'])))
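
The same trick can be wrapped in a small function so it works on any iterable of hashable values, not only a pandas column. This is just a sketch: unique_in_order is a hypothetical helper name, and it relies on dictionaries preserving insertion order (guaranteed since Python 3.7):

```python
import pandas as pd

def unique_in_order(values):
    """Hypothetical helper: distinct items of an iterable, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(values))

# Illustrative data: same columns as the question, but col2 starts with 2022
ds = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'col2': [2022, 2021, 2022, 2021]})

result = unique_in_order(ds['col2'])
print(result)  # [2022, 2021]

# Works the same way on a plain list
print(unique_in_order(['b', 'a', 'b']))  # ['b', 'a']
```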