I have a pandas dataset and I want a list of the values in a column, without repetitions. For example, if you have the following dataset:
import pandas as pd
ds = pd.DataFrame({'col1':['a', 'b', 'c', 'd'], 'col2':[2021, 2022, 2022, 2021]})
you should have, for the column col2, the following return: [2021, 2022]. I'm using this command:
ds['col2'].value_counts().index.to_list()
It's working and its output is [2021, 2022]. Does a better command exist?
CodePudding user response:
You can use unique() to get the unique values and then convert the result to a list with tolist().
So this works:
import pandas as pd
ds = pd.DataFrame({'col1':['a', 'b', 'c', 'd'], 'col2':[2021, 2022, 2022, 2021]})
ds['col2'].unique().tolist()
returns this:
[2021, 2022]
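As a small sketch of how this behaves, the example below uses a reordered copy of the column (an assumption made for illustration, not the data from the question). It suggests that unique() keeps values in order of first appearance, so sorting is a separate step if you need it.

```python
import pandas as pd

# Hypothetical data: same values as in the question, but 2022 appears first.
ds = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                   'col2': [2022, 2021, 2022, 2021]})

# unique() returns a NumPy array in order of first appearance;
# tolist() converts it to a list of plain Python ints.
unique_values = ds['col2'].unique().tolist()
print(unique_values)

# Sort explicitly if ascending order matters.
sorted_values = sorted(unique_values)
print(sorted_values)
```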
CodePudding user response:
Use
set(ds['col2'])
Granted, that is not a list, but since you want unique values, a set might be the better data type. You can always convert it to a list with an extra step: list(set(ds['col2'])).
Note that a set does not preserve the order of the elements: set([2, 1]) is equal to set([1, 2]), so the order of the resulting list is not guaranteed.
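To illustrate the ordering caveat, here is a minimal sketch. It assumes you want a deterministic list, which is why it goes through sorted() rather than list(); that choice is mine, not part of the answer above.

```python
import pandas as pd

ds = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                   'col2': [2021, 2022, 2022, 2021]})

# A set keeps one copy of each value but has no defined order.
unique_set = set(ds['col2'])

# Two sets with the same elements compare equal regardless of order.
same = {2, 1} == {1, 2}
print(same)

# sorted() accepts any iterable and always returns a list,
# so the result no longer depends on set iteration order.
unique_sorted = sorted(unique_set)
print(unique_sorted)
```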
CodePudding user response:
import pandas as pd
ds = pd.DataFrame({'col1':['a', 'b', 'c', 'd'], 'col2':[2021, 2022, 2022, 2021]})
# dict.fromkeys() builds a dictionary whose keys are the column values,
# so duplicates collapse and first-seen order is kept; list() turns the keys into a list.
print(list(dict.fromkeys(ds['col2'])))
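For comparison, a rough sketch of the same idea next to the pandas method drop_duplicates(). The reordered data is hypothetical and only there to make the ordering visible; both approaches should keep the first occurrence of each value.

```python
import pandas as pd

# Hypothetical data: 2022 appears before 2021.
ds = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                   'col2': [2022, 2021, 2022, 2021]})

# Plain-Python route: dict keys are unique and keep insertion order (Python 3.7+).
via_dict = list(dict.fromkeys(ds['col2']))

# pandas route: drop_duplicates() keeps the first occurrence by default.
via_pandas = ds['col2'].drop_duplicates().tolist()

print(via_dict)
print(via_pandas)
```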