I have the following problem:
I want to collect the values of four columns (Age_1 to Age_4) of a dataframe into lists, with one list per value of the first column, 'Year'.
Year | Age_1 | Age_2 | Age_3 | Age_4 |
---|---|---|---|---|
2000 | 18 | 20 | 25 | 56 |
2000 | 17 | 32 | 24 | 41 |
2001 | 20 | 26 | 24 | 39 |
...
So basically I want one list per year that contains all the ages in the dataset for that year. E.g. the first list would be list_2000 = [18,20,25,56,17,32,24,41...], and the second would be list_2001 = [20,26,24,39...]
I assume this should be easy to do, but my attempts haven't been successful yet, so any help is appreciated.
CodePudding user response:
IIUC, use groupby and the underlying numpy array of each group, then flatten the data with ravel and convert it to a list with tolist:
dic = (
    df.set_index('Year').groupby(level='Year')
      .apply(lambda d: d.to_numpy().ravel().tolist())
      .to_dict()
)
output:
{2000: [18, 20, 25, 56, 17, 32, 24, 41], 2001: [20, 26, 24, 39]}
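For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The dataframe is rebuilt from the three rows shown in the question, and the names `age_cols` and `ages_by_year` are just illustrative. It uses a dict comprehension over the groups instead of `apply`, which is a variation on the answer above, not a different technique.

```python
import pandas as pd

# Sample data copied from the three rows shown in the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2001],
    "Age_1": [18, 17, 20],
    "Age_2": [20, 32, 26],
    "Age_3": [25, 24, 24],
    "Age_4": [56, 41, 39],
})

age_cols = ["Age_1", "Age_2", "Age_3", "Age_4"]

# For each year, take the age columns as a 2D numpy array,
# flatten it row by row with ravel, and convert it to a plain list.
ages_by_year = {
    int(year): group[age_cols].to_numpy().ravel().tolist()
    for year, group in df.groupby("Year")
}

print(ages_by_year)
# {2000: [18, 20, 25, 56, 17, 32, 24, 41], 2001: [20, 26, 24, 39]}
```

A dict keyed by year is usually easier to work with than separate variables like list_2000 and list_2001; `ages_by_year[2000]` gives the first list.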
CodePudding user response:
IIUC, you can melt the age columns into a single 'value' column and collect it per year:
df.melt('Year',
        value_vars=['Age_1', 'Age_2', 'Age_3', 'Age_4'])\
  .groupby('Year')['value'].agg(list).to_dict()
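One thing to be aware of with the melt approach is the order of the values. A small sketch, again assuming the three sample rows from the question (`melted_by_year` is just an illustrative name): melt stacks the columns one after another, so within each year the ages come out column by column instead of row by row.

```python
import pandas as pd

# Sample data copied from the three rows shown in the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2001],
    "Age_1": [18, 17, 20],
    "Age_2": [20, 32, 26],
    "Age_3": [25, 24, 24],
    "Age_4": [56, 41, 39],
})

# melt puts all Age_1 values first, then all Age_2 values, and so on;
# groupby keeps that order inside each group.
melted_by_year = (
    df.melt("Year", value_vars=["Age_1", "Age_2", "Age_3", "Age_4"])
      .groupby("Year")["value"]
      .agg(list)
      .to_dict()
)

# For 2000 the values are ordered column by column:
# 18, 17 (Age_1), 20, 32 (Age_2), 25, 24 (Age_3), 56, 41 (Age_4)
print(melted_by_year[2000])
```

If only the set of ages per year matters, both answers are equivalent; if the row-by-row order from the question matters, the numpy-based answer matches it.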