Pandas/Python: Store values of columns into list based on value in another column

I have the following problem:

I want to store the values of four columns (Age_1 to Age_4) of a dataframe in lists, with one list for each value of the first column, 'Year'.

Year	Age_1	Age_2	Age_3	Age_4
2000	18	20	25	56
2000	17	32	24	41
2001	20	26	24	39

...

So basically I want one list per year containing all the ages in the dataset for that year. For example, the first list would be list_2000 = [18, 20, 25, 56, 17, 32, 24, 41, ...], and the second would be list_2001 = [20, 26, 24, 39, ...].

I assume this should be easy to do, but my attempts haven't been successful so far. Any help is appreciated.

CodePudding user response：

IIUC, you can group by Year, take the underlying numpy array of each group, flatten it with ravel, and convert it to a list with tolist:

dic = (
    df.set_index('Year').groupby(level='Year')
      .apply(lambda d: d.to_numpy().ravel().tolist())
      .to_dict()
)

output:

{2000: [18, 20, 25, 56, 17, 32, 24, 41], 2001: [20, 26, 24, 39]}
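For anyone who wants to try this end to end, here is a minimal sketch that rebuilds only the three rows shown in the question and produces the same dictionary. It is a variation on the answer above rather than the exact same code: it loops over the groups with a dict comprehension instead of calling apply. The names age_cols and ages_by_year are made up for this example.

```python
import pandas as pd

# Sample data: only the three rows visible in the question.
df = pd.DataFrame({
    "Year":  [2000, 2000, 2001],
    "Age_1": [18, 17, 20],
    "Age_2": [20, 32, 26],
    "Age_3": [25, 24, 24],
    "Age_4": [56, 41, 39],
})

# Hypothetical helper name: the columns whose values should be collected.
age_cols = ["Age_1", "Age_2", "Age_3", "Age_4"]

# For each year, take the block of age columns as a 2-D numpy array,
# flatten it row by row with ravel, and convert it to a plain Python list.
ages_by_year = {
    int(year): group[age_cols].to_numpy().ravel().tolist()
    for year, group in df.groupby("Year")
}

print(ages_by_year)
```

A dictionary keyed by year is usually easier to work with than separate variables such as list_2000 and list_2001, since a single year's ages can be looked up with ages_by_year[2000].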

CodePudding user response：

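IIUC, you can melt the age columns into a single 'value' column, then group by Year and aggregate the values into lists: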

(
    df.melt('Year', value_vars=['Age_1', 'Age_2', 'Age_3', 'Age_4'])
      .groupby('Year')['value']
      .agg(list)
      .to_dict()
)
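One thing to be aware of with the melt approach: melt stacks the columns one after another (all of Age_1, then all of Age_2, and so on), so the ages within each year come out column by column, not row by row as in the lists written in the question. A small sketch on the three sample rows from the question shows this. The names age_cols and by_year are made up for this example.

```python
import pandas as pd

# Sample data: only the three rows visible in the question.
df = pd.DataFrame({
    "Year":  [2000, 2000, 2001],
    "Age_1": [18, 17, 20],
    "Age_2": [20, 32, 26],
    "Age_3": [25, 24, 24],
    "Age_4": [56, 41, 39],
})

age_cols = ["Age_1", "Age_2", "Age_3", "Age_4"]

# melt produces one row per (Year, age column) pair, column after column;
# groupby then collects the 'value' entries of each year into a list.
by_year = (
    df.melt("Year", value_vars=age_cols)
      .groupby("Year")["value"]
      .agg(list)
      .to_dict()
)

# The same ages are present for each year, only in a different order
# than a row-by-row flattening would give.
row_order_2000 = [18, 20, 25, 56, 17, 32, 24, 41]
print(by_year[2000])
print(sorted(by_year[2000]) == sorted(row_order_2000))
```

If the order within each list does not matter, both answers are equivalent. If the row-by-row order is needed, the numpy-based approach from the other answer keeps it.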