Python pandas grouping a dataframe by the unique value of a column-CodePudding

I have a dataframe in this format

How can i convert this dataframe into groups base on the unique values of Column B

So my results should be in this format

[[[1990-02, 1],[1990-03, 1],[1999-05, 1]],
 [[1992-08, 2],[1996-12, 2],[2020-01, 2]],
 [[1990-05, 3],[1995-08, 3],[1999-11, 3],[2021-12, 3]]
]

CodePudding user response：

This should make the job

import pandas as pd

data = {"A": ["1990-02", "1990-03","1999-05","1992-08","1996-12",
              "2020-01","1990-05","1995-08","1999-11", "2021-12"],
        "B": [1,1,1,2,2,2,3,3,3,3]}

df = pd.DataFrame(data=data)
out = df.groupby("B")['A'].apply(list)

output = [[[date, b_value] for date in block] 
          for b_value, block in zip(out.index, out.values)]

print(output)

CodePudding user response：

Here's one way to get an equivalent structure with arrays:

>>> df.groupby("B").apply(pd.DataFrame.to_numpy).values
[array([['1990-02', 1],
        ['1990-03', 1],
        ['1999-05', 1]], dtype=object)
 array([['1992-08', 2],
        ['1996-12', 2],
        ['2020-01', 2]], dtype=object)
 array([['1990-05', 3],
        ['1995-08', 3],
        ['1999-11', 3],
        ['2021-12', 3]], dtype=object)]

CodePudding user response：

Here is one way to get exactly what you want:

df.assign(l=df.agg(list, axis=1)).groupby('B')['l'].agg(list).tolist()

output:

[[['1990-02', 1], ['1990-03', 1], ['1999-05', 1]],
 [['1992-08', 2], ['1996-12', 2], ['2020-01', 2]],
 [['1990-05', 3], ['1995-08', 3], ['1999-11', 3], ['2021-12', 3]]]