I have a dataframe in this format
A B
1990-02 1
1990-03 1
1999-05 1
1992-08 2
1996-12 2
2020-01 2
1990-05 3
1995-08 3
1999-11 3
2021-12 3
How can i convert this dataframe into groups base on the unique values of Column B
So my results should be in this format
[[[1990-02, 1],[1990-03, 1],[1999-05, 1]],
[[1992-08, 2],[1996-12, 2],[2020-01, 2]],
[[1990-05, 3],[1995-08, 3],[1999-11, 3],[2021-12, 3]]
]
CodePudding user response:
This should make the job
import pandas as pd
data = {"A": ["1990-02", "1990-03","1999-05","1992-08","1996-12",
"2020-01","1990-05","1995-08","1999-11", "2021-12"],
"B": [1,1,1,2,2,2,3,3,3,3]}
df = pd.DataFrame(data=data)
out = df.groupby("B")['A'].apply(list)
output = [[[date, b_value] for date in block]
for b_value, block in zip(out.index, out.values)]
print(output)
CodePudding user response:
Here's one way to get an equivalent structure with arrays:
>>> df.groupby("B").apply(pd.DataFrame.to_numpy).values
[array([['1990-02', 1],
['1990-03', 1],
['1999-05', 1]], dtype=object)
array([['1992-08', 2],
['1996-12', 2],
['2020-01', 2]], dtype=object)
array([['1990-05', 3],
['1995-08', 3],
['1999-11', 3],
['2021-12', 3]], dtype=object)]
CodePudding user response:
Here is one way to get exactly what you want:
df.assign(l=df.agg(list, axis=1)).groupby('B')['l'].agg(list).tolist()
output:
[[['1990-02', 1], ['1990-03', 1], ['1999-05', 1]],
[['1992-08', 2], ['1996-12', 2], ['2020-01', 2]],
[['1990-05', 3], ['1995-08', 3], ['1999-11', 3], ['2021-12', 3]]]