Splitting data frame into smaller data frames based on unique column values-CodePudding

This is my data frame:

    Quantity     Code         Value       
0       1757     08951201     717.0
1       1100     08A85800       0.0
2       2500     08A85800       0.0
3        323     08951201       0.0
4        800     08A85800       0.0

I want to split it into smaller data frames based on the Code column (e.g. this one should split into df1 with all 08951201 codes and df2 with all 08A85800 codes).

Edit: I'd also love a way to merge them back into the original dataframe, in the same order, after some value calculations I'm going to perform.

CodePudding user response：

Use groupby to split the dataframe, then apply your custom function to each sub-dataframe:

groups = df.groupby('Code')
print(list(groups))

# Output:
[('08951201',    Quantity      Code  Value
0      1757  08951201  717.0
3       323  08951201    0.0),

('08A85800',    Quantity      Code  Value
1      1100  08A85800    0.0
2      2500  08A85800    0.0
4       800  08A85800    0.0)]

Now suppose you want to sum Value for each Code:

>>> df.groupby('Code')['Value'].sum()
Code
08951201    717.0
08A85800      0.0
Name: Value, dtype: float64
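For the second part of the question (merging the pieces back in the original order), one possible approach is sketched below. It relies on the fact that each sub-dataframe keeps the index labels of its rows, so concatenating and sorting by index restores the order. The Share column is a hypothetical calculation used only for illustration, and the sketch assumes the original index is unique and sortable.

```python
import pandas as pd

df = pd.DataFrame({
    "Quantity": [1757, 1100, 2500, 323, 800],
    "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
    "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
})

# Split: one sub-dataframe per unique Code; each keeps its original index labels.
parts = {code: sub.copy() for code, sub in df.groupby("Code")}

# Hypothetical per-group calculation: each row's share of its group's Quantity.
for sub in parts.values():
    sub["Share"] = sub["Quantity"] / sub["Quantity"].sum()

# Merge back: concatenate the pieces, then sort by the preserved index
# so the rows return to their original order.
restored = pd.concat(list(parts.values())).sort_index()
print(restored)
```

If the calculation only needs a per-group aggregate, df.groupby('Code')['Quantity'].transform('sum') may be simpler, since transform returns a result already aligned with the original rows and no split or merge is needed.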

CodePudding user response：

As suggested, you can use groupby() on your dataframe to split it by the values of one column:

import pandas as pd

cols = ['Quantity', 'Code', 'Value']
data = [[1757,     '08951201',     717.0],
 [1100,     '08A85800',       0.0],
 [2500,     '08A85800',       0.0],
 [323,    '08951201',      0.0],
 [800,    '08A85800',       0.0]]

df = pd.DataFrame(data, columns=cols)

groups = df.groupby('Code')

You can then look up the rows of each group with groups.indices, which returns a dict with the 'Code' values as keys and arrays of row positions as values. Finally, if you want every sub-dataframe, call group_list = list(groups). I suggest doing the work in two steps (first group by, then call list), because that way you can still call other methods on the GroupBy object (groups).
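As a rough illustration of the two steps, the sketch below rebuilds the same sample data, inspects groups.indices, and pulls out a single group with get_group. The variable names positions and sub are only illustrative.

```python
import pandas as pd

cols = ["Quantity", "Code", "Value"]
data = [
    [1757, "08951201", 717.0],
    [1100, "08A85800", 0.0],
    [2500, "08A85800", 0.0],
    [323, "08951201", 0.0],
    [800, "08A85800", 0.0],
]
df = pd.DataFrame(data, columns=cols)

# Step 1: group once and keep the GroupBy object around.
groups = df.groupby("Code")

# .indices maps each Code to a NumPy array of integer row positions;
# convert to plain lists for easier reading.
positions = {code: idx.tolist() for code, idx in groups.indices.items()}
print(positions)

# The same GroupBy object can also return a single sub-dataframe by key.
sub = groups.get_group("08A85800")
print(sub)

# Step 2: materialize every (code, sub-dataframe) pair.
group_list = list(groups)
```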

EDIT

Then, if you want a particular dataframe, you can call:

 df_i = group_list[i][1]

group_list[i] is the i-th element of the list of sub-dataframes, but it is a tuple (group_val, group_df), where group_val is the value associated with the new dataframe ('08951201' or '08A85800') and group_df is the dataframe itself.
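Because each element is a (group_val, group_df) tuple, the list can also be unpacked directly into named variables such as the df1 and df2 mentioned in the question. This is a minimal sketch that assumes there are exactly two distinct codes; with more codes, indexing group_list or building a dict is likely more practical.

```python
import pandas as pd

df = pd.DataFrame({
    "Quantity": [1757, 1100, 2500, 323, 800],
    "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
    "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
})

group_list = list(df.groupby("Code"))

# Each element is a (group_val, group_df) tuple; groups come out sorted by key.
# Unpacking like this only works when there are exactly two groups.
(code1, df1), (code2, df2) = group_list

print(code1)
print(df1)
print(code2)
print(df2)
```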