Splitting data frame into smaller data frames based on unique column values

Time:12-01

This is my data frame:

    Quantity     Code         Value       
0       1757     08951201     717.0
1       1100     08A85800       0.0
2       2500     08A85800       0.0
3        323     08951201       0.0
4        800     08A85800       0.0

I want to split this into smaller data frames based on the Code column. (E.g. this one should split into df1 with all the 08951201 codes and df2 with all the 08A85800 codes.)

Edit: I'd also love a way to merge them back into the original dataframe, in the same order, after some value calculations I'm going to perform.

CodePudding user response:

Use groupby and apply your custom function to process your sub dataframe:

groups = df.groupby('Code')
print(list(groups))

# Output:
[('08951201',    Quantity      Code  Value
0      1757  08951201  717.0
3       323  08951201    0.0),

('08A85800',    Quantity      Code  Value
1      1100  08A85800    0.0
2      2500  08A85800    0.0
4       800  08A85800    0.0)]

Now suppose you want to sum Value for each Code:

>>> df.groupby('Code')['Value'].sum()
Code
08951201    717.0
08A85800      0.0
Name: Value, dtype: float64
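For the "merge back in the same order" part of the question, one option is to skip the split entirely and use transform, which returns a result aligned with the original index. This is only a sketch: the QuantityShare column and the calculation itself are made up for illustration, so swap in whatever you actually need to compute.

```python
import pandas as pd

df = pd.DataFrame({
    "Quantity": [1757, 1100, 2500, 323, 800],
    "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
    "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
})

# transform("sum") broadcasts each group's total back onto its rows,
# so the result has the same index and order as df
group_total = df.groupby("Code")["Quantity"].transform("sum")

# hypothetical per-group calculation: each row's share of its Code's quantity
df["QuantityShare"] = df["Quantity"] / group_total

print(df)
```

Because nothing is ever split off, there is nothing to merge back afterwards and the rows stay in their original order.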

CodePudding user response:

As suggested, you can use groupby() on your dataframe to split it by the values of one column:

import pandas as pd

cols = ['Quantity', 'Code', 'Value']
data = [[1757,     '08951201',     717.0],
 [1100,     '08A85800',       0.0],
 [2500,     '08A85800',       0.0],
 [323,    '08951201',      0.0],
 [800,    '08A85800',       0.0]]

df = pd.DataFrame(data, columns=cols)

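# pass the column name as a plain string so the group keys are plain values
# (recent pandas versions return 1-tuple keys for a one-element list)
groups = df.groupby('Code')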

You can then get the row indices of each group from groups.indices, which returns a dict with the 'Code' values as keys and arrays of row positions as values. Finally, if you want every sub-dataframe, call group_list = list(groups). I suggest doing the work in two steps (first group by, then call list), because that way you can still call other methods on the groupby object (groups).
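A small sketch of what those calls give you on the sample data, assuming the same df as in the question (rebuilt here so the snippet runs on its own). get_group is another way to pull out a single sub-dataframe by its Code value.

```python
import pandas as pd

df = pd.DataFrame({
    "Quantity": [1757, 1100, 2500, 323, 800],
    "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
    "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
})

groups = df.groupby("Code")

# dict: Code value -> array of row positions belonging to that group
print(groups.indices)

# a single sub-dataframe, looked up by its Code value
sub_df = groups.get_group("08A85800")
print(sub_df)

# all (Code value, sub-dataframe) pairs at once
group_list = list(groups)
```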


EDIT

Then, if you want a particular dataframe, you can call

 df_i = group_list[i][1]

group_list[i] is the i-th group, but it is a tuple (group_val, group_df), where group_val is the Code value for that group ('08951201' or '08A85800') and group_df is the corresponding sub-dataframe.
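If you really do need separate dataframes and want to put them back together afterwards, something along these lines should work. It is a sketch that assumes the original index is unique (as in the question), since the original order is restored by sorting on it. The "+ 1.0" step is only a placeholder for your own calculation.

```python
import pandas as pd

df = pd.DataFrame({
    "Quantity": [1757, 1100, 2500, 323, 800],
    "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
    "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
})

# one independent sub-dataframe per Code, keyed by the Code value
parts = {code: part.copy() for code, part in df.groupby("Code")}

# placeholder calculation on one of the pieces (hypothetical)
parts["08951201"]["Value"] = parts["08951201"]["Value"] + 1.0

# each piece kept its original index labels, so concatenating and
# sorting by index brings the rows back into their original order
restored = pd.concat(list(parts.values())).sort_index()

print(restored)
```

The .copy() keeps each piece independent of df, so assigning into it does not trigger chained-assignment warnings.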
