This is my DataFrame:
Quantity Code Value
0 1757 08951201 717.0
1 1100 08A85800 0.0
2 2500 08A85800 0.0
3 323 08951201 0.0
4 800 08A85800 0.0
I want to split this into smaller DataFrames based on the Code column. (E.g. this one should split into df1 with all 08951201 codes and df2 with all 08A85800 codes.)
Edit: I'd also love to have a way to merge them back into the original DataFrame, in the same order, after some value calculations I'm going to perform.
CodePudding user response:
Use groupby and apply your custom function to process each sub-DataFrame:
groups = df.groupby('Code')
print(list(groups))
# Output:
[('08951201', Quantity Code Value
0 1757 08951201 717.0
3 323 08951201 0.0),
('08A85800', Quantity Code Value
1 1100 08A85800 0.0
2 2500 08A85800 0.0
4 800 08A85800 0.0)]
Now suppose you want to sum the Value column for each Code:
>>> df.groupby('Code')['Value'].sum()
Code
08951201 717.0
08A85800 0.0
Name: Value, dtype: float64
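For the "merge them back in the same order" part of the question, here is a minimal sketch. It relies on the fact that each sub-DataFrame keeps the row labels of the original, so concatenating the pieces and sorting by index restores the original order. The process function and the Share column are hypothetical, just a stand-in for whatever calculation you actually need.

```python
import pandas as pd

df = pd.DataFrame({
    "Quantity": [1757, 1100, 2500, 323, 800],
    "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
    "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
})

def process(sub_df):
    # Hypothetical calculation: each row's share of its group's total Quantity
    sub_df = sub_df.copy()  # avoid modifying a slice of the original
    sub_df["Share"] = sub_df["Quantity"] / sub_df["Quantity"].sum()
    return sub_df

# Split into one sub-DataFrame per Code and process each piece
parts = [process(sub_df) for _, sub_df in df.groupby("Code")]

# Stitch the pieces together; sort_index puts rows back in their original order
merged = pd.concat(parts).sort_index()
print(merged)
```

This assumes the original index is unique and already sorted (as with the default 0..n-1 index). If it is not, you could save the order first with something like `df.index` and use `merged.loc[...]` to reorder.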
CodePudding user response:
As suggested, you could use groupby() on your DataFrame to separate the rows by the values of one column:
import pandas as pd
cols = ['Quantity', 'Code', 'Value']
data = [[1757, '08951201', 717.0],
        [1100, '08A85800', 0.0],
        [2500, '08A85800', 0.0],
        [323, '08951201', 0.0],
        [800, '08A85800', 0.0]]
df = pd.DataFrame(data, columns=cols)
groups = df.groupby('Code')
Then you can recover the row positions with groups.indices, which returns a dict with the 'Code' values as keys and arrays of row positions as values. Finally, if you want every sub-DataFrame, you can call group_list = list(groups). I suggest doing the work in two steps (first group by, then call list), because that way you can still call other methods on the GroupBy object (groups).
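To make the two steps concrete, here is a short sketch using the same sample data. It groups by the column name passed as a plain string, so the group keys are plain strings as well.

```python
import pandas as pd

cols = ['Quantity', 'Code', 'Value']
data = [[1757, '08951201', 717.0],
        [1100, '08A85800', 0.0],
        [2500, '08A85800', 0.0],
        [323, '08951201', 0.0],
        [800, '08A85800', 0.0]]
df = pd.DataFrame(data, columns=cols)

# Step 1: build the GroupBy object once
groups = df.groupby('Code')

# indices maps each Code value to the integer positions of its rows
positions = groups.indices
print(positions)

# The GroupBy object is still available for other methods, e.g. group sizes
print(groups.size())

# Step 2: materialise the (key, sub-DataFrame) pairs only when needed
group_list = list(groups)
print(len(group_list))
```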
EDIT
Then, if you want a particular sub-DataFrame, you can call
df_i = group_list[i][1]
group_list[i] is the i-th element of the list, but it is a tuple (group_val, group_df), where group_val is the value associated with that sub-DataFrame ('08951201' or '08A85800') and group_df is the sub-DataFrame itself.
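If looking things up by position feels fragile, one possible alternative is to key the sub-DataFrames by their Code value, either with a dict comprehension or with get_group. This is only a sketch; the names frames, df1, df2 and same are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity': [1757, 1100, 2500, 323, 800],
    'Code': ['08951201', '08A85800', '08A85800', '08951201', '08A85800'],
    'Value': [717.0, 0.0, 0.0, 0.0, 0.0],
})

# Build a dict {code: sub-DataFrame} from the (key, sub-DataFrame) pairs
frames = {code: sub_df for code, sub_df in df.groupby('Code')}

df1 = frames['08951201']
df2 = frames['08A85800']

# get_group fetches a single group without building the whole dict
same = df.groupby('Code').get_group('08951201')

print(df1)
print(df2)
```

Both df1 and df2 keep the original row labels, which is what makes it possible to put the rows back in their original order later.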