I have a file (CSV) containing data that looks something like this in a table format:
ID | employee | date | value1 | value2 |
---|---|---|---|---|
1 | a | 2022-01-01 | 123 | 456 |
2 | b | 2022-01-01 | 123 | 456 |
3 | a | 2022-01-01 | 123 | 456 |
4 | c | 2022-01-01 | 123 | 456 |
5 | d | 2022-01-01 | 123 | 456 |
6 | b | 2022-01-01 | 123 | 456 |
7 | e | 2022-01-01 | 123 | 456 |
8 | e | 2022-01-01 | 123 | 456 |
In Excel I created a macro with VBA that creates new workbooks for the table filtered by the employee column. The macro works dynamically by creating a dictionary of the unique values in column [employee] and a For Each loop that filters the original table by the key of the dictionary. In this example it would create 5 new workbooks (for employee "a", "b", "c", "d" and "e") displaying only their corresponding data. E.g. for employee a:
ID | employee | date | value1 | value2 |
---|---|---|---|---|
1 | a | 2022-01-01 | 123 | 456 |
3 | a | 2022-01-01 | 123 | 456 |
VBA code:
Sub Filter_Copy()
Dim rng As Range
Dim wb_new As Workbook
Dim dic As Object
Dim cell AsSub Range
Dim key As Variant
Set rng = Table1.ListObjects("table").Range
Set dic = CreateObject("Scripting.Dictionary")
For Each cell In Range("table[column]")
dic(cell.Value) = 0
Next cell
For Each key In dic.keys
rng.AutoFilter 5, key
Set wb_new = Workbooks.Add
rng.SpecialCells(xlCellTypeVisible).Copy wb_new.Worksheets(1).Range("A1")
wb_new.Close True, ThisWorkbook.Path & "\" & key & ".xlsx"
Next key
End Sub
Now I want to do the exact same in Python but I am lost with using a for loop
to loop through a dictionary of the unique values in [employee]. I have no troubles doing it manually for every employee by creating a dedicated dataframe for each unique value, filtering it like df.employee == "a"
and saving it to_excel()
but doing it dynamically is something different.
So far I have the obvious:
import pandas as pd
file = "filename.csv"
df = pd.read_csv(file)
dict = dict(df["employee"].unique())
CodePudding user response:
Use pandas.groupby
and iterate through them:
groupobj = df.groupby('employee')
for name,group in groupobj:
group.to_excel(f'./{name}.xlsx')