Home > Software engineering >  Filter table and copy to new file in Pandas (VBA-like) dynamically
Filter table and copy to new file in Pandas (VBA-like) dynamically

Time:09-16

I have a file (CSV) containing data that looks something like this in a table format:

ID employee date value1 value2
1 a 2022-01-01 123 456
2 b 2022-01-01 123 456
3 a 2022-01-01 123 456
4 c 2022-01-01 123 456
5 d 2022-01-01 123 456
6 b 2022-01-01 123 456
7 e 2022-01-01 123 456
8 e 2022-01-01 123 456

In Excel I created a macro with VBA that creates new workbooks for the table filtered by the employee column. The macro works dynamically by creating a dictionary of the unique values in column [employee] and a For Each loop that filters the original table by the key of the dictionary. In this example it would create 5 new workbooks (for employee "a", "b", "c", "d" and "e") displaying only their corresponding data. E.g. for employee a:

ID employee date value1 value2
1 a 2022-01-01 123 456
3 a 2022-01-01 123 456

VBA code:

Sub Filter_Copy()

Dim rng As Range
Dim wb_new As Workbook
Dim dic As Object
Dim cell AsSub Range
Dim key As Variant

Set rng = Table1.ListObjects("table").Range
Set dic = CreateObject("Scripting.Dictionary")

For Each cell In Range("table[column]")
    dic(cell.Value) = 0
Next cell

For Each key In dic.keys
    rng.AutoFilter 5, key
    Set wb_new = Workbooks.Add
    rng.SpecialCells(xlCellTypeVisible).Copy wb_new.Worksheets(1).Range("A1")
    wb_new.Close True, ThisWorkbook.Path & "\" & key & ".xlsx"
    
Next key

End Sub

Now I want to do the exact same in Python but I am lost with using a for loop to loop through a dictionary of the unique values in [employee]. I have no troubles doing it manually for every employee by creating a dedicated dataframe for each unique value, filtering it like df.employee == "a" and saving it to_excel() but doing it dynamically is something different.

So far I have the obvious:

import pandas as pd

file = "filename.csv"

df = pd.read_csv(file)

dict = dict(df["employee"].unique())

CodePudding user response:

Use pandas.groupby and iterate through them:

groupobj = df.groupby('employee')
for name,group in groupobj:
    group.to_excel(f'./{name}.xlsx')
  • Related