I have a data frame having a huge list of all invoices. I need to split the data frame based on the column "invoice numbers". Total invoices list is up to 230,000 however the unique list of invoice are 138 (i.e for eg invoice "A", there might be 30 products for an invoice) and each such invoices should be split into a different dataset/data frame and extracted to excel/csv.
I tried the below code,
#List of invoice dataframe
Invoices = pd.read_csv('D:/Invoice list.csv')
Invoices.shape
# Output = (230265, 25)
#List unique invoices (It is just an unique taken from the 'Invoice number' column)
List = pd.read_csv('D:List.csv')
List.shape
# Output = (138, 1)
Inv_1 = Invoices[Invoices['Invoice number'] == 'TS-AVRB-Jan22-10']
However that will help to extract only 1 invoice number at a time, is it possible to do for all the 138 invoices?
I tried using For loop as below,
for i in List:
Inv = Invoices[Invoices['Invoice number'] == 'i']
Inv.to_csv('i.csv')
I get an output of a blank CSV file and not individual 138 CSV files having a list of invoices
Sample CSV file link having sample dataset is provided in the below link Link : https://drive.google.com/drive/folders/1B6IPsMAhXh9RvMlN-juFSTAJipuDYuZ-?usp=sharing
Help would be much appreciated!
CodePudding user response:
There are a few syntax errors in the code. You have to pass the i
in loop instead of string 'i'
. Also for the filename, you have to pass the invoice number in the string to prevent overwriting the file with the same name. The full code would be as followed
#List of invoice dataframe
Invoices = pd.read_csv('D:/Invoice list.csv')
print(Invoices.shape)
# Output = (230265, 25)
#List unique invoices (It is just an unique taken from the 'Invoice number' column)
invoice_num_list = pd.read_csv('D:/List.csv')
print(invoice_num_list.shape)
# Output = (138, 1)
for invoice_num in invoice_num_list:
Inv = Invoices[Invoices['Invoice number'] == invoice_num]
Inv.to_csv(f'{invoice_num}.csv')