I have to work on a flat file (size > 500 Mo) and I need to create to split file on one criterion. My original file as this structure (simplified): JournalCode|JournalLib|EcritureNum|EcritureDate|CompteNum|
I need to create to file depending on the first digit from 'CompteNum'.
I have started my code as well
import sys
import pandas as pd
import numpy as np
import datetime
C_FILE_SEP = "|"
def main(fic):
pd.options.display.float_format = '{:,.2f}'.format
FileFec = pd.read_csv(fic, C_FILE_SEP, encoding= 'unicode_escape')
It seems ok, my concern is to create my 2 files based on criteria. I have tried with unsuccess.
TargetFec = 'Target_' fic datetime.datetime.now().strftime("%Y%m%d-%H%M%S") '.txt'
target = open(TargetFec, 'w')
FileFec = FileFec.astype(convert_dict)
for row in FileFec.iterrows():
Fec_Cpt = str(FileFec['CompteNum'])
nb = len(Fec_Cpt)
if (nb > 7):
target.write(str(row))
target.close()
the result of my target file is not like I expected:
(0, JournalCode OUVERT
JournalLib JOURNAL D'OUVERTURE
EcritureNum XXXXXXXXXX
EcritureDate 20190101
CompteNum 101300
CompteLib CAPITAL SOUSCRIT
CompAuxNum
CompAuxLib
PieceRef XXXXXXXXXX
PieceDate 20190101
EcritureLib A NOUVEAU
Debit 000000000000,00
Credit 000038188458,00
EcritureLet NaN
DateLet NaN
ValidDate 20190101
Montantdevise
Idevise
CodeEtbt 100
Unnamed: 19 NaN
And I expected to obtain line into my target file when CompteNum(0:1) > 7
I have read many posts for 2 days, please some help will be perfect.
There is a sample of my data available here
Philippe
CodePudding user response:
As I understand it, you want to filter the imported dataframe according to some criteria. You can work directly on the pandas you imported. Look:
# criteria:
verify = df['CompteNum'].apply(lambda number: len(str(number)) > 7)
# filtering the dataframe based on the given criteria:
df[verify] # meets the criteria
df[~verify] # does not meet the criteria
# saving the dataframes:
df[verify].to_csv('<your path>/meets-criterios.csv')
df[~verify].to_csv('<your path>/not-meets-criterios.csv')
Once you have the filtered dataframes, you can save them or convert them to other objects, such as dictionaries.