I am using the python library, dbf, by Ethan Furman to convert a number of dbf files to csv. It works extremely well for that. I would like to further edit some of the fields during the conversion process but am unsure how to do it. Specifically, I would like to replace string fields that contain only 1 or more blanks with empty strings, (eg. " " replace with "") and date fields that contain "00000000" with empty strings "". I would very much appreciate it if someone could describe how to edit the fields and write out the updated records during the conversion process. Obviously, I could write a simple secondary script to edit the csv files output during conversion but I would like to do it all in one step if possible. Here is the code I am using to convert the files:
import csv
import dbf
import os
import sys
folder=sys.argv[1]
for dirpath, dirnames, filenames in os.walk(folder):
for filename in filenames:
if filename.endswith('.DBF'):
db=dbf.Table(filename, ignore_memos=True)
db.open()
csv_fn = filename[:-4] ".csv"
dbf.export(db, filename=csv_fn, format='csv', header=True)
CodePudding user response:
By default, when using a dbf table the data types returned are simple -- i.e. int
, str
, bool
, datetime.datetime
, etc. But you can make your own data types and have those used instead by specifying them in the default_data_types
parameter:
db = dbf.Table(
filename,
ignore_memos=True,
default_data_types={
'C': my_white_space_stripping_data_type,
'D': my_empty_date_str_data_type,
},
)
Fortunately, dbf
comes with four enhanced data types already:
Char
-- automatically strips trailing whitespace, and ignores trailing whitespace for comparisonsLogical
-- supportsTrue
,False
, andNone
(None
is returned when the field value is not true or false -- I've seen?
,' '
, and other weird garbage)Date
-- supports an empty date, such as00000000
, and displays them as''
DateTime
-- supports an empty date/time, and displays them as''
Typically, if you're using one of the enhanced data types you probably want them all, so instead of the dictionary you can just pass a string:
db = dbf.Table(
filename,
ignore_memos=True,
default_data_types='enhanced',
)
Now, when a csv file is exported, trailing white-space is dropped, and empty date fields become ''
.
Keep in mind that empty logical fields will become '?'
instead of ''
, so you may want the longer form of specifying a dict
to default_data_types
and only overriding C
and D
.