I'm attempting to work with the Canadian radio station DBF files made public here: https://sms-sgs.ic.gc.ca/eic/site/sms-sgs-prod.nsf/eng/h_00015.html
I'd like to read specifically the fmstatio.dbf file into a Pandas DataFrame. I've tried the two commonly recommended DBF packages in Python.
When using simpledbf (https://pypi.org/project/simpledbf/), I only get the column names when using the dbf.to_dataframe() function.
I also tried dbf on pypi (https://pypi.org/project/dbf/). I'm able to read the DBF file into a table:
table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf')
table.open(dbf.READ_ONLY)
print(table)
table.close()
And get the following information on the table:
Table: /datadrive/canada/fmstatio.dbf
Type: dBase III Plus
Codepage: ascii (plain ol' ascii)
Status: DbfStatus.READ_ONLY
Last updated: 1921-12-07
Record count: 8428
Field count: 37
Record length: 221
But when trying to convert into a DataFrame, I am unsuccessful:
oh_canada = pd.DataFrame(table)
table.close()
Error I receive:
data = fielddef[CLASS](decoder(data)[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4: ordinal not in range(128)
Might anyone have insight as to the best way to go about working with this kind of DBF file in Pandas? Many thanks in advance.
CodePudding user response:
The table says it is "plain old ascii", but it lies. It contains "e with acute accent", which is not surprising given the French content in Canadian databases. To work around this, you need to override the codepage:
table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf',codepage=3)
"3" means the default Windows code page, CP1252. With that, I am able to read the file.
I'm still not sure pandas
can import the format it supplies as an iterator. You may need to use export
to convert this to lists.