Home > Enterprise >  Struggles with converting a DBF file to Pandas DataFrame
Struggles with converting a DBF file to Pandas DataFrame

Time:12-13

I'm attempting to work with the Canadian radio station DBF files made public here: https://sms-sgs.ic.gc.ca/eic/site/sms-sgs-prod.nsf/eng/h_00015.html

I'd like to read specifically the fmstatio.dbf file into a Pandas DataFrame. I've tried the two commonly recommended DBF packages in Python.

When using simpledbf (https://pypi.org/project/simpledbf/), I only get the column names when using the dbf.to_dataframe() function.

I also tried dbf on pypi (https://pypi.org/project/dbf/). I'm able to read the DBF file into a table:

table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf')
table.open(dbf.READ_ONLY)
print(table)
table.close()

And get the following information on the table:

   Table:         /datadrive/canada/fmstatio.dbf
    Type:          dBase III Plus
    Codepage:      ascii (plain ol' ascii)
    Status:        DbfStatus.READ_ONLY
    Last updated:  1921-12-07
    Record count:  8428
    Field count:   37
    Record length: 221 

But when trying to convert into a DataFrame, I am unsuccessful:

oh_canada = pd.DataFrame(table)
table.close()

Error I receive:

    data = fielddef[CLASS](decoder(data)[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 4: ordinal not in range(128)

Might anyone have insight as to the best way to go about working with this kind of DBF file in Pandas? Many thanks in advance.

CodePudding user response:

The table says it is "plain old ascii", but it lies. It contains "e with acute accent", which is not surprising given the French content in Canadian databases. To work around this, you need to override the codepage:

table = dbf.Table(filename='/datadrive/canada/fmstatio.dbf',codepage=3)

"3" means the default Windows code page, CP1252. With that, I am able to read the file.

I'm still not sure pandas can import the format it supplies as an iterator. You may need to use export to convert this to lists.

  • Related