I am converting mbox files to an SQLite database, but I cannot get the database content encoded as UTF-8.
The Python console displays the following message when converting to db:
Error binding parameter 1 - probably unsupported type.
When I view my data in DB Browser for SQLite, special characters are missing and the � symbol appears instead.
I first convert .text files to Mbox files with the following function:
import os

def makeMBox(fIn,fOut):
    # Refuse to run if the input is missing or the output already exists
    if not os.path.exists(fIn):
        return False
    if os.path.exists(fOut):
        return False

    out = open(fOut,"w")

    lineNum = 0

    # detect encoding
    readsource = open(fIn,'rt').__next__
    #fInCodec = tokenize.detect_encoding(readsource)[0]
    fInCodec = 'UTF-8'

    for line in open(fIn,'rt', encoding=fInCodec, errors="replace"):
        # A line starting with "From " marks the beginning of a new message
        if line.find("From ") == 0:
            if lineNum != 0:
                out.write("\n")
            lineNum += 1
            line = line.replace(" at ", "@")
        out.write(line)

    out.close()
    return True
Then, I convert to sqlite db:
for k in dates:
    db = sqlite_utils.Database("Courriels_Sqlite/Echanges_Discussion.db")
    mbox = mailbox.mbox("Courriels_MBox/" + k + ".mbox")

    def to_insert():
        for message in mbox.values():
            Dionyversite = dict(message.items())
            Dionyversite["payload"] = message.get_payload()
            yield Dionyversite

    try:
        db["Dionyversite"].upsert_all(to_insert(), alter = True, pk = "Message-ID")
    except sql.InterfaceError as e:
        print(e)
Thank you for your help.
CodePudding user response:
I found how to fix it:
def to_insert():
    for message in mbox.values():
        Dionyversite = dict(message.items())
        Dionyversite["payload"] = message.get_payload(decode = True)
        yield Dionyversite
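As you can see, I added `decode = True` to the `get_payload` call in the `to_insert` function. Without it, `get_payload()` returns a list of message objects for multipart messages, which SQLite cannot bind as a parameter.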
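One caveat: with `decode=True`, `get_payload` returns `bytes` (or `None` for a multipart message), so the payload ends up in the database as a BLOB rather than as text. If you would rather store readable text, something along these lines might work. This is only a sketch: the helper name `payload_as_text`, the choice to keep only `text/plain` parts, and the UTF-8 fallback when no charset is declared are my own assumptions, not part of the original code.

```python
from email.message import Message


def payload_as_text(message: Message) -> str:
    """Hypothetical helper: return the text/plain body of a message as str."""
    if message.is_multipart():
        # get_payload(decode=True) returns None for multipart messages,
        # so look at the text/plain sub-parts one by one.
        parts = [
            part for part in message.walk()
            if part.get_content_type() == "text/plain"
        ]
    else:
        parts = [message]

    chunks = []
    for part in parts:
        raw = part.get_payload(decode=True)  # bytes, or None
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        try:
            chunks.append(raw.decode(charset, errors="replace"))
        except LookupError:
            # The header names a charset that Python does not know.
            chunks.append(raw.decode("utf-8", errors="replace"))
    return "\n".join(chunks)
```

In `to_insert` you would then write `Dionyversite["payload"] = payload_as_text(message)` instead of calling `get_payload` directly. Messages read from `mailbox.mbox` are `email.message.Message` subclasses, so they can be passed in as they are.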