I am trying to read a CSV into a pandas DataFrame and turn it into a list of tuples, which I currently do with to_records(). However, one of my columns, which holds bytes, keeps getting turned into a string: for example, b'\x00\x01\x02\x03\x04' becomes "b'\x00\x01\x02\x03\x04'". I want pandas to keep the original bytes values. Is there any way to achieve this?
This is what the dataframe looks like:
col1 | col2 |
---|---|
True | b'\\x00\\x01\\x02\\x03\\x04' |
False | b'\\x05\\x06\\x07\\x08\\x09' |
But when I turn it into a list of tuples using to_records it looks like this: [(True, "b'\\x00\\x01\\x02\\x03\\x04'"), (False, "b'\\x05\\x06\\x07\\x08\\x09'")]
^ As you can see, the bytes are turned into a string.
CodePudding user response:
As @mozway suggests, your values are already strings that look like bytes. Try using pd.eval:
>>> df.assign(col2=pd.eval(df['col2'])).to_records()
rec.array([(0, True, b'\\x00\\x01\\x02\\x03\\x04'),
           (1, False, b'\\x05\\x06\\x07\\x08\\x09')],
          dtype=[('index', '<i8'), ('col1', '?'), ('col2', 'O')])
>>> df.to_records()
rec.array([(0, True, "b'\\\\x00\\\\x01\\\\x02\\\\x03\\\\x04'"),
           (1, False, "b'\\\\x05\\\\x06\\\\x07\\\\x08\\\\x09'")],
          dtype=[('index', '<i8'), ('col1', '?'), ('col2', 'O')])
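
If the column comes straight from a CSV, another option is to convert it while reading, using ast.literal_eval from the standard library. It only evaluates Python literals, so it does not run arbitrary expressions. A minimal sketch, assuming the file has the two columns shown in the question (the inline csv_text stands in for the real file, and records is just an illustrative name):

```python
import ast
import io

import pandas as pd

# Stand-in for the real CSV file (an assumption): col2 holds the text of a
# bytes literal, backslashes included, as it would appear on disk.
csv_text = "\n".join([
    "col1,col2",
    r"True,b'\x00\x01\x02\x03\x04'",
    r"False,b'\x05\x06\x07\x08\x09'",
])

# converters runs ast.literal_eval on every col2 cell while parsing, turning
# the text of each bytes literal into a real bytes object.
df = pd.read_csv(io.StringIO(csv_text), converters={"col2": ast.literal_eval})

# Plain list of tuples, without the index column.
records = list(df.itertuples(index=False, name=None))
print(records)
```

If a NumPy record array is preferred, df.to_records(index=False) should work on the converted frame as well.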