pandas to_records() raises an error, while numpy.array behaves as expected.
import numpy
import pandas as pd
data = [('myID', 5), ('myID', 10)]
myDtype = numpy.dtype([('myID', numpy.str_, 4),
                       ('length', numpy.uint16)])
This works:
arr = numpy.array(data, dtype=myDtype)
output: [('myID', 5) ('myID', 10)]
This does not work:
df = pd.DataFrame(data)
df = df.to_records(index=False, column_dtypes=myDtype)
ValueError: invalid literal for int() with base 10: 'myID'
What am I doing wrong with pandas to_records()?
CodePudding user response:
As far as I can tell, the way you wrote myDtype is not compatible with the column names of your DataFrame.
Because no column names were given, the columns are labelled with the int values 0 and 1, which seems to cause the error (pandas tries to match the int 0 against your field name "myID").
(I'm not entirely sure about this, so someone may want to add to it; I'll edit the answer if so.)
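To check the point about the column labels, here is a minimal sketch (assuming pandas is imported as pd, as in the question; default_labels is just an illustrative name) that prints the labels pandas assigns when no columns argument is given:

```python
import pandas as pd

data = [("myID", 5), ("myID", 10)]

# No column names are given, so pandas falls back to integer labels.
df = pd.DataFrame(data)
default_labels = list(df.columns)
print(default_labels)  # [0, 1]
```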
I was able to get rid of the error by naming the columns and passing column_dtypes as a dictionary:
data = [("myID", 5), ("myID", 10)]
myDtype = numpy.dtype([('myID', numpy.str_, 4),
('length', numpy.uint16)])
df = pd.DataFrame(data, columns=["myID", "length"])
df_records = df.to_records(index=False, column_dtypes={"myID": "<U4", "length": "<u2"})
With the following result:
rec.array([('myID', 5), ('myID', 10)],
dtype=[('myID', '<U4'), ('length', '<u2')])
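If you would rather not repeat the type strings by hand, the dictionary can probably be built from the existing myDtype. A minimal sketch, assuming the DataFrame columns carry the same names as the dtype fields (dtypes_by_name is an illustrative name, not part of pandas or numpy):

```python
import numpy
import pandas as pd

data = [("myID", 5), ("myID", 10)]
myDtype = numpy.dtype([("myID", numpy.str_, 4),
                       ("length", numpy.uint16)])

# Name the columns after the dtype fields so the dictionary keys match.
df = pd.DataFrame(data, columns=list(myDtype.names))

# Indexing a structured dtype by field name returns that field's dtype.
dtypes_by_name = {name: myDtype[name] for name in myDtype.names}

df_records = df.to_records(index=False, column_dtypes=dtypes_by_name)
print(df_records.dtype)
```

This keeps myDtype as the single place where the field types are defined.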
CodePudding user response:
The column_dtypes argument of a pandas DataFrame's to_records() method expects a dict when each column should get its own type, but you are passing myDtype, which is of type numpy.dtype.
Try this; it should work:
df = pd.DataFrame(data)
df_rec = df.to_records(index=False, column_dtypes=myDtype.fields)
The output is:
>>> df_rec
rec.array([('myID', 5), ('myID', 10)],
dtype=[('0', 'O'), ('1', '<i8')])
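Note that in this output the fields are named '0' and '1' and have the types 'O' and '<i8', so the types from myDtype do not seem to have been applied. A possible reason is that the keys of myDtype.fields ('myID', 'length') do not match the default column labels 0 and 1, and that its values are (dtype, offset) tuples rather than plain dtypes. A minimal sketch, assuming you want to keep the DataFrame without column names, keys the dictionary by column position instead (dtypes_by_position is an illustrative name):

```python
import numpy
import pandas as pd

data = [("myID", 5), ("myID", 10)]
myDtype = numpy.dtype([("myID", numpy.str_, 4),
                       ("length", numpy.uint16)])

# Each value in myDtype.fields is a tuple that starts with (dtype, offset).
print(myDtype.fields["length"])

df = pd.DataFrame(data)  # columns are labelled 0 and 1

# Map each column position to the dtype of the field at that position.
dtypes_by_position = {i: myDtype[name] for i, name in enumerate(myDtype.names)}

df_rec = df.to_records(index=False, column_dtypes=dtypes_by_position)
print(df_rec.dtype)
```

The record fields are still named '0' and '1' here; passing columns=["myID", "length"] to pd.DataFrame is needed if the field names matter.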