How can I strictly enforce a dtype Decimal in a pandas DataFrame?
To clarify: I am not looking for weak workarounds, such as rounding every time I write to or read from a column (and hoping that no other operation elsewhere has produced unwanted results).
I really want to be 100% sure that whatever is written in that column, no matter where it might have come from, will always have exactly 2 digits after the decimal point, end of story. And if a user tries to write something that does not comply, the whole thing should blow up (with either a TypeError or a ValueError). To avoid theoretical discussions and motivate the usage a bit: I am dealing with a trading system, so saving anything other than 2 decimal places in that frame would be a hard error, always.
I have tried to assign a dtype, but without success:
from decimal import Decimal
df[col].astype(Decimal)
Pydantic comes to mind: but if I bake the df into a class (say class MyDfType), do I then need to write my own setter/getter functions into MyDfType for the contained dataframe (MyDfType().df) to ensure that all values to/from certain cols are manually enforced to be Decimal?
CodePudding user response:
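pandas has no native Decimal dtype, so pd.DataFrame.astype() cannot enforce one: calling astype(Decimal) typically fails with a TypeError ("dtype ... not understood"). What you can do is store Decimal objects in an object-dtype column by converting each value explicitly.
import pandas as pd
from decimal import Decimal
# create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})
# convert each value in column 'B' to a Decimal; the column dtype becomes object
df['B'] = df['B'].map(lambda x: Decimal(str(x)))
# this does NOT raise: an object column accepts any Python value
df.loc[0, 'B'] = 'abc'
The conversion itself raises decimal.InvalidOperation if a value cannot be parsed as a Decimal, but because the resulting column has dtype object, pandas does not validate anything assigned to it afterwards. Strict enforcement therefore needs an explicit validation step around every write.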
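For the strict requirement in the question (every stored value is a Decimal with exactly two decimal places, and anything else raises), one option is an explicit validation layer around the frame. The following is only a minimal sketch, not a pandas feature: check_two_places and PriceFrame are hypothetical names invented for illustration, and it assumes that all writes go through the wrapper rather than touching the underlying frame directly.

```python
from decimal import Decimal

import pandas as pd

TWO_PLACES_EXPONENT = -2  # exponent of a Decimal with exactly two decimal places


def check_two_places(value):
    """Return value unchanged if it is a finite Decimal with exactly 2 decimal places."""
    if not isinstance(value, Decimal):
        raise TypeError(f"expected Decimal, got {type(value).__name__}: {value!r}")
    # NaN and Infinity have a non-numeric exponent, so they are rejected explicitly
    if not value.is_finite() or value.as_tuple().exponent != TWO_PLACES_EXPONENT:
        raise ValueError(f"expected exactly 2 decimal places, got {value!r}")
    return value


class PriceFrame:
    """Hypothetical wrapper: writes to the decimal columns go through set_value()."""

    def __init__(self, df, decimal_cols):
        self._decimal_cols = set(decimal_cols)
        # validate the initial contents once, up front
        for col in self._decimal_cols:
            df[col].map(check_two_places)
        self._df = df.copy()

    def set_value(self, row, col, value):
        # reject the value before it reaches the frame
        if col in self._decimal_cols:
            check_two_places(value)
        self._df.at[row, col] = value

    @property
    def df(self):
        # hand out a copy so callers cannot bypass validation
        return self._df.copy()


frame = PriceFrame(
    pd.DataFrame({"price": [Decimal("1.10"), Decimal("2.25")]}),
    decimal_cols=["price"],
)
frame.set_value(0, "price", Decimal("3.50"))  # accepted
```

With this sketch, frame.set_value(0, "price", 3.5) raises a TypeError (a float is not a Decimal) and frame.set_value(0, "price", Decimal("3.5")) raises a ValueError (only one decimal place). Returning a copy from df trades memory for safety; a stricter design might instead re-validate the whole column before every read.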
CodePudding user response:
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# setting the seed to re-create the dataframe
np.random.seed(25)
# creating a 5 * 4 dataframe
df = pd.DataFrame(np.random.random([5, 4]), columns=["A", "B", "C", "D"])
# print the dataframe
df