How to enforce Decimal dtype in pandas DataFrame

Time:12-06

How can I strictly enforce a Decimal dtype in a pandas DataFrame?

To clarify: I am not looking for weak workarounds, such as rounding every time I write to or read from a column (and hoping that no other operation elsewhere leads to unwanted results).

I really want to be 100% sure that whatever is written to that column, no matter where it came from, always has exactly 2 digits after the decimal point, end of story. And if a user tries to write something that does not comply, the whole thing should blow up (with either a TypeError or a ValueError). To avoid theoretical discussions and motivate the use case a bit: I am dealing with a trading system, which is why storing anything other than 2 decimal places in that frame would be a hard error, always.

I have tried to assign a dtype, but without success:

from decimal import Decimal
df[col].astype(Decimal)

Pydantic comes to mind, but if I bake the df into a class (say class MyDfType), do I then need to write my own setter/getter functions in MyDfType for the contained DataFrame (MyDfType().df) to ensure that all values going to/from certain columns are manually enforced to be Decimal?

CodePudding user response:

pandas has no native Decimal dtype, so df['B'].astype(Decimal) does not give you a Decimal column. Decimal objects can only be stored in an object-dtype column, which you get by converting each value explicitly:

import pandas as pd
from decimal import Decimal

# create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})

# convert each value in column 'B' to Decimal (via str to avoid binary float artifacts)
df['B'] = df['B'].map(lambda x: Decimal(str(x)))

# an object column accepts any Python object, so this does NOT raise an error
df.loc[0, 'B'] = 'abc'

Because an object-dtype column performs no validation on assignment, this conversion alone does not enforce anything; values have to be checked explicitly before they are written.
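One way to get the "blow up on bad input" behaviour is to validate values explicitly before they reach the column. The following is only a minimal sketch under stated assumptions: check_money and set_money_column are hypothetical helper names (not pandas or decimal APIs), and "exactly 2 decimal places" is interpreted as a finite Decimal whose exponent is -2.

```python
from decimal import Decimal

import pandas as pd

TWO_PLACES = -2  # exponent of a Decimal with exactly two decimal places


def check_money(value):
    """Hypothetical validator: return value unchanged if it is a finite
    Decimal with exactly 2 decimal places, otherwise raise."""
    if not isinstance(value, Decimal):
        raise TypeError(f"expected Decimal, got {type(value).__name__}: {value!r}")
    if not value.is_finite():
        raise ValueError(f"non-finite Decimal not allowed: {value!r}")
    if value.as_tuple().exponent != TWO_PLACES:
        raise ValueError(f"expected exactly 2 decimal places: {value!r}")
    return value


def set_money_column(df, col, values):
    """Hypothetical helper: validate every value first, then write the
    column with object dtype so the Decimal objects are stored as-is."""
    checked = [check_money(v) for v in values]
    df[col] = pd.Series(checked, index=df.index, dtype=object)
    return df


df = pd.DataFrame({"A": [1, 2, 3]})
set_money_column(df, "price", [Decimal("4.00"), Decimal("5.25"), Decimal("6.10")])
```

Note that this only guards writes that go through set_money_column. Direct assignments such as df.loc[0, "price"] = 4.0 bypass it, so it is worth re-validating before the data is persisted, for example with df["price"].map(check_money).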

CodePudding user response:

# import pandas as pd
import pandas as pd

# import numpy as np
import numpy as np

# set the seed so the DataFrame can be re-created
np.random.seed(25)

# create a 5 x 4 DataFrame
df = pd.DataFrame(np.random.random([5, 4]), columns=["A", "B", "C", "D"])

# print the DataFrame
df
