I am trying to use pydantic classes to represent the records of a CSV file. Some fields in the CSV hold numbers, dates, or encoded lists, which are better handled as those types than as strings. So I assign the appropriate type to the corresponding pydantic field and rely on pydantic to cast the string to that type. Unfortunately, this fails for lists.
from typing import List
import csv
from pydantic import BaseModel

class Foo(BaseModel):
    a: int
    b: List[str]
    c: str

# Write class to CSV
x = Foo(a=1, b=["hello", "world"], c="foo")
with open("/tmp/test.csv", "w") as f:
    writer = csv.DictWriter(f, fieldnames=x.dict().keys())
    writer.writeheader()
    writer.writerow(x.dict())

# Try to load the class back from CSV
with open("/tmp/test.csv") as f:
    reader = csv.DictReader(f)
    y = Foo(**next(reader))
I expect y to be an instance with the same values as x, but instead it crashes with ListError. The code does succeed in writing /tmp/test.csv, and its contents are:
a,b,c
1,"['hello', 'world']",foo
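To see what pydantic actually receives, it may help to read that row back on its own. This is only a minimal sketch: it keeps the CSV text in memory with io.StringIO instead of using /tmp/test.csv, and the names csv_text and row are made up for the illustration.

```python
import csv
import io

# The CSV contents shown above, held in memory instead of /tmp/test.csv.
csv_text = """a,b,c
1,"['hello', 'world']",foo
"""

row = next(csv.DictReader(io.StringIO(csv_text)))

# Every value comes back as a str, including the cell that looks like a list.
print(row)
print(type(row["b"]))
```

So the field b is handed a single string rather than a list of strings, which would explain why validation of List[str] fails.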
How can I solve this problem?
CodePudding user response:
So, here is how I would do what you want to do:
import json
from typing import List, Optional
from pydantic import BaseModel, Field, StrictStr, validator

class Foo(BaseModel):
    bar: Optional[int] = None
    baz: List[StrictStr] = Field(default_factory=list)

    # Runs before type validation, so a JSON string can be decoded into a list first.
    @validator('baz', pre=True)
    def _maybe_json(cls, v):
        if isinstance(v, str):
            try:
                return json.loads(v)
            except json.JSONDecodeError as e:
                raise ValueError("not valid JSON") from e
        return v

    # Build a dict for csv.DictWriter, with the list encoded as a JSON string.
    def to_csv_row(self):
        row = self.dict()
        row["baz"] = json.dumps(row["baz"])
        return row
Note how StrictStr accepts a list of strings but rejects other item types:
In [4]: Foo(baz='["a"]')
Out[4]: Foo(bar=None, baz=['a'])
In [5]: Foo(baz='[1]')
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Input In [22], in <cell line: 1>()
----> 1 Foo(baz='[1]')
File ~/miniconda3/envs/maze-etl/lib/python3.9/site-packages/pydantic/main.py:331, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for Foo
baz -> 0
str type expected (type=type_error.str)
If you don't want that strictness, just use List[str] instead.
Then use it like this:
In [10]: foo = Foo(bar=1, baz=['a','b','c'])
In [11]: foo
Out[11]: Foo(bar=1, baz=['a', 'b', 'c'])
In [12]: foo.to_csv_row()
Out[12]: {'bar': 1, 'baz': '["a", "b", "c"]'}
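For completeness, here is a rough sketch of the full round trip through the csv module, using only the standard library so it does not depend on a particular pydantic version. The plain dict named record stands in for the model, and the json.loads call at the end plays the role of the pre-validator; both are assumptions made for the illustration, not part of the class above.

```python
import csv
import io
import json

record = {"bar": 1, "baz": ["a", "b", "c"]}

# Encode the list as a JSON string before writing, as to_csv_row() does.
encoded = dict(record, baz=json.dumps(record["baz"]))

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(encoded))
writer.writeheader()
writer.writerow(encoded)

# Read the row back; every cell is a plain string again.
buffer.seek(0)
raw = next(csv.DictReader(buffer))

# Decode the JSON cell, which is the step the pre-validator performs.
decoded = {"bar": int(raw["bar"]), "baz": json.loads(raw["baz"])}
print(decoded)
```

JSON is used here rather than the Python repr because it has a well-defined string form that json.loads can parse back.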
CodePudding user response:
The solution I found was to create a validator that checks the value being passed, and if it's a string, tries to eval it to a Python list.
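import ast
from typing import List
from pydantic import BaseModel, validator

class Foo(BaseModel):
    a: int
    b: List[str]
    c: str

    # Runs before type validation; parses a string such as "['hello', 'world']".
    @validator("b", pre=True)
    def eval_list(cls, val):
        if isinstance(val, list):
            return val
        return ast.literal_eval(val)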
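Because ast.literal_eval only parses Python literals, this does not let anyone execute arbitrary code through the CSV, but a very large or deeply nested value can still exhaust memory or the recursion limit, and any literal is accepted, not only a list. It is also possible to construct lists that cannot be reconstructed from their string representation. In my case, the CSVs are all created by me and the lists are simple lists of strings, so these limitations are not a problem.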
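A small sketch of how ast.literal_eval behaves on the kinds of input discussed here, using only the standard library. The variable names are made up for the illustration, and the nan case is just one example of a list whose string form is not a literal.

```python
import ast

# A list literal is parsed into a real list.
parsed = ast.literal_eval("['hello', 'world']")

# Anything that is not a plain literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    call_rejected = False
except ValueError:
    call_rejected = True

# str() of a list is not always a literal: nan has no literal form.
text = str([float("nan")])
try:
    ast.literal_eval(text)
    nan_round_trips = True
except ValueError:
    nan_round_trips = False

print(parsed, call_rejected, nan_round_trips)
```

A validator built on literal_eval may therefore want to catch ValueError and SyntaxError and check that the result really is a list.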