I need to save data from csv file to django models. The data comes from external api so I have no control on its structure. In my schema, I allowed all fields to be nullable.
This is my script
text = f"{path}/report.csv"
df = pd.read_csv(text)
row_iter = df.iterrows()
for index, row in row_iter:
rows = {key.replace("-", "_"): row.pop(key) for key in row.keys()}
# print(f"rows {rows}")
# default_values = {
# "amazon-order-id",merchant-order-id,purchase-date,last-updated-date,order-status,fulfillment-channel,sales-channel,order-channel,ship-service-level,product-name,sku,asin,item-status,quantity,currency,item-price,item-tax,shipping-price,shipping-tax,gift-wrap-price,gift-wrap-tax,item-promotion-discount,ship-promotion-discount,ship-city,ship-state,ship-postal-code,ship-country,promotion-ids,is-business-order,purchase-order-number,price-designation,is-iba,order-invoice-type
# }
sb, created = Order.objects.update_or_create(
sellingpartner_customer=c,
amazon_order_id=rows["amazon_order_id"],
sku=rows["sku"],
asin=rows["asin"],
defaults={**rows},
)
However, since some of the csv fields has empty values, pandas will replace it with NaN
value, this is where django returns an error
django.db.utils.OperationalError: (1054, "Unknown column 'NaN' in 'field list'")
I tried replace empty values as empty string("")
df.fillna("", inplace=True)
But django will return an error for fields that are not charfields
django.core.exceptions.ValidationError: ['“” value must be a decimal number.']
My question is, how do you handle empty values from csv file in pandas so for example if the field type is boolean, pandas will just replace empty value with boolean False for django boolean type fields, 0
for empty decimal types, and just blank for empty charfields, etc ?
CodePudding user response:
Nullable Django models won't take np.nan
or other Pandas-compatible not-a-number
objects. It expects taking None
as in stock Python. When you have nan
values, before you save them to Django, just replace them with None
to avoid the validation error.