How to dynamically cast pandas columns to string if they contain ObjectId values

Time:02-19

In my scenario, all of my MongoDB collections have an ObjectId column. I read the collections with pymongo and convert them into pandas DataFrames.

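When I try to write the DataFrames as Parquet, either with the AWS Data Wrangler library (awswrangler, imported as wr) in AWS Lambda or with PyArrow directly, the write fails with an error containing: "with type ObjectId: did not recognize Python value type when inferring an Arrow data type".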

Is there a way to convert a column to string dynamically, only when its values are ObjectIds?

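import awswrangler as wr
import pandas as pd
df1 = pd.DataFrame(list(collection.find(query)))  # assumed: collection and query are defined elsewhere
wr.s3.to_parquet(df1, path="s3://abcd/parquet.parquet")  # fails: _id holds ObjectId values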

Sample MongoDB data schema:

_id: ObjectId
id: string
createTimestamp: timestamp
updateTimestamp: timestamp
deleteTimestamp: timestamp

Desired Parquet schema:

_id: string
id: string
createTimestamp: timestamp
updateTimestamp: timestamp
deleteTimestamp: timestamp

Answer:

You can convert the _id column to string before saving it to Parquet:

wr.s3.to_parquet(
    df1.astype({"_id": str}),  # str(ObjectId) yields its 24-character hex string
    path="s3://abcd/parquet.parquet",
)
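The conversion above hardcodes _id. For the dynamic case the question asks about, one option is to scan the object-dtype columns and stringify only those that actually contain ObjectId values. The following is a minimal sketch: the helper name stringify_columns_of_type is hypothetical, and the type to match is passed in as a parameter so the sketch runs without pymongo installed. For MongoDB data you would pass bson's ObjectId, which ships with pymongo.

```python
import pandas as pd
from pandas.api.types import is_object_dtype


def stringify_columns_of_type(df: pd.DataFrame, target_type: type) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which every object-dtype
    column holding at least one target_type value has those values replaced
    by str(value).

    Other values (plain strings, None/NaN) are left as they are, so missing
    values stay missing instead of becoming the literal string "None", which
    is what df[col].astype(str) would produce.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        # ObjectId values can only live in object-dtype columns.
        if not is_object_dtype(series):
            continue
        is_target = series.map(lambda v: isinstance(v, target_type))
        if is_target.any():
            out[col] = series.map(
                lambda v: str(v) if isinstance(v, target_type) else v
            )
    return out
```

With pymongo installed, usage would look like df1 = stringify_columns_of_type(df1, ObjectId) after from bson import ObjectId, followed by the same wr.s3.to_parquet call. Note that ObjectIds nested inside sub-documents (cells holding dicts or lists) are not handled by this sketch.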