I have JSON stored in S3. Sometimes units is stored as a string, sometimes as an integer. Unfortunately, this was a bug, and I now have billions of records with mismatched datatypes in the source JSON.
Example:
{
"other_stuff": "stuff",
"units": 2
}
{
"other_stuff": "stuff",
"units": "2"
}
I want to dynamically determine whether it's a string or an integer, and then load it into AWS Redshift as an integer.
If my mapping is ("units", "string", "units", "int"), only the string values are converted correctly. If I use ("units", "int", "units", "int"), it's the opposite: only the integer ones work.
How do I dynamically cast the source record so that it always loads into Redshift as an integer? You can assume that all values are numeric, not null, and that the attribute is always present.
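To confirm the mismatch on a sample of the data before changing the Glue job, a quick local check can list which types actually occur for units. This is a minimal sketch that assumes the records are stored one JSON object per line (JSON Lines); the helper name units_types is hypothetical. A Glue DynamicFrame reading data like this represents such a field as a choice type instead of failing, which is the situation the mapping above runs into.

```python
import json

# The two sample records from the question, one JSON object per line.
# Assumption: the S3 files use this JSON Lines layout.
sample = """{"other_stuff": "stuff", "units": 2}
{"other_stuff": "stuff", "units": "2"}"""


def units_types(jsonl_text):
    """Return the set of Python type names seen for the "units" field (hypothetical helper)."""
    types = set()
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            record = json.loads(line)
            types.add(type(record["units"]).__name__)
    return types


print(sorted(units_types(sample)))  # ['int', 'str']
```

Two type names in the result means no single source type in the mapping can match every record.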
Answer:
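You can use the resolveChoice method of a Glue DynamicFrame (also available as the ResolveChoice transform). When a field holds values of more than one type, the DynamicFrame records it as a choice type, and a cast:int spec casts every variant of that field to int: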
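# df is the DynamicFrame read from the S3 JSON, where "units" is a
# choice type (int or string) because the records disagree.
resolved_choices = df.resolveChoice(
    specs=[
        ('units', 'cast:int')  # cast both the int and the string variants to int
    ]
)
After this, units is an int in every record, so the mapping ("units", "int", "units", "int") applies to all of them.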
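To see what the cast:int spec amounts to for this data, here is a plain-Python illustration. It is only an emulation of the effect on the two sample records, not Glue's implementation, and cast_units_to_int is a hypothetical name. It relies on the question's guarantee that every value is numeric, non-null, and present, since int() accepts both 2 and "2".

```python
# Plain-Python emulation (NOT Glue's code) of what ('units', 'cast:int')
# achieves here: every "units" value ends up an int.
records = [
    {"other_stuff": "stuff", "units": 2},
    {"other_stuff": "stuff", "units": "2"},
]


def cast_units_to_int(rows):
    """Return copies of rows with "units" coerced to int (hypothetical helper).

    Assumes every value is numeric, non-null, and present, as the question states.
    """
    return [{**row, "units": int(row["units"])} for row in rows]


resolved = cast_units_to_int(records)
# After the cast, one int -> int mapping fits every row.
print({type(r["units"]).__name__ for r in resolved})  # {'int'}
```

The input rows are left untouched because each row is copied with {**row, ...} before units is replaced.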