I have the following dictionaries inside variables:
sk_channel_types = {"facebooknotification": 2,
"facebookmessenger": 9,
"onsitenotification": 3,
"pushnotification": 6,
"pushnotificationmessage": 6,
"lightbox": 4,
"onsitemessage": 7,
"mailmessage": 1}
sk_story_types = {"welcome": 7,
"rescue": 13,
"frequency": 4,
"abandoncart": 6,
"pricedrop": 16,
"manual": 5,
"searchbykeyword": 30,
"sazonality": 31,
"bestdayforpurchase": 28,
"pricechange": 32,
"availability": 33,
"toptrending": 1,
"toptrendingbycluster": 2,
"toptrendingwithpricelimit": 3,
"frequencyview": 4,
"manualnotification": 5,
"trending": 9,
"toptrendingbykeyword": 9}
And this is my current Spark DataFrame:
ID | StoryType | Type | StoryId |
---|---|---|---|
abcdefghijklmnopqrst | AbandonCart | MailMessage | 56465465456456456465 |
lçdkçlskdçlsdkçlskdç | ManualNotification | MailMessage | 60983099380938390833 |
uahuahuahauhauahuaha | ManualNotification | MailMessage | 49438093890484984949 |
sklçskçlskdkcnopeieo | ManualNotification | MailMessage | 93084098409840984098 |
2d5fe941380938098948 | ManualNotification | MailMessage | 49809380398094894844 |
9883jkjd3eu0dj0j3930 | ManualNotification | MailMessage | 636f50c9380938093893 |
I need to replace the StoryType and Type columns with their respective numbers, as per the variables, like this:
ID | StoryType | Type | StoryId |
---|---|---|---|
abcdefghijklmnopqrst | 6 | 1 | 56465465456456456465 |
lçdkçlskdçlsdkçlskdç | 5 | 1 | 60983099380938390833 |
uahuahuahauhauahuaha | 5 | 1 | 49438093890484984949 |
sklçskçlskdkcnopeieo | 5 | 1 | 93084098409840984098 |
2d5fe941380938098948 | 5 | 1 | 49809380398094894844 |
9883jkjd3eu0dj0j3930 | 5 | 1 | 636f50c9380938093893 |
How can I do this? Could I use a `case` expression together with `lower`? I'm new to PySpark.
Answer:
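Since the dictionaries are small, an efficient approach is to turn each one into a small lookup DataFrame, broadcast it, and join it to your DataFrame. The dictionary keys are all lowercase while the column values are mixed case (for example `AbandonCart`), so lowercase the column before joining.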
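A minimal sketch of that idea, assuming you already have a `SparkSession` named `spark` and the DataFrame from the question in `df`. The helper names `mapping_to_rows` and `encode_column` and the temporary columns `lookup_key` and `lookup_code` are made up for this example. It uses `DataFrame.hint("broadcast")` to request the broadcast; `pyspark.sql.functions.broadcast` should work equally well.

```python
def mapping_to_rows(mapping):
    """Turn a {name: code} dict into (lowercase_name, code) rows."""
    return [(name.lower(), code) for name, code in mapping.items()]


def encode_column(spark, df, column, mapping):
    """Replace the strings in `column` with their codes via a broadcast join.

    Values that have no entry in `mapping` become null (left join).
    """
    # Small lookup DataFrame built from the dict. "lookup_key" and
    # "lookup_code" are temporary names assumed not to clash with real columns.
    lookup = spark.createDataFrame(
        mapping_to_rows(mapping), ["lookup_key", "lookup_code"]
    )

    # Lowercase the column so "AbandonCart" matches the key "abandoncart".
    keyed = df.selectExpr("*", f"lower(`{column}`) AS lookup_key")

    # hint("broadcast") asks Spark to ship the small table to every executor.
    joined = keyed.join(lookup.hint("broadcast"), on="lookup_key", how="left")

    # Put the code where the original column was, keeping the column order.
    return joined.selectExpr(
        *[
            f"lookup_code AS `{c}`" if c == column else f"`{c}`"
            for c in df.columns
        ]
    )


# Usage, assuming `spark`, `df` and the two dicts from the question exist:
# df = encode_column(spark, df, "StoryType", sk_story_types)
# df = encode_column(spark, df, "Type", sk_channel_types)
```

Because this is a left join, any `StoryType` or `Type` value missing from the dictionaries ends up as null rather than dropping the row, which makes unmapped values easy to spot.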