Cross dataframe with dictionary

Time:02-24

I have the following dictionaries inside variables:

sk_channel_types = {"facebooknotification": 2,
                    "facebookmessenger": 9,
                    "onsitenotification": 3,
                    "pushnotification": 6,
                    "pushnotificationmessage": 6,
                    "lightbox": 4,
                    "onsitemessage": 7,
                    "mailmessage": 1}

sk_story_types = {"welcome": 7,
                  "rescue": 13,
                  "frequency": 4,
                  "abandoncart": 6,
                  "pricedrop": 16,
                  "manual": 5,
                  "searchbykeyword": 30,
                  "sazonality": 31,
                  "bestdayforpurchase": 28,
                  "pricechange": 32,
                  "availability": 33,
                  "toptrending": 1,
                  "toptrendingbycluster": 2,
                  "toptrendingwithpricelimit": 3,
                  "frequencyview": 4,
                  "manualnotification": 5,
                  "trending": 9,
                  "toptrendingbykeyword": 9}

And this is my current spark dataframe:

ID StoryType Type StoryId
abcdefghijklmnopqrst AbandonCart MailMessage 56465465456456456465
lçdkçlskdçlsdkçlskdç ManualNotification MailMessage 60983099380938390833
uahuahuahauhauahuaha ManualNotification MailMessage 49438093890484984949
sklçskçlskdkcnopeieo ManualNotification MailMessage 93084098409840984098
2d5fe941380938098948 ManualNotification MailMessage 49809380398094894844
9883jkjd3eu0dj0j3930 ManualNotification MailMessage 636f50c9380938093893

I need to replace the StoryType and Type columns with their respective numbers, as per the variables, like this:

ID StoryType Type StoryId
abcdefghijklmnopqrst 6 1 56465465456456456465
lçdkçlskdçlsdkçlskdç 5 1 60983099380938390833
uahuahuahauhauahuaha 5 1 49438093890484984949
sklçskçlskdkcnopeieo 5 1 93084098409840984098
2d5fe941380938098948 5 1 49809380398094894844
9883jkjd3eu0dj0j3930 5 1 636f50c9380938093893

How can I do this? Can I use a case expression with lower()? I'm new to PySpark.

CodePudding user response:

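Since the dictionaries are small, an efficient approach is to convert each one into a small lookup DataFrame, mark it for broadcast, and join it to the main DataFrame. Because the dictionary keys are lowercase while the column values are mixed case (for example AbandonCart), the join should compare against the lowercased column value.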
