Loop and fill values randomly (string) in pandas

I have a dataset with a column named "dish_liked" whose values depend on the corresponding column "rest_type", which has values like "Cafe", "Quick Bites", "Delivery", etc.

Output of the "dish_liked" column when the corresponding "rest_type" value is "Quick Bites":

Waffles                                                                                                                  43
Nutella Pancakes                                                                                                         17
Donut, Coffee                                                                                                            14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel            13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches                            12

Now I have NaN values in the "dish_liked" column and I want to fill them based on the corresponding "rest_type" column.

My logic is to fetch the top 5 values (strings) from "dish_liked" for each "rest_type" and fill the NaN values with them at random.

e.g.: new_df.loc[new_df['rest_type'].isin(['Dessert Parlor']), 'dish_liked'].value_counts()[0:5]

Waffles                                                                                                                  43
Nutella Pancakes                                                                                                         17
Donut, Coffee                                                                                                            14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel            13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches                            12

Now, if the "dish_liked" column has a NaN value and its corresponding "rest_type" value is "Dessert Parlor", "Cafe", etc., I want to fill those NaN values with the top 5 values (strings) shown above.

How can I do that? Sorry if it sounds confusing. Thanks in advance.

CodePudding user response：

You are better off using map to do the job.

df['dish_liked'] = df.rest_type.map({'Cafe':['value1','value2',...], 'Dessert Parlor':['value1','value2',...],...})

To automate it, you will need to build the dictionary, for instance:

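import numpy as np

def build_di(k):
    # Top 5 most frequent dish_liked strings for restaurant type k, as a list
    top5 = df.loc[df.rest_type == k, 'dish_liked'].value_counts().index[:5]
    return {k: list(top5)}

di = {}
for x in df.rest_type.dropna().unique():
    di.update(build_di(x))

# Only fill rows where dish_liked is missing and the rest_type has candidates
mask = df['dish_liked'].isna() & df.rest_type.map(lambda k: len(di.get(k, [])) > 0)
# Pick one of the top 5 strings at random for each such row
df.loc[mask, 'dish_liked'] = df.loc[mask, 'rest_type'].map(lambda k: np.random.choice(di[k]))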

I am not sure it will work out of the box, because I wrote it without a dataset to test against, purely from the logic. The logic is there, though. Next time, please build a dummy dataset as an example to make it easier for people to help you.
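The idea in this answer (take the top 5 strings per rest_type, then pick one at random for each NaN) can be tried on a small dummy dataset. The following is only a sketch under stated assumptions: the data, the helper name fill_from_top, and the seeded generator rng are all made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data; the real dataset has many more rows and rest_type values.
df = pd.DataFrame({
    "rest_type": ["Cafe", "Cafe", "Cafe", "Cafe",
                  "Dessert Parlor", "Dessert Parlor", "Dessert Parlor"],
    "dish_liked": ["Burgers, Coffee", "Burgers, Coffee", "Pasta", np.nan,
                   "Waffles", np.nan, "Waffles"],
})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable


def fill_from_top(group, n=5):
    """Fill NaN in one rest_type's dish_liked with random picks from its top-n values."""
    # value_counts() ignores NaN and sorts by frequency, so the index holds the strings.
    top = group.value_counts().index[:n].to_numpy()
    missing = group.isna()
    if len(top) == 0 or not missing.any():
        return group  # nothing to sample from, or nothing to fill
    filled = group.copy()
    filled[missing] = rng.choice(top, size=int(missing.sum()))
    return filled


# transform() calls the helper once per rest_type and keeps the original row order.
df["dish_liked"] = df.groupby("rest_type")["dish_liked"].transform(fill_from_top)
print(df)
```

Rows whose rest_type is itself missing belong to no group, so they would need separate handling. The picks are uniform over the top 5; passing a p= argument to rng.choice would be one way to weight them by frequency instead.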

CodePudding user response：

Thanks. But is there a simple way to randomly fill all the NaN values in the "dish_liked" column where "rest_type" == "Cafe" with these 5 strings?

Friendly Staff                                                                                        21
Burgers, Coffee, Waffles, Mocktails, Pasta, Brownie Chocolate, Chicken Salami                         21
Burgers, Pasta, Chocolate Mousse, Potato Wedges, Cup Cake, Cheesy Fries, Peri Peri Chicken            21
Burgers, Coffee, Cappuccino, Barbeque Burger, Sandwiches, Spinach Pasta, Sandwich                     20
Bannoffee Pie, Pasta, Sandwiches, Salsa, Sandwich, Salads, Pita Bread                                 19

I'm quite a newbie
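
For a single rest_type such as "Cafe", one fairly simple approach is a boolean mask plus numpy's random.choice. This is a minimal sketch, not a tested solution for the real dataset: the small DataFrame is hypothetical, and the list cafe_top5 simply copies the five strings shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data standing in for the real dataset.
df = pd.DataFrame({
    "rest_type": ["Cafe", "Cafe", "Quick Bites", "Cafe"],
    "dish_liked": [np.nan, "Friendly Staff", np.nan, np.nan],
})

# The five Cafe strings from the value_counts() output in the question.
cafe_top5 = [
    "Friendly Staff",
    "Burgers, Coffee, Waffles, Mocktails, Pasta, Brownie Chocolate, Chicken Salami",
    "Burgers, Pasta, Chocolate Mousse, Potato Wedges, Cup Cake, Cheesy Fries, Peri Peri Chicken",
    "Burgers, Coffee, Cappuccino, Barbeque Burger, Sandwiches, Spinach Pasta, Sandwich",
    "Bannoffee Pie, Pasta, Sandwiches, Salsa, Sandwich, Salads, Pita Bread",
]

# Rows that are Cafe and have no dish_liked value.
mask = (df["rest_type"] == "Cafe") & df["dish_liked"].isna()

# Draw one string per masked row (with replacement) and assign them in one step.
df.loc[mask, "dish_liked"] = np.random.choice(cafe_top5, size=mask.sum())
print(df)
```

Rows of other types (here the "Quick Bites" row) are left untouched, and existing non-NaN values are never overwritten because the mask requires isna().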