I have a dataset with a column named "dish_liked". Its values depend on the corresponding column "rest_type", which has values such as "Cafe", "Quick Bites", "Delivery", etc.
Output of the "dish_liked" column when the corresponding "rest_type" value is "Quick Bites":
Waffles 43
Nutella Pancakes 17
Donut, Coffee 14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel 13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches 12
Now I have NaN values in the "dish_liked" column, and I want to fill them based on the corresponding "rest_type" column.
My logic is to fetch the top 5 values (strings) from "dish_liked" for each "rest_type" and fill the NaNs randomly from them.
e.g.: new_df.loc[new_df['rest_type'].isin(['Dessert Parlor']), 'dish_liked'].value_counts()[0:5]
Waffles 43
Nutella Pancakes 17
Donut, Coffee 14
Apple Pie, Mascarpone Cheese, Nolen Gurer Ice Cream, Paan Ice Cream, Nolen Gur, Gur Ice Cream, Salted Caramel 13
Coffee, Berryblast, Nachos, Chocolate Waffles, Nutella Waffle, Chocolate Overload, Sandwiches 12
Now, if "dish_liked" is NaN and the corresponding "rest_type" value is "Dessert Parlor", "Cafe", etc., I want to fill those NaN values with the top 5 values (strings) above.
How can I do that? Sorry if this sounds confusing. Thanks in advance.
CodePudding user response:
You are better off using map to do the job.
df['dish_liked'] = df.rest_type.map({'Cafe': ['value1', 'value2', ...], 'Dessert Parlor': ['value1', 'value2', ...], ...})
To automate it, you will need to build that dictionary, for instance:
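import numpy as np

def build_di(k):
    # Top 5 most frequent dish_liked strings for restaurant type k
    top5 = df.loc[df.rest_type == k, 'dish_liked'].value_counts().index[:5].tolist()
    return {k: top5}

di = {}
for k in df.rest_type.dropna().unique():
    di.update(build_di(k))

# Fill each NaN with a random pick from the top 5 of its rest_type
mask = df['dish_liked'].isna()
df.loc[mask, 'dish_liked'] = df.loc[mask, 'rest_type'].map(lambda k: np.random.choice(di[k]) if di.get(k) else np.nan)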
I am not sure this will work out of the box, because I wrote it without a dataset to test against, just from the logic. Next time, please include a small dummy dataset as an example to make it easier for people to help you.
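For reference, here is a minimal self-contained sketch of the same idea on a made-up dummy dataset. The data values, the seed, and the helper name fill_with_top_dishes are assumptions for illustration, not part of the original question. It uses groupby/transform instead of an explicit dictionary, which is one possible alternative rather than the only way to do it.

```python
import numpy as np
import pandas as pd

# Dummy data, made up for illustration only
df = pd.DataFrame({
    "rest_type": ["Cafe", "Cafe", "Cafe", "Cafe",
                  "Dessert Parlor", "Dessert Parlor", "Dessert Parlor"],
    "dish_liked": ["Coffee", "Coffee", "Pasta", np.nan,
                   "Waffles", np.nan, "Waffles"],
})

rng = np.random.default_rng(0)  # seeded so the fill is reproducible

def fill_with_top_dishes(group, n=5):
    """Fill NaNs in one rest_type group with random picks from its top-n values."""
    top = group.value_counts().index[:n].to_numpy()
    missing = group.isna()
    if len(top) == 0 or not missing.any():
        return group  # nothing to sample from, or nothing to fill
    filled = group.copy()
    filled[missing] = rng.choice(top, size=int(missing.sum()))
    return filled

# Apply the helper separately within each rest_type
df["dish_liked"] = df.groupby("rest_type")["dish_liked"].transform(fill_with_top_dishes)
print(df)
```

Each NaN is replaced by a string drawn uniformly from the top values of its own rest_type; groups whose "dish_liked" values are all NaN are left untouched.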
CodePudding user response:
Thanks. But is there a simple way to randomly fill all the NaN values in the "dish_liked" column where "rest_type" == "Cafe" with these 5 strings?
Friendly Staff 21
Burgers, Coffee, Waffles, Mocktails, Pasta, Brownie Chocolate, Chicken Salami 21
Burgers, Pasta, Chocolate Mousse, Potato Wedges, Cup Cake, Cheesy Fries, Peri Peri Chicken 21
Burgers, Coffee, Cappuccino, Barbeque Burger, Sandwiches, Spinach Pasta, Sandwich 20
Bannoffee Pie, Pasta, Sandwiches, Salsa, Sandwich, Salads, Pita Bread 19
I'm quite a newbie
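For a single restaurant type, one possible sketch is a boolean mask combined with numpy.random.choice. The small DataFrame below is made up for illustration; the five strings in top5 are the ones listed above, and the names mask and top5 are just illustrative.

```python
import numpy as np
import pandas as pd

# Dummy data, made up for illustration only
df = pd.DataFrame({
    "rest_type": ["Cafe", "Cafe", "Quick Bites", "Cafe"],
    "dish_liked": [np.nan, "Friendly Staff", np.nan, np.nan],
})

# The five most frequent "dish_liked" strings for "Cafe"
top5 = [
    "Friendly Staff",
    "Burgers, Coffee, Waffles, Mocktails, Pasta, Brownie Chocolate, Chicken Salami",
    "Burgers, Pasta, Chocolate Mousse, Potato Wedges, Cup Cake, Cheesy Fries, Peri Peri Chicken",
    "Burgers, Coffee, Cappuccino, Barbeque Burger, Sandwiches, Spinach Pasta, Sandwich",
    "Bannoffee Pie, Pasta, Sandwiches, Salsa, Sandwich, Salads, Pita Bread",
]

# Rows that are "Cafe" and have a missing dish_liked
mask = (df["rest_type"] == "Cafe") & df["dish_liked"].isna()

# One independent random pick per missing row
df.loc[mask, "dish_liked"] = np.random.choice(top5, size=int(mask.sum()))
print(df)
```

Rows of other restaurant types are not touched, so the NaN in the "Quick Bites" row stays as it is.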