Home > Net >  Find the top three free apps based off chosen column in python
Find the top three free apps based off chosen column in python

Time:04-19

I want to write a function that finds the top three Free apps based on 'Reviews', 'Rating', 'Installs'.

It should return a data frame that has the category and app for the first two columns, and one of Rating, Installs, and Reviews as the third column

right now my code looks like this:

input:

def topthree(column):
    Googleapps_df["Reviews"] = pd.to_numeric(Googleapps_df["Reviews"])
    Googleapps_df["Installs"] = pd.to_numeric(Googleapps_df["Installs"])
    Googleapps_df["Rating"] = pd.to_numeric(Googleapps_df["Rating"])
    topthree = Googleapps_df.groupby('Type')[column].nlargest(3)
    return topthree

topthree('Reviews')

output:

Type      
Free  1879    44893888.0
      1670    44891723.0
      1704    44891723.0
Paid  4034      408292.0
      7417      348962.0
      8860      190086.0
Name: Reviews, dtype: float64

How do I add an app column in between the type and numbers, and how do I get rid of the 3 Paid number so it looks like this:

Type      App
    Free  "App_name"   1879    44893888.0
          "App_name"   1670    44891723.0
          "App_name"   1704    44891723.0
    Name: Reviews, dtype: float64

CodePudding user response:

You can use a filter to only show your results of where Type == 'Free' either after the .groupby('Type') or when you call the function.

Additionally, you can append your table with Googleapps_df['App'] since currently you are just showing the grouby tree that you have with Type and column.

CodePudding user response:

As you do not need the second type first filter that then find the max value of each of the three columns. Use:

import pandas as pd
df = pd.DataFrame({'name':['a','b','c','d','f','g','h','i'],'type':[1,2,2,1,1,1,2,2],'reviews':[500,4,350,102,35,40,4,350],'ratings':[8,45,822,23,74,100,4,350],'installs':[4,5,6,7,81,1,4,350]})
df1=df[df['type']==1]
n1 = df1[(df1['reviews']==df1['reviews'].max())]['name'].values[0]
n2 = df1[(df1['ratings']==df1['ratings'].max())]['name'].values[0]
n3 = df1[(df1['installs']==df1['installs'].max())]['name'].values[0]
out = pd.DataFrame({'name': [n1,n2,n3], 'values': vals})

Input:

enter image description here

Output:

enter image description here

CodePudding user response:

Change:

topthree = Googleapps_df.groupby('Type')[column].nlargest(3)

to GroupBy.head with sorting by both columns by DataFrame.sort_values:

(topthree = Googleapps_df.sort_values(['Type',column], ascending=True, False)
                         .groupby('Type')[column]
                         .head(3))

Or convert column App to index:

(topthree = Googleapps_df.set_index('App')
                         .groupby('Type')[column]
                         .nlargest(3))

If need Series from first solution set it after groupby.head:

topthree = (Googleapps_df.sort_values(['Type',column], ascending=True, False)
                         .groupby('Type')[column]
                         .head(3)
                         .set_index(['Type', 'App'])[column])
  • Related