Create a function to multiply rows in dataframe based on df.mask or df.loc condition

Time:08-31

I would like to create a function for this so that I don't have to retype the same code for each column name. I have a data frame that looks like this (except with 1,500 rows and a few extra columns):

Application Name  Device Type    DAU    MAU  Downloads
App 3             iOS           5000  50000     100000
App 3             Android       4700  90000     120000
App 7             iOS          12000  45000      77890
App 7             Android      17000  60000      66000

My goal is to turn the code below into a function. I need to multiply a column's value by one number in every row where the Device Type is iOS, and by a different number in every row where the Device Type is Android.

Then I'll need to do the same for MAU, Downloads and a few other columns. So I'd like to make this into a function to reduce repetitive code.

df["Guess"] = float("nan")  # numeric placeholder; pd.NaT is meant for datetime columns
df["Guess"] = df["Guess"].mask(df["Device Type"] == "iOS", df["DAU"] * 0.333)
df["Guess"] = df["Guess"].mask(df["Device Type"] == "Android", df["DAU"] * 0.312)

I also know I could use something like:

df.loc[df['Device Type'] == 'iOS']

But I'm not sure which is the best way to perform an action on a data frame based on a value in a row.
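
For comparison, the df.loc variant could be sketched roughly as follows. This is only an illustration under assumptions: the small frame is a stand-in built from the sample rows above, and the names is_ios and is_android are made up for the example.

```python
import pandas as pd

# Small stand-in for the real data (values copied from the sample rows above).
df = pd.DataFrame({
    "Device Type": ["iOS", "Android", "iOS", "Android"],
    "DAU": [5000, 4700, 12000, 17000],
})

# Boolean masks select the rows for each device type.
is_ios = df["Device Type"] == "iOS"
is_android = df["Device Type"] == "Android"

# df.loc[rows, column] assigns only to the selected rows;
# rows matching neither mask keep the NaN placeholder.
df["Guess"] = float("nan")
df.loc[is_ios, "Guess"] = df.loc[is_ios, "DAU"] * 0.333
df.loc[is_android, "Guess"] = df.loc[is_android, "DAU"] * 0.312
print(df)
```

Both forms rely on the same boolean condition; mask returns a new Series, while df.loc assigns into the existing column.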

CodePudding user response:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Application Name': ['App 3', 'App 3', 'App 7', 'App 7'],
                   'Device Type': ['iOS', 'Android', 'iOS', 'Android'],
                   'DAU': [5000, 4700, 12000, 17000],
                   'MAU': [50000, 90000, 45000, 60000],
                   'Downloads': [100000, 120000, 77890, 66000]})
print(df)

  Application Name Device Type    DAU    MAU  Downloads
0            App 3         iOS   5000  50000     100000
1            App 3     Android   4700  90000     120000
2            App 7         iOS  12000  45000      77890
3            App 7     Android  17000  60000      66000

FUNCTION

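def multiplier(data, cols, ios_multi, android_multi):
    # One multiplier per row: ios_multi for iOS rows, android_multi for all other rows.
    # Reuse data's index so that mul() aligns correctly even when the index is not 0..n-1.
    multiply_series = pd.Series(np.where(data['Device Type'] == 'iOS', ios_multi, android_multi),
                                index=data.index)
    # Note: this overwrites the columns listed in cols.
    data[cols] = data[cols].mul(multiply_series, axis=0)
    return data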

TESTING

df = multiplier(df, ['DAU', 'MAU', 'Downloads'], 0.333, 0.312)
print(df)

  Application Name Device Type     DAU      MAU  Downloads
0            App 3         iOS  1665.0  16650.0   33300.00
1            App 3     Android  1466.4  28080.0   37440.00
2            App 7         iOS  3996.0  14985.0   25937.37
3            App 7     Android  5304.0  18720.0   20592.00
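
If the original columns should be kept and the results written to new columns (as with "Guess" in the question), one possible variation looks up the multiplier with Series.map. This is a sketch, not a drop-in replacement: the function name add_scaled_columns, the key and suffix parameters, and the " Guess" column naming are all assumptions made for the example.

```python
import pandas as pd

def add_scaled_columns(data, cols, multipliers, key="Device Type", suffix=" Guess"):
    """Return a copy of `data` with one new scaled column per entry in `cols`."""
    out = data.copy()
    # Look up each row's multiplier; device types missing from the dict become NaN.
    factor = out[key].map(multipliers)
    for col in cols:
        out[col + suffix] = out[col] * factor
    return out

# Hypothetical sample built from the rows shown in the question.
sample = pd.DataFrame({
    "Device Type": ["iOS", "Android", "iOS", "Android"],
    "DAU": [5000, 4700, 12000, 17000],
    "MAU": [50000, 90000, 45000, 60000],
})
result = add_scaled_columns(sample, ["DAU", "MAU"], {"iOS": 0.333, "Android": 0.312})
print(result)
```

A dict of multipliers extends to more device types without adding parameters, and working on a copy leaves the input frame untouched.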

CodePudding user response:

Please see a simple example below:

import pandas as pd
import numpy as np

df = pd.read_clipboard()
df["Guess"] = 1 # example column for multiplication

# create the list of devices in question
# devices = ["iOS", "Android", "Linux", "Mechanic"]
# or
devices = df["Device Type"].unique().tolist()


def mult_by_val(col_name, device_type, target_col):
    # col_name: name of the column that holds the device type
    # device_type: string to match in col_name
    # target_col: name of the column whose values get multiplied by "mult"

    android_mult = 0.312
    iOS_mult = 0.333
    # Any device type other than "Android" falls back to the iOS multiplier.
    if device_type == "Android":
        mult = android_mult
    else:
        mult = iOS_mult
    # Multiply only the matching rows of the global df; other rows are left unchanged.
    df[target_col] = np.where(df[col_name] == device_type, df[target_col]*mult, df[target_col])



for item in devices:
    mult_by_val("Device Type", item, "Guess")
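
The per-device loop could also be collapsed into a single vectorized step with np.select. The following is a minimal sketch under assumptions: the frame, the "Linux" row, and the variable names are invented for illustration, and unmatched device types are deliberately left unchanged via default=1.0.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one device type that has no multiplier.
frame = pd.DataFrame({
    "Device Type": ["iOS", "Android", "Linux"],
    "Guess": [1.0, 1.0, 1.0],
})

conditions = [frame["Device Type"] == "Android", frame["Device Type"] == "iOS"]
factors = [0.312, 0.333]

# np.select picks the factor of the first matching condition for each row;
# default=1.0 leaves rows with any other device type unchanged.
frame["Guess"] = frame["Guess"] * np.select(conditions, factors, default=1.0)
print(frame)
```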