I would like to write a function for this so I don't have to retype the same code for each column name. I have a data frame that looks like this (except with 1500 rows and a few extra columns):
Application Name | Device Type | DAU | MAU | Downloads |
---|---|---|---|---|
App 3 | iOS | 5000 | 50000 | 100000 |
App 3 | Android | 4700 | 90000 | 120000 |
App 7 | iOS | 12000 | 45000 | 77890 |
App 7 | Android | 17000 | 60000 | 66000 |
My goal is to turn the code below into a function. I need to multiply every row where the Device Type is iOS by one number, and every row where the Device Type is Android by a different number.
Then I'll need to do the same for MAU, Downloads and a few other columns. So I'd like to make this into a function to reduce repetitive code.
df["Guess"]=pd.NaT
df["Guess"]=df['Guess'].mask(df['Device Type']=='iOS', df['DAU']*0.333)
df["Guess"]=df['Guess'].mask(df['Device Type']=='Android', df['DAU']*0.312)
I also know I could use something like:
df.loc[df['Device Type'] == 'iOS']
But I'm not sure which is the best way to perform some action on a data frame based on a value in a row.
CodePudding user response:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Application Name': ['App 3', 'App 3', 'App 7', 'App 7'],
                   'Device Type': ['iOS', 'Android', 'iOS', 'Android'],
                   'DAU': [5000, 4700, 12000, 17000],
                   'MAU': [50000, 90000, 45000, 60000],
                   'Downloads': [100000, 120000, 77890, 66000]})
print(df)
  Application Name Device Type    DAU    MAU  Downloads
0            App 3         iOS   5000  50000     100000
1            App 3     Android   4700  90000     120000
2            App 7         iOS  12000  45000      77890
3            App 7     Android  17000  60000      66000
FUNCTION
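def multiplier(data, cols, ios_multi, android_multi):
    # Per-row factor: ios_multi for iOS rows, android_multi for every other row.
    # Reuse data's index so the multiplication aligns row by row.
    multiply_series = pd.Series(np.where(data['Device Type'] == 'iOS', ios_multi, android_multi), index=data.index)
    data[cols] = data[cols].mul(multiply_series, axis=0)
    return data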
TESTING
df = multiplier(df, ['DAU', 'MAU', 'Downloads'], 0.333, 0.312)
print(df)
  Application Name Device Type     DAU      MAU  Downloads
0            App 3         iOS  1665.0  16650.0   33300.00
1            App 3     Android  1466.4  28080.0   37440.00
2            App 7         iOS  3996.0  14985.0   25937.37
3            App 7     Android  5304.0  18720.0   20592.00
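Note that the multiplier function above overwrites the original columns and treats every non-iOS row as Android. If you would rather keep the raw numbers and add separate guess columns, as in the question, one possible variation is sketched below. The name scale_by_device, the " Guess" suffix and the dictionary of multipliers are assumptions made for illustration, not part of the answer above.

```python
import pandas as pd

def scale_by_device(data, cols, multipliers, device_col="Device Type", suffix=" Guess"):
    """Return a copy of `data` with one scaled column added per entry in `cols`."""
    # Look up each row's factor by device type.
    # Device types missing from `multipliers` map to NaN.
    factors = data[device_col].map(multipliers)
    out = data.copy()
    for col in cols:
        out[col + suffix] = out[col] * factors
    return out

df = pd.DataFrame({
    "Device Type": ["iOS", "Android", "iOS", "Android"],
    "DAU": [5000, 4700, 12000, 17000],
    "MAU": [50000, 90000, 45000, 60000],
})

result = scale_by_device(df, ["DAU", "MAU"], {"iOS": 0.333, "Android": 0.312})
print(result)
```

Because the factors come from a dictionary, adding a third device type only means adding another key, and the original DAU and MAU columns are left untouched.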
CodePudding user response:
Please see a simplistic example below:
import pandas as pd
import numpy as np
df = pd.read_clipboard()
df["Guess"] = 1 # example column for multiplication
# create the list of devices in question
# devices = ["iOS", "Android", "Linux", "Mechanic"]
# or
devices = df["Device Type"].unique().tolist()
def mult_by_val(col_name, device_type, target_col):
    # col_name: name of the column that holds the device type
    # device_type: the string to match in col_name
    # target_col: name of the column whose values are multiplied by "mult"
    android_mult = 0.312
    iOS_mult = 0.333
    # Android gets android_mult; any other device type falls back to iOS_mult
    if device_type == "Android":
        mult = android_mult
    else:
        mult = iOS_mult
    # scale only the rows that match device_type and leave the rest unchanged
    df[target_col] = np.where(df[col_name] == device_type, df[target_col]*mult, df[target_col])

for item in devices:
    # apply the multiplier for each device type in turn
    mult_by_val("Device Type", item, "Guess")
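The df.loc selection mentioned in the question can do the same job. Below is a rough sketch of that route, assuming a dictionary of multipliers per device type. The function name guess_with_loc and its parameters are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Device Type": ["iOS", "Android", "iOS", "Android"],
    "DAU": [5000, 4700, 12000, 17000],
})

def guess_with_loc(data, source_col, multipliers, device_col="Device Type", target_col="Guess"):
    """Fill `target_col` with `source_col` scaled by the factor for each device type."""
    out = data.copy()
    # Start with NaN so rows with an unlisted device type stay empty.
    out[target_col] = float("nan")
    for device, factor in multipliers.items():
        rows = out[device_col] == device  # boolean mask for this device type
        out.loc[rows, target_col] = out.loc[rows, source_col] * factor
    return out

guessed = guess_with_loc(df, "DAU", {"iOS": 0.333, "Android": 0.312})
print(guessed)
```

This loops once per device type rather than once per row, so it should stay reasonable for a frame with 1500 rows.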