Basically I have a dataframe:
import pandas as pd

# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Amount'])
and I want to write a function that will apply a percentage change to certain rows based on the values I give it:
def function(x, pct):
    if df['Name'] == x:
        df['Amount'] = df['Amount'] - (df['Amount'] * pct), df['Amount']
    else:
        df['Amount'] = df['Amount']
    return df
I know that I need to reference the data frame somewhere in the function but I'm struggling to figure out how to do it.
CodePudding user response:
Use apply with axis=1 so that the function receives each row as a Series, as shown below:
def function(s, x, pct):
    s = s.copy()  # work on a copy rather than mutating the row passed in by apply
    # reduce Amount by pct only for the row whose Name matches x
    if s['Name'] == x:
        s['Amount'] = s['Amount'] - (s['Amount'] * pct)
    return s
and then use it, for example with 'tom' and 0.1:
df.apply(lambda s: function(s, 'tom', 0.1), axis=1)
The output of this is:
   Name  Amount
0   tom     9.0
1  nick    15.0
2  juli    14.0
Note: you can do better than this by defining a data structure such as a dict and then using it in apply.
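One possible reading of that note, as a rough sketch only: keep the percentages in a dict keyed by name and look each row's name up inside apply. The names pct_by_name and apply_discount are made up for illustration, and the choice to leave unlisted names unchanged is an assumption.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Amount'])

# Hypothetical mapping of name -> fractional reduction (0.1 means 10% off)
pct_by_name = {'tom': 0.1, 'juli': 0.5}

def apply_discount(row, pcts):
    # Names missing from the dict get a reduction of 0.0, so they stay unchanged
    return row['Amount'] * (1 - pcts.get(row['Name'], 0.0))

# Work on a copy so the original df is left untouched
result = df.copy()
result['Amount'] = result.apply(lambda row: apply_discount(row, pct_by_name), axis=1)
print(result)
```

This way several names can be adjusted in a single pass instead of calling the function once per name.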
CodePudding user response:
There are several ways to accomplish this. Based on your own attempt:
def f(x, name, pct):
    if x['Name'] == name:
        return x['Amount']*(1-pct)
    return x['Amount']
df['Amount'] = df.apply(lambda x: f(x, 'tom', 0.25), axis=1)
df
Name Amount
0 tom 7.5
1 nick 15.0
2 juli 14.0
Or using np.where, like so:
import numpy as np
pct = 0.25
df['Amount'] = np.where(df['Name'] == 'tom', (1-pct)*df['Amount'], df['Amount'])
Yet another option:
df = pd.DataFrame(data, columns=['Name', 'Amount'])
df.loc[df['Name'] == 'tom', 'Amount'] = df.loc[df['Name'] == 'tom', 'Amount']*(1-pct)
All three approaches give you the same output.
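To check that claim, here is a small sketch that runs the three approaches on fresh copies of the same data and compares the resulting Amount columns. The names df1, df2, df3, mask and same are only for illustration. The cast to float in the third variant is an addition, made because recent pandas versions warn when a fractional value is written into an integer column through .loc.

```python
import numpy as np
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
pct = 0.25

# 1) row-wise apply
df1 = pd.DataFrame(data, columns=['Name', 'Amount'])
df1['Amount'] = df1.apply(
    lambda row: row['Amount'] * (1 - pct) if row['Name'] == 'tom' else row['Amount'],
    axis=1,
)

# 2) vectorised np.where
df2 = pd.DataFrame(data, columns=['Name', 'Amount'])
df2['Amount'] = np.where(df2['Name'] == 'tom', (1 - pct) * df2['Amount'], df2['Amount'])

# 3) boolean mask with .loc (cast to float first so 7.5 fits the column)
df3 = pd.DataFrame(data, columns=['Name', 'Amount'])
df3['Amount'] = df3['Amount'].astype(float)
mask = df3['Name'] == 'tom'
df3.loc[mask, 'Amount'] = df3.loc[mask, 'Amount'] * (1 - pct)

# True when all three Amount columns hold the same values
same = df1['Amount'].tolist() == df2['Amount'].tolist() == df3['Amount'].tolist()
print(same)
```

On larger frames the np.where and .loc variants are usually preferable, since apply with axis=1 calls a Python function once per row.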
CodePudding user response:
Use boolean indexing:
name = 'tom'
pct = 0.2
df.loc[df['Name'].eq(name), 'Amount'] *= (1-pct)
with a list:
names = ['tom']
pct = 0.2
df.loc[df['Name'].isin(names), 'Amount'] *= (1-pct)
output:
Name Amount
0 tom 8
1 nick 15
2 juli 14
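Since the question asked for a function, the two variants above could be wrapped into one helper, sketched here under a few assumptions: the name reduce_amount is hypothetical, a single string is treated as a one-element list, and the Amount column is cast to float so that fractional results fit.

```python
import pandas as pd

def reduce_amount(df, names, pct):
    """Return a copy of df with Amount reduced by pct for the given name(s)."""
    if isinstance(names, str):
        names = [names]  # treat a single name as a one-element list
    out = df.copy()
    # Cast to float so fractional results can be stored in the column
    out['Amount'] = out['Amount'].astype(float)
    out.loc[out['Name'].isin(names), 'Amount'] *= (1 - pct)
    return out

df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]], columns=['Name', 'Amount'])
one = reduce_amount(df, 'tom', 0.2)                # only tom is reduced
several = reduce_amount(df, ['tom', 'juli'], 0.5)  # tom and juli are halved
print(one)
print(several)
```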
CodePudding user response:
def function(dataframe, name, pct_change):
    dataframe = dataframe.copy()
    dataframe.loc[dataframe.Name==name, "Amount"]*=(1-pct_change)
    return dataframe
#function call example
function(df, "nick", .5)
#function call output
#
# Name Amount
#0 tom 10.0
#1 nick 7.5
#2 juli 14.0
Note that the function does not modify df in place; it only returns a modified copy of it. To replace the old dataframe with the new one:
df = function(df, "nick", .5)
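The copy behaviour can be seen in a short self-contained sketch. It repeats the function from this answer with one assumed addition: the Amount column is cast to float before the update, since recent pandas versions warn when 7.5 is written into an integer column. The variable names halved, original_amounts and new_amounts are illustrative only.

```python
import pandas as pd

def function(dataframe, name, pct_change):
    dataframe = dataframe.copy()  # leave the caller's frame untouched
    # Added assumption: cast to float so fractional results fit the column
    dataframe["Amount"] = dataframe["Amount"].astype(float)
    dataframe.loc[dataframe["Name"] == name, "Amount"] *= (1 - pct_change)
    return dataframe

df = pd.DataFrame([['tom', 10], ['nick', 15], ['juli', 14]], columns=['Name', 'Amount'])
halved = function(df, "nick", .5)

original_amounts = df['Amount'].tolist()  # the original frame is unchanged
new_amounts = halved['Amount'].tolist()   # only the returned copy carries the change
print(original_amounts, new_amounts)
```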