I have a pandas DataFrame, df, defined as follows:
df = pd.DataFrame({'Year':[1,2,3,...],'A':[2000,4000,6000,...],'B':[200,400,600,...]})
where 'Year' runs from 1 to 40, although the last year can be any integer n. I want to calculate a new column as follows:
df['C'] = 0.06*(df.A + df.B)
However, I only want to calculate column C for the years
years = [3,5,7,10]
and for the same years in each following decade, i.e. I only want to perform the calculation when
Year in [3,5,7,10,13,15,17,20,...]
for the first 50 years (assuming the dataframe has that many rows; in my case, the last year is 40).
CodePudding user response:
I think you're looking for something like this:
# Required imports
import pandas as pd
import numpy as np
# Dummy data
df = pd.DataFrame(
{
'Year':[1, 2, 3],
'A':[2000, 4000, 6000],
'B':[200, 400, 600]
}
)
# List of years for which you want to compute `0.06 * (df.A + df.B)`
years = [3, 5, 7, 10]
# When Year value exists inside the `years` list defined above, perform the calculation
# Otherwise, set it to None.
# NOTE: change the third parameter (None) to whatever default value you want to
# use when the year value is not contained in your list.
df['C'] = np.where(df['Year'].isin(years), 0.06 * (df.A + df.B), None)
#                  ^--------------------^  ^------------------^  ^--^
#                            |                      |             |
#                            --- Condition          |             --- What to set when condition is not met.
#                                                   ---- What to set when condition is met
df
# Returns:
#    Year     A    B      C
# 0     1  2000  200   None
# 1     2  4000  400   None
# 2     3  6000  600  396.0
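One caveat with None as the fallback: np.where then has to build an object array, so column C ends up with dtype object instead of float. The sketch below uses the same three-row dummy data and compares that with np.nan as the fallback, which keeps the result numeric. The names mask, c_none and c_nan are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'Year': [1, 2, 3],
        'A': [2000, 4000, 6000],
        'B': [200, 400, 600]
    }
)
years = [3, 5, 7, 10]

# Boolean mask: True where Year is one of the wanted years
mask = df['Year'].isin(years)

# Same calculation, two different fallback values
c_none = np.where(mask, 0.06 * (df.A + df.B), None)
c_nan = np.where(mask, 0.06 * (df.A + df.B), np.nan)

print(c_none.dtype)  # object
print(c_nan.dtype)   # float64
```

If you plan to sum, average or plot column C later, the float version is usually easier to work with.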
CodePudding user response:
years = [3,5,7,10]
# filter rows matching the year, and calculate C
df.loc[df['Year'].isin(years), 'C'] = 0.06*(df.A + df.B)
df
   Year     A    B      C
0     1  2000  200    NaN
1     2  4000  400    NaN
2     3  6000  600  396.0
CodePudding user response:
You can try:
# Repeat the base years for each decade: 3, 5, 7, 10, 13, 15, 17, 20, ..., 40
years = np.ravel([np.array([3,5,7,10]) + 10*i for i in range(4)])
df['C'] = np.where(df['Year'].isin(years), 0.06*(df.A + df.B), np.nan)
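If the pattern really does repeat every 10 years, another option is to fold each year back into the range 1-10 with modulo arithmetic, so the list of years never has to be expanded. This is only a sketch and is not taken from the answers above. The 40-row data (A = 2000 * Year, B = 200 * Year) is a hypothetical extension of the sample values in the question, and period and year_in_cycle are made-up names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 40 years, extending the sample values from the question
n = 40
year = np.arange(1, n + 1)
df = pd.DataFrame({'Year': year, 'A': 2000 * year, 'B': 200 * year})

years = [3, 5, 7, 10]
period = 10  # assumption: the pattern repeats every 10 years

# Map 1..10, 11..20, ... onto 1..10, so that year 13 -> 3 and year 20 -> 10
year_in_cycle = (df['Year'] - 1) % period + 1
mask = year_in_cycle.isin(years)

df['C'] = np.where(mask, 0.06 * (df.A + df.B), np.nan)

print(df.loc[mask, 'Year'].tolist())
# [3, 5, 7, 10, 13, 15, 17, 20, 23, 25, 27, 30, 33, 35, 37, 40]
```

This works for any number of rows, so nothing needs to change if the last year is 50 instead of 40.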