Populating every nth row in a pandas dataframe

Time:11-09

I have a pandas dataframe, df defined as follows:

df = pd.DataFrame({'Year':[1,2,3,...],'A':[2000,4000,6000,...],'B':[200,400,600,...]})

where 'Year' runs from 1 to 40 here, but the last year can be any integer n. I want to calculate a new column as follows:

df['C'] = 0.06*(df.A + df.B)

However I wish to calculate column C only for years

years = [3,5,7,10]

i.e. I only want to perform the calculation when

Year in [3,5,7,10,13,15,17,20,...]

for the first 50 years (assuming the dataframe has that many rows; in my case, the last year is 40)

CodePudding user response:

I think you're looking for something like this:


# Required imports
import pandas as pd
import numpy as np

# Dummy data
df = pd.DataFrame(
    {
        'Year':[1, 2, 3],
        'A':[2000, 4000, 6000],
        'B':[200, 400, 600]
    }
)

# List of years for which you want to compute `0.06 * (df.A + df.B)`
years = [3, 5, 7, 10]


# When Year value exists inside the `years` list defined above, perform the calculation
# Otherwise, set it to None.
# NOTE: change the third parameter (None) to some default value you want to
#       use when the year value is not contained inside your list.
df['C'] = np.where(df['Year'].isin(years), 0.06 * (df.A + df.B), None)
#                  ^---------------------^ ^------------------^ ^---^
#                  |                       |                    |
#                  +--- Condition          |                    +--- What to set when condition is not met.
#                                          +---- What to set when condition is met
df
# Returns:
#   Year     A    B      C
# 0     1  2000  200   None
# 1     2  4000  400   None
# 2     3  6000  600  396.0
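
One thing worth knowing about the `None` default above: it appears to leave column C with `object` dtype instead of a numeric one, which can get in the way of later arithmetic. The sketch below reuses the same dummy data and only compares the two fill values; the names `mask`, `values`, `with_none` and `with_nan` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the answer above
df = pd.DataFrame(
    {
        "Year": [1, 2, 3],
        "A": [2000, 4000, 6000],
        "B": [200, 400, 600],
    }
)
years = [3, 5, 7, 10]

mask = df["Year"].isin(years)
values = 0.06 * (df["A"] + df["B"])

# Filling with None forces NumPy to fall back to a generic object array
with_none = np.where(mask, values, None)
# Filling with np.nan keeps the result as a float array
with_nan = np.where(mask, values, np.nan)

print(with_none.dtype)  # object
print(with_nan.dtype)   # float64
```

If you plan to sum or plot column C afterwards, `np.nan` is probably the more convenient default.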


CodePudding user response:

years = [3,5,7,10]
# filter rows matching the year, and calculate C

df.loc[df['Year'].isin(years), 'C'] = 0.06*(df.A + df.B)
df
   Year     A    B      C
0     1  2000  200    NaN
1     2  4000  400    NaN
2     3  6000  600  396.0
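
The `.loc` assignment works because pandas aligns the right-hand side on the index, so only the selected rows receive a value and the rest stay NaN. If you would rather not compute the expression for every row first, a variant like the following should behave the same way; `mask` is just an illustrative name, and the data is the three-row dummy frame from the earlier answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Year": [1, 2, 3],
        "A": [2000, 4000, 6000],
        "B": [200, 400, 600],
    }
)
years = [3, 5, 7, 10]

# Boolean mask of the rows whose Year is in the list
mask = df["Year"].isin(years)

# Compute the expression only on the selected rows and assign it to those rows;
# column C is created on the fly and the unselected rows are left as NaN
df.loc[mask, "C"] = 0.06 * (df.loc[mask, "A"] + df.loc[mask, "B"])

print(df)
```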

CodePudding user response:

You can try:

years = np.ravel([np.array([3,5,7,10]) + 10*i for i in range(4)])
df['C'] = np.where(df['Year'].isin(years), 0.06*(df.A + df.B), np.nan)
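
If you don't want to hard-code how many decades to generate, one alternative is to test the year modulo 10 instead of building the full list. This is only a sketch: it assumes the pattern [3, 5, 7, 10] repeats every 10 years as in the question, that columns A and B keep growing by 2000 and 200 per year (the question elides the actual values), and the names `base_years`, `period`, `remainders` and `mask` are hypothetical.

```python
import numpy as np
import pandas as pd

# Dummy data for 40 years; the A/B progression is an assumption
year = np.arange(1, 41)
df = pd.DataFrame({"Year": year, "A": 2000 * year, "B": 200 * year})

base_years = [3, 5, 7, 10]  # pattern within one decade
period = 10                 # the pattern repeats every 10 years

# Year 10, 20, ... has remainder 0, so reduce the base years the same way
remainders = [y % period for y in base_years]

# Select matching years, limited to the first 50 years as in the question
mask = (df["Year"] % period).isin(remainders) & (df["Year"] <= 50)

df["C"] = np.where(mask, 0.06 * (df["A"] + df["B"]), np.nan)

print(df.loc[mask, "Year"].tolist())
```

The mask works for any number of rows, so the same code should cover a dataframe with 40 years or one with more than 50.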