Home > Software design >  Skipping over a number/range of values in a single row in a dataframe
Skipping over a number/range of values in a single row in a dataframe

Time:12-22

I have a long row that contains all numbers. The data are all associated with a week so Week 1 has a data point, Week 2 has a point and so on. I need to take the average of the data every 3 weeks, skip 2 weeks and then continue taking the average. This is a short sample of my original list:

import pandas as pd
import numpy as np

data = {'Name': ['Week 1',  'Week 2',   'Week 3',   'Week 4',   'Week 5',   
                 'Week 6',  'Week 7',   'Week 8',   'Week 9',   'Week 10',  'Week 11',  'Week 12'],
        'Value': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,2]
            }

df = pd.DataFrame(data)

avg_goodweeks = np.mean(np.concatenate((df.values[1:3], df.values[4:6], df.values[7:9],
                                        df.values[10:12])))

avg_badweeks = np.mean(np.concatenate((df.values[3], df.values[6], df.values[9],
                                        df.values[12])))

So the average of this would ideally be the values of week 1, 2, 3, 6, 7, 8, 11, 12, but I would like to be able to set up either a range or block of 2 weeks or something of that order to read through the row and repeat the 3 weeks on 2 weeks off pattern. I will take the average of all the skipped weeks as well, but I figured that would be the same code. I realize the .values[1:3] is not correct, that is old code from when I hard coded data from an excel sheet. The point of this code is to remove hard coded data from excel, have all data on python and automatically sort through using a user defined pattern/range to average values while skipping others. Any help or ideas will be very useful.

CodePudding user response:

IIUC, you can use iloc here and get the mean of Value column:

bad = df.iloc[3::3]['Value']
print(bad.mean())
1.6666666666666667

good = df.drop(bad.index)['Value']
print(good.mean())
1.4444444444444444

CodePudding user response:

avg_goodweeks = np.array([df.values[::3, 1],
                          df.values[1::3, 1]]).mean(axis=0)
avg_badweeks = df.values[2::3, 1].mean()
  • Related