Averages of DataFrame columns in Python

Time:11-12

I can't comment on the original question because I don't have enough reputation, so I am referring here to the question "DataFrames - Average Columns", specifically this line of code:

dfgrp = df.iloc[:, 2:].groupby((np.arange(len(df.iloc[:, 2:].columns)) // 2) + 1, axis=1).mean().add_prefix('ColumnAVg')

As I read it, take all rows from column 2 onwards, group by the length of the same rows and columns something something something on columns, not rows, get the mean of those columns then add to new columns called ColumnAVg1/2/3 etc.

I also know this takes the mean of columns 1&2, 3&4, 5&6 etc., but I don't know how it does that.

And so my question is, what needs to change in the above code to get the mean of columns 1&2, 2&3, 3&4, 4&5 etc. with the results in the same format?

CodePudding user response:

Unfortunately you cannot tweak that code to get your result, because it works by assigning a single group number to each column, and a column can therefore belong to only one group. However, you can do something cheeky: build two groupings, take the average for each grouping, and combine them into a single frame.

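import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 4), columns=['a', 'b', 'c', 'd'])

# Note: groupby(..., axis=1) is deprecated in recent pandas releases;
# df.T.groupby(labels).mean().T is an equivalent that avoids it.
# d1 groups the columns as 0,0,1,1 (pairs a&b, c&d).
d1 = df.groupby(np.arange(len(df.columns)) // 2, axis=1).mean()
# d2 shifts the labels by one column: 0,1,1,2 (a alone, pair b&c, d alone).
d2 = df.groupby((np.arange(len(df.columns) + 1) // 2)[1:], axis=1).mean()

# Interleave the two results: even i comes from d1, odd i from d2.
dfo = pd.DataFrame()
for i in range(len(df.columns) - 1):
    c = f'average_{df.columns[i]}_{df.columns[i + 1]}'
    if i % 2 == 0:
        dfo[c] = d1[d1.columns[i // 2]]
    else:
        dfo[c] = d2[d2.columns[(i + 1) // 2]]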

The original code assigns columns 1,2,3,4 to the groups 1,1,2,2. In the code above, d1 groups the columns by the labels 0,0,1,1 and d2 by the labels 0,1,1,2, so d2 pairs up columns 2 and 3. The for loop then interleaves the two results into one frame.
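To make the two label arrays concrete, here is a minimal sketch on a small frame with fixed values (the values and the names labels_d1 and labels_d2 are made up for illustration). It groups on the transposed frame, which should give the same means as axis=1 without relying on that argument.

```python
import numpy as np
import pandas as pd

# Fixed illustrative data so the group means are easy to check by hand.
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                   [5.0, 6.0, 7.0, 8.0]], columns=['a', 'b', 'c', 'd'])
n = len(df.columns)

# One integer label per column; columns sharing a label are averaged together.
labels_d1 = np.arange(n) // 2            # 0, 0, 1, 1 -> pairs (a, b) and (c, d)
labels_d2 = (np.arange(n + 1) // 2)[1:]  # 0, 1, 1, 2 -> a alone, pair (b, c), d alone

# Transpose so the columns become rows, group the rows, then transpose back.
d1 = df.T.groupby(labels_d1).mean().T
d2 = df.T.groupby(labels_d2).mean().T

print(d1)
print(d2)
```

Because every column receives exactly one label, a single grouping can never put column b into both the (a, b) pair and the (b, c) pair, which is why two groupings are needed.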

CodePudding user response:

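df     = pd.DataFrame(np.random.randn(2, 4), columns=['a', 'b', 'c', 'd'])
# Each tuple lists the 1-based column positions to average together.
groups = [(1,2),(2,3),(2,3,4),(1,3)]
# Copy every referenced column once per group it appears in.
df2    = pd.DataFrame([df.iloc[:, i - 1] for z in groups for i in z]).T
# Label each copied column with the group it belongs to.
labels = [str(z) for z in groups for _ in z]
# Average the copies that share a label (axis=1 is deprecated in recent pandas).
result = df2.groupby(by=labels, axis=1).mean()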

Probably not exactly what you were looking for, but something like this should work for arbitrary groups of columns.
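For the adjacent pairs asked about in the question (1&2, 2&3, 3&4), the same idea of listing the groups explicitly can be sketched without groupby at all. This is only one possible variant, not the code from either answer: it selects each group's columns by position and takes the row-wise mean, and the ColumnAVg prefix is borrowed from the question's naming. The data values are invented for illustration.

```python
import pandas as pd

# Fixed illustrative data so the results can be checked by hand.
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                   [5.0, 6.0, 7.0, 8.0]], columns=['a', 'b', 'c', 'd'])

# 1-based column positions of every adjacent pair: (1, 2), (2, 3), (3, 4).
groups = [(i, i + 1) for i in range(1, len(df.columns))]

# For each group, pick its columns by position and average across them per row.
result = pd.DataFrame({
    f'ColumnAVg{k}': df.iloc[:, [i - 1 for i in g]].mean(axis=1)
    for k, g in enumerate(groups, start=1)
})

print(result)
```

Changing the groups list is enough to switch to other combinations, for example the non-overlapping pairs from the original question.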
