I have a few dataframes, all with the same format:
i |nameA | A | nameB | B | nameC | C | nameD | D
-------------------------------------------------------
0 | 1 | 45 | 9 | 123 | 123 | 23 | 1 | 502
1 | 5 | 435 | 15 | 894 | 523 | 1.1 | 5 | 502
2 | 10 | 112 | 20 | 542 | 234 | 52 | 7 | 502
3 | 7 | 432 | 2 | 900 | 43 | 653 | 4 | 502
4 | 8 | 854 | 6 | 234 | 853 | 50 | 3 | 502
5 | 9 | 231 | 80 | 435 | 95 | 56 | 9 | 502
I want to find the maximum & minimum n values across A, B, C, D, along with their indices so that I can find the 'names' which are 0~3digit integers.
from the above example dataframe, should I need max 4 values, I would like
(column, index, value, name):
(B, 3, 900, 2)
(B, 1, 894, 15)
(A, 4, 854, 8)
(C, 3, 653, 43)
and the min 4 should be:
(column, index, value, name):
(C, 1, 1.1, 523)
(C, 0, 23, 123)
(A, 0, 45, 1)
(C, 4, 50, 853)
not necessarily in this format.
the dataframe is not really sorted in any way, and names do not have any correlation with the values.
any help is much appreciated, thank you.
Edit: filled in the example dataframe and results.
CodePudding user response:
You can use idxmax
and idxmin
to get the index (col i
) of the max and the min for each column.
df = pd.DataFrame({'i':[0,1,2],'nameA':[1,5,10],'A':[45,435,112]})
df.set_index('i',inplace = True)
df.A.idxmax()
1
CodePudding user response:
As there is no provided example that is complete, here is one:
np.random.seed(0)
df = (pd.DataFrame(np.random.randint(0,100, size=(100, 8)),
columns=['nameA', 'A', 'nameB', 'B', 'nameC', 'C', 'nameD', 'D'])
.rename_axis('i')
.reset_index()
)
>>> df.head()
i nameA A nameB B nameC C nameD D
0 0 44 47 64 67 67 9 83 21
1 1 36 87 70 88 88 12 58 65
2 2 39 87 46 88 81 37 25 77
3 3 72 9 20 80 69 79 47 64
4 4 82 99 88 49 29 19 19 14
Now you can reshape and loc
the 4 nlargest
values per group:
(df.set_index('i')
.set_axis(pd.MultiIndex.from_frame(df.columns[1:].str.extract('(^name)?(. )').fillna('value'),
names=[None, 'columns']
), axis=1)
.stack(level='columns')
.loc[lambda d: d.groupby('columns', group_keys=False)['value'].nlargest(4).index]
.reset_index()
)
output:
i columns name value
0 4 A 82 99
1 96 A 12 97
2 69 A 87 96
3 13 A 64 95
4 49 B 51 99
5 14 B 3 98
6 56 B 20 97
7 89 B 18 97
8 9 C 14 99
9 11 C 20 99
10 61 C 98 97
11 78 C 12 96
12 96 D 26 99
13 15 D 2 98
14 68 D 74 98
15 73 D 79 98