Home > Back-end >  Reorder groups of values in the column with a new maximum value
Reorder groups of values in the column with a new maximum value

Time:07-21

I have a Pandas dataframe with a column with groups of digits placed in ascending order:

column
1
2
3
4
5
6
0
1
2
3
4
1
2
3
4
5
0
0

I want to restructure those groups of sequences where the maximum > 4. The ascending order should remain the same but the maximum number must be 4. To do that I need to duplicate one or more middle digits. So I want the column to look like this:

column
1
2
2
2
3
4
0
1
2
3
4
1
2
2
3
4
0
0

Zeros should be left as they are.

I tried to, first, make the grouping variable:

k = 0
l = 1
i = 0

for k in range(1, len(df)):
    if df.loc[k, 'column'] != 0:
        if  df.loc.loc[k, 'column'] > df.loc.loc[k-1, 'column']:
            df.loc.loc[k, 'position'] = l
        else:
            l = l   1
            df.loc.loc[k, 'position'] = l
    else:
         df.loc.loc[k, 'position'] = 0
         l = l   1

l=pd.DataFrame(df.loc.groupby('position')['columns'].max()).reset_index()

And then I tried to come up with something like this but this does not work with different maximum values (not only 4):

z = 1
r = 0

for z in range (1, len(l)-1):
  if l.loc[z,'column'] > 4:
     for r in range(0, l.loc[z,'column'] - 3):
                df.loc[df['column']==2 r, 'column'] = 2

    df['column'] = np.where(df['column'] > 2, df['column'] - r, df['column'])

Please help!

CodePudding user response:

Use groupby to group by each group of monotonic increases, and then do some math do normalize back to the 1-4 range:

MAX_NUMBER = 4
>>> df['column'].groupby((df['column'].diff() < 0).cumsum())\
               .apply(lambda s: s//(max(s)/MAX_NUMBER))\
               .fillna(0)

What I did here was just to absolute divide all numbers by max(s)/4 and the final result was quite similar to yours. But feel free to use any custom function you want (e.g. you can set 0, 1 manually to the first positions, and 3, 4 to the last, and just fill 2 in between; etc).

   column
0     0.0
1     1.0
2     2.0
3     2.0
4     3.0
5     4.0
6     0.0
7     1.0
8     2.0
9     3.0
10    4.0
11    0.0
12    1.0
13    2.0
14    3.0
15    4.0
16    0.0
17    0.0
  • Related