I have a Pandas dataframe with a column with groups of digits placed in ascending order:
column
1
2
3
4
5
6
0
1
2
3
4
1
2
3
4
5
0
0
I want to restructure those groups of sequences where the maximum > 4. The ascending order should remain the same but the maximum number must be 4. To do that I need to duplicate one or more middle digits. So I want the column to look like this:
column
1
2
2
2
3
4
0
1
2
3
4
1
2
2
3
4
0
0
Zeros should be left as they are.
I tried to, first, make the grouping variable:
k = 0
l = 1
i = 0
for k in range(1, len(df)):
if df.loc[k, 'column'] != 0:
if df.loc.loc[k, 'column'] > df.loc.loc[k-1, 'column']:
df.loc.loc[k, 'position'] = l
else:
l = l 1
df.loc.loc[k, 'position'] = l
else:
df.loc.loc[k, 'position'] = 0
l = l 1
l=pd.DataFrame(df.loc.groupby('position')['columns'].max()).reset_index()
And then I tried to come up with something like this but this does not work with different maximum values (not only 4):
z = 1
r = 0
for z in range (1, len(l)-1):
if l.loc[z,'column'] > 4:
for r in range(0, l.loc[z,'column'] - 3):
df.loc[df['column']==2 r, 'column'] = 2
df['column'] = np.where(df['column'] > 2, df['column'] - r, df['column'])
Please help!
CodePudding user response:
Use groupby
to group by each group of monotonic increases, and then do some math do normalize back to the 1-4 range:
MAX_NUMBER = 4
>>> df['column'].groupby((df['column'].diff() < 0).cumsum())\
.apply(lambda s: s//(max(s)/MAX_NUMBER))\
.fillna(0)
What I did here was just to absolute divide all numbers by max(s)/4
and the final result was quite similar to yours. But feel free to use any custom function you want (e.g. you can set 0, 1 manually to the first positions, and 3, 4 to the last, and just fill 2
in between; etc).
column
0 0.0
1 1.0
2 2.0
3 2.0
4 3.0
5 4.0
6 0.0
7 1.0
8 2.0
9 3.0
10 4.0
11 0.0
12 1.0
13 2.0
14 3.0
15 4.0
16 0.0
17 0.0