The goal is to put the digit from the last row of the previous letter group into the new column "last_digit_prev_group". The expected, correct values are in the column "col_ok", which I filled in manually. I tried shift(), but the result was far from what I expected. Maybe there is some other way? Please forgive any inconsistency in my post; I'm not an IT specialist and I don't know English well. Thanks in advance for your support.
import pandas as pd

df = pd.read_csv('C:/Users/.../a.csv', names=['group_letter', 'digit', 'col_ok'], index_col=0)
# Attempt: shift 'digit' down by one row within each letter group
df['last_digit_prev_group'] = df.groupby('group_letter')['digit'].shift(1)
print(df)
group_letter digit col_ok last_digit_prev_group
A 1 n NaN
A 3 n 1.0
A 2 n 3.0
A 5 n 2.0
A 1 n 5.0
B 1 1 NaN
B 2 1 1.0
B 1 1 2.0
B 1 1 1.0
B 3 1 1.0
C 5 3 NaN
C 6 3 5.0
C 1 3 6.0
C 2 3 1.0
C 3 3 2.0
D 4 3 NaN
D 3 3 4.0
D 2 3 3.0
D 5 3 2.0
D 7 3 5.0
CodePudding user response:
Use Series.mask with DataFrame.duplicated to keep only the last value of digit in each group, then Series.shift, and finally ffill:
df['last_digit_prev_group'] = (df['digit'].mask(df.duplicated('group_letter', keep='last'))
.shift()
.ffill())
print (df)
group_letter digit col_ok last_digit_prev_group
0 A 1 n NaN
1 A 3 n NaN
2 A 2 n NaN
3 A 5 n NaN
4 A 1 n NaN
5 B 1 1 1.0
6 B 2 1 1.0
7 B 1 1 1.0
8 B 1 1 1.0
9 B 3 1 1.0
10 C 5 3 3.0
11 C 6 3 3.0
12 C 1 3 3.0
13 C 2 3 3.0
14 C 3 3 3.0
15 D 4 3 3.0
16 D 3 3 3.0
17 D 2 3 3.0
18 D 5 3 3.0
19 D 7 3 3.0
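To make the chain above easier to follow, here is a minimal step-by-step sketch. It assumes the sample data is rebuilt by hand with a default integer index (instead of being read from the CSV) and that each letter forms one contiguous block of rows; the names `is_not_last` and `last_only` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumption: default RangeIndex).
df = pd.DataFrame({
    'group_letter': list('AAAAABBBBBCCCCCDDDDD'),
    'digit': [1, 3, 2, 5, 1,  1, 2, 1, 1, 3,  5, 6, 1, 2, 3,  4, 3, 2, 5, 7],
})

# True for every row except the last row of each letter group.
is_not_last = df.duplicated('group_letter', keep='last')

# Keep only the last digit of each group; all other rows become NaN.
last_only = df['digit'].mask(is_not_last)

# Move each kept digit down one row, onto the first row of the next group,
# then fill it forward over the remaining rows of that group.
df['last_digit_prev_group'] = last_only.shift().ffill()

print(df)
```

The first group has no previous group, so its rows stay NaN.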
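If the last value of a group may itself be NaN, forward-fill within each group instead, so that a value from an earlier group is not carried over: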
df['last_digit_prev_group'] = (df['digit'].mask(df.duplicated('group_letter', keep='last'))
.shift()
.groupby(df['group_letter']).ffill())
print(df)
group_letter digit col_ok last_digit_prev_group
0 A 1.0 n NaN
1 A 3.0 n NaN
2 A 2.0 n NaN
3 A 5.0 n NaN
4 A 1.0 n NaN
5 B 1.0 1 1.0
6 B 2.0 1 1.0
7 B 1.0 1 1.0
8 B 1.0 1 1.0
9 B 3.0 1 1.0
10 C 5.0 3 3.0
11 C 6.0 3 3.0
12 C 1.0 3 3.0
13 C 2.0 3 3.0
14 C NaN 3 3.0
15 D 4.0 3 NaN
16 D 3.0 3 NaN
17 D 2.0 3 NaN
18 D 5.0 3 NaN
19 D 7.0 3 NaN
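The difference between the two variants can be checked side by side. This is only a sketch on hand-built sample data (assumption: default integer index, last digit of group C set to NaN as in the output above); `shifted`, `plain` and `per_group` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data, but the last digit of group C is missing.
df = pd.DataFrame({
    'group_letter': list('AAAAABBBBBCCCCCDDDDD'),
    'digit': [1, 3, 2, 5, 1,  1, 2, 1, 1, 3,  5, 6, 1, 2, np.nan,  4, 3, 2, 5, 7],
})

# Last digit of each group, moved onto the first row of the next group.
shifted = df['digit'].mask(df.duplicated('group_letter', keep='last')).shift()

# Plain ffill: group D has no value of its own, so it inherits the value from group C's rows.
plain = shifted.ffill()

# Per-group ffill: filling stops at group boundaries, so group D stays NaN.
per_group = shifted.groupby(df['group_letter']).ffill()

print(pd.DataFrame({'group_letter': df['group_letter'], 'plain': plain, 'per_group': per_group}))
```

With the plain ffill, the rows of D show 3.0 (the last digit of B, carried through C), while the per-group version leaves them as NaN because the last digit of C is missing.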