I have a dataframe that looks like this:
Step | Text | Parameter |
---|---|---|
15 | print | 1 |
16 | control | 2 |
17 | printout | 3 |
18 | print2 | 1 |
19 | Nan | 2 |
20 | Nan | 3 |
21 | Nan | 4 |
22 | Nan | 1 |
23 | Nan | 2 |
24 | Nan | 1 |
And I want my dataframe to look like this:
Step | Text | Parameter |
---|---|---|
15 | print | 1 |
15 | print | 2 |
15 | print | 3 |
16 | control | 1 |
16 | control | 2 |
16 | control | 3 |
16 | control | 4 |
17 | printout | 1 |
17 | printout | 2 |
18 | print2 | 1 |
So basically, whenever there is a "1" in the Parameter column, the rows from that point on should take the next values from Step and Text. Any ideas? :)
CodePudding user response:
You can use `Index.repeat` on a custom group:
import pandas as pd

# turn the literal 'Nan' strings into real missing values
df['Text'] = df['Text'].replace('Nan', pd.NA)
# number of rows per group; a new group starts wherever Parameter == 1
n = df.groupby(df['Parameter'].eq(1).cumsum()).size()
# repeat the index label of each non-NaN Text row once per row of its group
idx = df['Text'].dropna().index.repeat(n)
# overwrite Step and Text positionally
# (the underlying NumPy array carries no index, so nothing gets realigned)
df[['Step', 'Text']] = df.loc[idx, ['Step', 'Text']].to_numpy()
output:
   Step      Text  Parameter
0    15     print          1
1    15     print          2
2    15     print          3
3    16   control          1
4    16   control          2
5    16   control          3
6    16   control          4
7    17  printout          1
8    17  printout          2
9    18    print2          1
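
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same repeat-on-a-custom-group idea. It assumes the sample frame looks like the one in the question, with the first row being `print` / `1` as in the output above. The helper name `realign` and the length check are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed layout); the literal
# 'Nan' strings are kept on purpose to mirror the original data.
df = pd.DataFrame({
    "Step": range(15, 25),
    "Text": ["print", "control", "printout", "print2"] + ["Nan"] * 6,
    "Parameter": [1, 2, 3, 1, 2, 3, 4, 1, 2, 1],
})


def realign(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: give each Parameter run that starts at 1
    the next available (Step, Text) pair."""
    out = frame.copy()
    out["Text"] = out["Text"].replace("Nan", pd.NA)

    # A new group starts at every row where Parameter == 1.
    group_id = out["Parameter"].eq(1).cumsum()
    n = out.groupby(group_id).size()

    # Index labels of the rows that carry a real Text value.
    source = out["Text"].dropna().index
    if len(source) != len(n):
        raise ValueError("need exactly one non-missing Text row per group")

    # Repeat each source label once per row of its group, then copy
    # Step and Text over by position.
    idx = source.repeat(n.to_numpy())
    out[["Step", "Text"]] = out.loc[idx, ["Step", "Text"]].to_numpy()
    return out


result = realign(df)
print(result)
```

One thing to watch: because Step and Text pass through a single mixed NumPy array, Step may come back with `object` dtype. If that matters, casting it afterwards with `result["Step"].astype(int)` should restore it.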