Modify dataframe with a given condition

col1	col2	col3
A1	data 1
Val B	data 2	data 6
Val B	data 3	data
A2	data 4	data
Val B	data 5	data 7

In the first column (col1), if a 'Val B' cell is found directly below a cell that starts with 'A', replace only the 'Val B' cell with the value of the cell above it (the one that starts with 'A'), keeping the other values in the 'Val B' row. Ignore any other 'Val B' rows that are not directly below a cell that starts with 'A'.

col1	col2	col3
A1	data 2	data 6
A2	data 5	data 7

The table above is the result I want, using Python.

CodePudding user response：

If you need the row directly after each row that matches the Series.str.startswith condition, with col1 replaced by the value from the original DataFrame, use:

df = df.shift(-1)[df['col1'].str.startswith('A')].assign(col1 = df['col1'])
print (df)
  col1    col2    col3
0   A1  data 2  data 6
3   A2  data 5  data 7
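
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the question's table is built by hand (the empty col3 cell in the first row is assumed to be None), and it adds na=False as a guard in case col1 contains missing values. The names is_a and result are only illustrative.

```python
import pandas as pd

# Sample data from the question; the empty col3 cell in row 0 is assumed to be None.
df = pd.DataFrame({
    'col1': ['A1', 'Val B', 'Val B', 'A2', 'Val B'],
    'col2': ['data 1', 'data 2', 'data 3', 'data 4', 'data 5'],
    'col3': [None, 'data 6', 'data', 'data', 'data 7'],
})

# True for rows whose col1 starts with 'A'; na=False treats missing values as non-matches.
is_a = df['col1'].str.startswith('A', na=False)

# Shift every row up by one, so each 'A' position holds the values of the row below it,
# keep only those positions, then put the original 'A' labels back into col1.
result = df.shift(-1)[is_a].assign(col1=df['col1'])
print(result)
```

Note that this keeps whatever row follows an 'A' row; it does not check that the following row is actually 'Val B'.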

Another idea is to shift only col1 and then filter by the condition with boolean indexing:

df['col1'] = df['col1'].shift()
df = df[df['col1'].str.startswith('A', na=False)]
print (df)
  col1    col2    col3
1   A1  data 2  data 6
4   A2  data 5  data 7
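
If two 'A' rows can be adjacent, filtering only on the shifted column would also keep the second 'A' row, relabeled with the first. A stricter variant is sketched below. It assumes a row should be kept only when it is itself 'Val B' and the row directly above starts with 'A'. The sample data with extra 'A3' and 'A4' rows and the names prev, mask and result are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: the question's data plus two 'A' rows with no 'Val B' below them.
df = pd.DataFrame({
    'col1': ['A1', 'Val B', 'Val B', 'A2', 'Val B', 'A3', 'A4'],
    'col2': ['data 1', 'data 2', 'data 3', 'data 4', 'data 5', 'data 6', 'data 9'],
    'col3': [None, 'data 6', 'data', 'data', 'data 7', 'data 8', 'data 9'],
})

# col1 of the previous row, aligned to the current row's index.
prev = df['col1'].shift()

# Keep a row only if it is 'Val B' AND the row directly above starts with 'A'.
mask = df['col1'].eq('Val B') & prev.str.startswith('A', na=False)

# Replace col1 with the 'A' label from the row above; the original df is left unchanged.
result = df[mask].assign(col1=prev[mask])
print(result)
```

Unlike the in-place version, this does not overwrite df['col1'], so the original DataFrame can still be used afterwards.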

CodePudding user response：

Example

import pandas as pd

data = [['A1', 'data 1', None],
        ['Val B', 'data 2', 'data 6'], 
        ['Val B', 'data 3', 'data'], 
        ['A2', 'data 4', 'data'], 
        ['Val B', 'data 5', 'data 7'], 
        ['A3', 'data 6', 'data 8'], 
        ['A4', 'data 9', 'data 9']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

df

    col1    col2    col3
0   A1      data 1  None
1   Val B   data 2  data 6
2   Val B   data 3  data
3   A2      data 4  data
4   Val B   data 5  data 7
5   A3      data 6  data 8
6   A4      data 9  data 9

Code

s = df['col1'].mask(df['col1'].eq('Val B')).ffill()
df.assign(col1=s).groupby('col1').head(2).groupby('col1').tail(1)

output:

  col1  col2    col3
1   A1  data 2  data 6
4   A2  data 5  data 7
5   A3  data 6  data 8
6   A4  data 9  data 9

I think there may be cases where no 'Val B' row exists below an 'A' row, so I made the example and code to cover that case.
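
To make the two steps easier to follow, here is a commented, self-contained sketch of the same mask/ffill plus groupby idea. It prints the intermediate labels before selecting rows. The variable name result is only illustrative.

```python
import pandas as pd

data = [['A1', 'data 1', None],
        ['Val B', 'data 2', 'data 6'],
        ['Val B', 'data 3', 'data'],
        ['A2', 'data 4', 'data'],
        ['Val B', 'data 5', 'data 7'],
        ['A3', 'data 6', 'data 8'],
        ['A4', 'data 9', 'data 9']]
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

# Blank out the 'Val B' labels, then forward-fill so every row carries
# the nearest 'A' label at or above it.
s = df['col1'].mask(df['col1'].eq('Val B')).ffill()
print(s.tolist())

# Per 'A' group: head(2) keeps the 'A' row plus the first row below it (if any);
# tail(1) then keeps the last of those, i.e. the first 'Val B' row,
# or the 'A' row itself when nothing follows it in the group.
result = df.assign(col1=s).groupby('col1').head(2).groupby('col1').tail(1)
print(result)
```

The grouping relies on each 'A' label being unique in col1; if a label repeated further down, its rows would fall into the same group.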