Changing values in columns based on their previous marker

Time:05-13

I have the following dataframe:

import pandas as pd

df = {'id': [1, 2, 3, 4],
      '1': ['Green', 'Green', 'Green', 'Green'],
      '2': ['34', '67', 'Blue', '77'],
      '3': ['Blue', '45', '99', 'Blue'],
      '4': ['12', None, '101', '56'],
      '5': [None, None, None, '23']}

df = pd.DataFrame(df)


    id  1       2     3     4       5
0   1   Green   34    Blue  12      None
1   2   Green   67    45    None    None
2   3   Green   Blue  99    101     None
3   4   Green   77    Blue  56      23

I would like to add the prefix G_ to every value in the columns that follow Green, up to the next Blue. Values that follow Blue should get the prefix B_ instead.

Here is the output needed:

    id  1       2       3       4       5
0   1   Green   G_34    Blue    B_12    None
1   2   Green   G_67    G_45    None    None
2   3   Green   Blue    B_99    B_101   None
3   4   Green   G_77    Blue    B_56    B_23

CodePudding user response:

You can use:

m1 = df.isin(['Green', 'Blue'])
m2 = df.isnull()

(df[m1]
 .replace(r'(.).*', r'\1_', regex=True) # replace Blue/Green with B_/G_
 .ffill(axis=1)                         # fill to the right
 .add(df.astype(str))                   # combine with original data as string
 .mask(m1|m2)                           # mask Green/Blue and None
 .combine_first(df)                     # fill it with original data
)

NB. If the original data contains only strings (no actual numbers), the m2 mask is unnecessary: remove the m2 definition, and use .add(df) and .mask(m1).

output:

  id      1     2     3      4     5
0  1  Green  G_34  Blue   B_12  None
1  2  Green  G_67  G_45   None  None
2  3  Green  Blue  B_99  B_101  None
3  4  Green  G_77  Blue   B_56  B_23

CodePudding user response:

You can apply a function to each row: extract the Green/Blue markers, forward-fill them, and prefix every non-marker value with the first letter of the last marker found.

def add_prefix(row):
    row_ = row.str.extract('(Green|Blue)').ffill()[0]
    row = row.mask(row.ne(row_), row_.str[0] + '_' + row)
    return row

out = df.filter(regex=r'\d').apply(add_prefix, axis=1)
out.insert(0, 'id', df['id'])
print(out)

   id      1     2     3      4     5
0   1  Green  G_34  Blue   B_12   NaN
1   2  Green  G_67  G_45    NaN   NaN
2   3  Green  Blue  B_99  B_101   NaN
3   4  Green  G_77  Blue   B_56  B_23
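
Either answer can be cross-checked against a plain-Python statement of the rule. The following is only a minimal sketch that does not use pandas: the function name `prefix_row` and its `markers` parameter are hypothetical, and it assumes each row is a list of strings or None, with the id column left out.

```python
def prefix_row(values, markers=("Green", "Blue")):
    """Prefix each value with the initial of the last marker seen to its left."""
    prefix = None  # no marker seen yet
    out = []
    for value in values:
        if value in markers:
            prefix = value[0] + "_"  # 'Green' -> 'G_', 'Blue' -> 'B_'
            out.append(value)        # the markers themselves stay unchanged
        elif value is None or prefix is None:
            out.append(value)        # missing values, or values before any marker
        else:
            out.append(prefix + str(value))
    return out


# The four rows from the question, without the id column.
rows = [
    ["Green", "34", "Blue", "12", None],
    ["Green", "67", "45", None, None],
    ["Green", "Blue", "99", "101", None],
    ["Green", "77", "Blue", "56", "23"],
]

for row in rows:
    print(prefix_row(row))
```

This loop is likely slower than the vectorized answers on large frames, but it makes the intended behaviour explicit: markers and missing values are left alone, and every other value takes the prefix of the nearest marker to its left.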