I have the following dataframe:
import pandas as pd

df = {'id': [1, 2, 3, 4],
      '1': ['Green', 'Green', 'Green', 'Green'],
      '2': ['34', '67', 'Blue', '77'],
      '3': ['Blue', '45', '99', 'Blue'],
      '4': ['12', None, '101', '56'],
      '5': [None, None, None, '23']}
df = pd.DataFrame(df)
   id      1     2     3     4     5
0   1  Green    34  Blue    12  None
1   2  Green    67    45  None  None
2   3  Green  Blue    99   101  None
3   4  Green    77  Blue    56    23
Within each row, I would like to add the prefix G_ to every value that comes after Green and before Blue. Values that come after Blue should get the prefix B_ instead. Green, Blue and missing values should stay unchanged.
Here is the output I need:
   id      1     2     3      4     5
0   1  Green  G_34  Blue   B_12  None
1   2  Green  G_67  G_45   None  None
2   3  Green  Blue  B_99  B_101  None
3   4  Green  G_77  Blue   B_56  B_23
CodePudding user response:
You can use:
m1 = df.isin(['Green', 'Blue'])
m2 = df.isnull()
(df[m1]
 .replace(r'(.).*', r'\1_', regex=True)  # replace Blue/Green with B_/G_
 .ffill(axis=1)                          # fill to the right
 .add(df.astype(str))                    # combine with original data as string
 .mask(m1|m2)                            # mask Green/Blue and None
 .combine_first(df)                      # fill it with original data
)
NB. If you really don't have numbers in the original data, you don't need the m2 mask: remove the m2 definition, and use .add(df) and .mask(m1).
output:
   id      1     2     3      4     5
0   1  Green  G_34  Blue   B_12  None
1   2  Green  G_67  G_45   None  None
2   3  Green  Blue  B_99  B_101  None
3   4  Green  G_77  Blue   B_56  B_23
CodePudding user response:
You can try apply on rows: extract the Green/Blue words, forward fill them along the row, and prefix every other value with the first letter of the last word found.
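def add_prefix(row):
    # last Green/Blue seen so far, carried forward along the row
    row_ = row.str.extract('(Green|Blue)').ffill()[0]
    # wherever the value is not the word itself, prepend its first letter and '_'
    row = row.mask(row.ne(row_), row_.str[0] + '_' + row)
    return row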
out = df.filter(regex=r'\d').apply(add_prefix, axis=1)
out.insert(0, 'id', df['id'])
print(out)
   id      1     2     3      4     5
0   1  Green  G_34  Blue   B_12   NaN
1   2  Green  G_67  G_45    NaN   NaN
2   3  Green  Blue  B_99  B_101   NaN
3   4  Green  G_77  Blue   B_56  B_23
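For comparison, the same "carry the last colour to the right" idea can be written as a plain loop over each row. This is only a minimal sketch, not taken from either answer: prefix_values is a hypothetical helper name, and it assumes the only marker words are Green and Blue and that missing values should be left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    '1': ['Green', 'Green', 'Green', 'Green'],
    '2': ['34', '67', 'Blue', '77'],
    '3': ['Blue', '45', '99', 'Blue'],
    '4': ['12', None, '101', '56'],
    '5': [None, None, None, '23'],
})

def prefix_values(values):
    """Hypothetical helper: prefix each value with the first letter
    of the most recent Green/Blue seen earlier in the same row."""
    prefix = None
    out = []
    for v in values:
        if pd.isna(v):                  # keep missing values as they are
            out.append(v)
        elif v in ('Green', 'Blue'):    # marker word: remember its prefix
            prefix = v[0] + '_'
            out.append(v)
        elif prefix is None:            # nothing seen yet: leave unchanged
            out.append(v)
        else:                           # ordinary value: add current prefix
            out.append(f'{prefix}{v}')
    return out

# Process every column except 'id', row by row
cols = [c for c in df.columns if c != 'id']
rows = [prefix_values(r) for r in df[cols].to_numpy(dtype=object).tolist()]
out = pd.DataFrame(rows, columns=cols, index=df.index)
out.insert(0, 'id', df['id'])
print(out)
```

A loop like this is likely slower than the vectorised answers on large frames, but it makes the rule explicit and is easy to extend if more marker words are added later.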