I have the following dataframe (df):
col_1 col_2 col_3 col_4
sample_001 fjsah AB 11-110
sample_002 dfshb CD 20-210
sample_003 fsvhb EF N3-303
sample_004 dfbkk GH Q4-444
sample_005 gnddl IJ 55-005
I want to prepend the string in col_3 to the respective string in col_4 only if the string in col_4 starts with a number, such that the df is as follows:
col_1 col_2 col_3 col_4
sample_001 fjsah AB AB11-110
sample_002 dfshb CD CD20-210
sample_003 fsvhb EF N3-303
sample_004 dfbkk GH Q4-444
sample_005 gnddl IJ IJ55-005
I am able to identify which col_4 strings start with a number with:
for n in df['col_4']:
    if n[0].isdigit():
        print(n)
but I can't figure out how to make the "selective merge" happen inside the for loop.
CodePudding user response:
You can use Series.str[0].str.isdigit() to create a boolean Series indicating whether the first character in each row is a digit, then use that mask with .loc to modify only the matching values:
df.loc[df['col_4'].str[0].str.isdigit(), 'col_4'] = df['col_3'] + df['col_4']
# df
col_1 col_2 col_3 col_4
0 sample_001 fjsah AB AB11-110
1 sample_002 dfshb CD CD20-210
2 sample_003 fsvhb EF N3-303
3 sample_004 dfbkk GH Q4-444
4 sample_005 gnddl IJ IJ55-005
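For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same mask-and-.loc idea. The DataFrame is rebuilt by hand from the question's sample data, and the intermediate name mask is only for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col_1": ["sample_001", "sample_002", "sample_003", "sample_004", "sample_005"],
    "col_2": ["fjsah", "dfshb", "fsvhb", "dfbkk", "gnddl"],
    "col_3": ["AB", "CD", "EF", "GH", "IJ"],
    "col_4": ["11-110", "20-210", "N3-303", "Q4-444", "55-005"],
})

# True where the first character of col_4 is a digit.
mask = df["col_4"].str[0].str.isdigit()

# The right-hand side is computed for every row, but .loc aligns on the
# index and only writes to the rows where the mask is True.
df.loc[mask, "col_4"] = df["col_3"] + df["col_4"]

print(df)
```

Rows whose col_4 starts with a letter are left untouched because they are excluded by the mask, not because the concatenation skips them.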
CodePudding user response:
Another way, with apply and lambda:
df.loc[:, 'col_4'] = df.apply(lambda row: row['col_3'] + row['col_4'] if row['col_4'][0].isdigit() else row['col_4'], axis=1)
Output
col_1 col_2 col_3 col_4
0 sample_001 fjsah AB AB11-110
1 sample_002 dfshb CD CD20-210
2 sample_003 fsvhb EF N3-303
3 sample_004 dfbkk GH Q4-444
4 sample_005 gnddl IJ IJ55-005
CodePudding user response:
You can make a function encapsulating that logic and apply it by row.
def f(row):
    # Prepend col_3 only when the first character of col_4 parses as an integer.
    try:
        int(row.col_4[0])
        return f'{row.col_3}{row.col_4}'
    except ValueError:
        return row.col_4
df['new_col'] = df.apply(f, axis=1)
col_1 col_2 col_3 col_4 new_col
0 sample_001 fjsah AB 11-110 AB11-110
1 sample_002 dfshb CD 20-210 CD20-210
2 sample_003 fsvhb EF N3-303 N3-303
3 sample_004 dfbkk GH Q4-444 Q4-444
4 sample_005 gnddl IJ 55-005 IJ55-005
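One caveat: the approaches in this thread all look at the first character of col_4, so an empty string or a missing value in that column can raise an error or leave missing values in the mask. The question's data has neither, so the following is only a sketch under the assumption that such values might show up. The empty-string and None rows are made up for illustration, and Series.str.match with na=False is swapped in for the first-character check:

```python
import pandas as pd

# Hypothetical data: the last two rows (empty string, missing value)
# are NOT from the question; they are here to exercise the edge cases.
df = pd.DataFrame({
    "col_3": ["AB", "CD", "EF", "GH"],
    "col_4": ["11-110", "N3-303", "", None],
})

# str.match anchors at the start of the string, so r"\d" means
# "starts with a digit". na=False treats missing values as no match.
mask = df["col_4"].str.match(r"\d", na=False)

# Only rows where the mask is True are overwritten.
df.loc[mask, "col_4"] = df["col_3"] + df["col_4"]

print(df)
```

With this mask the empty and missing entries are simply left as they were.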