How to prepend a string that starts with a number with a string from the same row in a dataframe?

I have the following dataframe (df):

col_1 col_2 col_3 col_4
sample_001 fjsah AB 11-110
sample_002 dfshb CD 20-210
sample_003 fsvhb EF N3-303
sample_004 dfbkk GH Q4-444
sample_005 gnddl IJ 55-005

I want to prepend the string in col_3 to the respective string in col_4 only if the string in col_4 starts with a number, such that the df is as follows:

col_1 col_2 col_3 col_4
sample_001 fjsah AB AB11-110
sample_002 dfshb CD CD20-210
sample_003 fsvhb EF N3-303
sample_004 dfbkk GH Q4-444
sample_005 gnddl IJ IJ55-005

I am able to identify which col_4 strings start with a number with:

for n in df['col_4']:
    if n[0].isdigit():
        print(n)

but I can't figure out how to make the "selective merge" happen in the for loop.

CodePudding user response:

You can use Series.str[0].str.isdigit() to create a boolean Series indicating whether the first character of each value is a digit, and then use that mask with .loc to modify only the matching rows:

df.loc[df['col_4'].str[0].str.isdigit(), 'col_4'] = df['col_3'] + df['col_4']

# df
        col_1  col_2 col_3     col_4
0  sample_001  fjsah    AB  AB11-110
1  sample_002  dfshb    CD  CD20-210
2  sample_003  fsvhb    EF    N3-303
3  sample_004  dfbkk    GH    Q4-444
4  sample_005  gnddl    IJ  IJ55-005
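
For anyone who wants to try the mask approach in a fresh session, here is a minimal self-contained sketch. It rebuilds the sample frame from the question; the variable name starts_with_digit is only an illustrative choice, and the code assumes col_4 holds non-empty strings.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col_1": ["sample_001", "sample_002", "sample_003", "sample_004", "sample_005"],
    "col_2": ["fjsah", "dfshb", "fsvhb", "dfbkk", "gnddl"],
    "col_3": ["AB", "CD", "EF", "GH", "IJ"],
    "col_4": ["11-110", "20-210", "N3-303", "Q4-444", "55-005"],
})

# Boolean mask: True where the first character of col_4 is a digit
starts_with_digit = df["col_4"].str[0].str.isdigit()

# Assign the concatenation only to the masked rows; the right-hand side
# is aligned on the index, so unmasked rows keep their original value
df.loc[starts_with_digit, "col_4"] = df["col_3"] + df["col_4"]

print(df)
```

Naming the mask separately makes it easy to inspect which rows will be changed before the assignment runs.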

CodePudding user response:

Another way, with apply and a lambda:

df.loc[:, 'col_4'] = df.apply(lambda row: row['col_3'] + row['col_4'] if row['col_4'][0].isdigit() else row['col_4'], axis=1)

Output

        col_1  col_2 col_3     col_4
0  sample_001  fjsah    AB  AB11-110
1  sample_002  dfshb    CD  CD20-210
2  sample_003  fsvhb    EF    N3-303
3  sample_004  dfbkk    GH    Q4-444
4  sample_005  gnddl    IJ  IJ55-005
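
Since apply with axis=1 calls the lambda once per row, it may be worth comparing it with a column-wise version. The sketch below is one possible alternative, not part of the answer above: it uses Series.str.match with the pattern \d (which is anchored at the start of the string) and Series.where to pick between the combined and the original value. The names combined and starts_with_digit are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col_1": ["sample_001", "sample_002", "sample_003", "sample_004", "sample_005"],
    "col_2": ["fjsah", "dfshb", "fsvhb", "dfbkk", "gnddl"],
    "col_3": ["AB", "CD", "EF", "GH", "IJ"],
    "col_4": ["11-110", "20-210", "N3-303", "Q4-444", "55-005"],
})

# Concatenate for every row first, then decide which rows keep the result
combined = df["col_3"] + df["col_4"]

# str.match anchors at the start of the string; na=False treats
# missing values as "does not start with a digit"
starts_with_digit = df["col_4"].str.match(r"\d", na=False)

# where() keeps `combined` where the mask is True and falls back to
# the original col_4 value elsewhere
df["col_4"] = combined.where(starts_with_digit, df["col_4"])

print(df)
```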

CodePudding user response:

You can make a function encapsulating that logic and apply it by row.

def f(row):
    try:
        # int() raises ValueError when the first character is not a digit
        int(row.col_4[0])
        return f'{row.col_3}{row.col_4}'
    except ValueError:
        return row.col_4

df['new_col'] = df.apply(f, axis=1)

        col_1  col_2 col_3   col_4   new_col
0  sample_001  fjsah    AB  11-110  AB11-110
1  sample_002  dfshb    CD  20-210  CD20-210
2  sample_003  fsvhb    EF  N3-303    N3-303
3  sample_004  dfbkk    GH  Q4-444    Q4-444
4  sample_005  gnddl    IJ  55-005  IJ55-005
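
A possible variation on the same idea, sketched under the assumption that col_4 might also contain empty strings: the helper below works on two plain strings instead of a row, so it can be tested on its own, and it is applied with zip over the two columns. The name prepend_if_digit is hypothetical and not from the answer above.

```python
import pandas as pd


def prepend_if_digit(prefix: str, value: str) -> str:
    """Return prefix + value when value starts with a digit, else value."""
    # value[:1] is '' for an empty string, so no IndexError is raised
    if value[:1].isdigit():
        return prefix + value
    return value


# Sample data from the question
df = pd.DataFrame({
    "col_1": ["sample_001", "sample_002", "sample_003", "sample_004", "sample_005"],
    "col_2": ["fjsah", "dfshb", "fsvhb", "dfbkk", "gnddl"],
    "col_3": ["AB", "CD", "EF", "GH", "IJ"],
    "col_4": ["11-110", "20-210", "N3-303", "Q4-444", "55-005"],
})

# Pair up the two columns value by value and write the result to a new column
df["new_col"] = [
    prepend_if_digit(prefix, value)
    for prefix, value in zip(df["col_3"], df["col_4"])
]

print(df)
```

As in the answer above, the result goes into new_col, so col_4 stays unchanged and the two columns can be compared side by side.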