Hey, I need to split the strings in one column based on another column's values. I managed to figure out a solution with df.apply, but I wonder whether there is a str.split-based way to vectorize this implementation?
            name field
0            b_b     b
1            b_c     b
2            b_d     b
3        a_paris     a
4  a_tokyo_ghoul     a
5           a_xx     a
I would like to convert the 'name' column into
0 b
1 c
2 d
3 paris
4 tokyo_ghoul
5 xx
and my current implementation is
df.apply(lambda row: row['name'].split(f"{row['field']}_")[-1], axis=1)
CodePudding user response:
Assuming you want to extract everything after the first _ and validate that the part before it is the same as df['field']:
df2 = df['name'].str.split('_', n=1, expand=True)
df['name2'] = df2[1].where(df2[0].eq(df['field']))
output:
            name field        name2
0            b_b     b            b
1            b_c     b            c
2            b_d     b            d
3        a_paris     a        paris
4  a_tokyo_ghoul     a  tokyo_ghoul
5           a_xx     a           xx
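To see what the validation step does, here is a small sketch with one row whose prefix does not match its field. The row c_zz is made up for illustration and is not part of the question's data; for that row, where() leaves NaN instead of the suffix.

```python
import pandas as pd

# Two rows from the question plus one hypothetical mismatching row ("c_zz" with field "a").
df = pd.DataFrame({
    "name": ["b_b", "a_tokyo_ghoul", "c_zz"],
    "field": ["b", "a", "a"],
})

# Split once on the first underscore: column 0 is the prefix, column 1 is the rest.
parts = df["name"].str.split("_", n=1, expand=True)

# Keep the rest only where the prefix equals the row's field; other rows become NaN.
df["name2"] = parts[1].where(parts[0].eq(df["field"]))
```

If you would rather keep the original name for non-matching rows, passing df['name'] as the second argument to where() should do that.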
CodePudding user response:
If every value in your name column contains '_', you can split the column on the first '_', which gives you the prefix and the suffix of each name. Then take the suffix, the part that comes after the '_', and you get the desired output.
Edit: You can also use another column that holds the prefix of each name (here, field) as the basis: filter for the rows where the name starts with the field value, then split off the rest.
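A minimal sketch of both ideas, using the data from the question. The column name name2 and the variable names are only illustrative. The second variant is a list comprehension rather than a truly vectorized call, since as far as I know the .str methods do not take a per-row prefix; it still tends to be faster than df.apply with axis=1.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["b_b", "b_c", "b_d", "a_paris", "a_tokyo_ghoul", "a_xx"],
    "field": ["b", "b", "b", "a", "a", "a"],
})

# Idea 1: split once on the first underscore and keep the part after it.
suffix = df["name"].str.split("_", n=1).str[1]

# Idea 2: build the expected prefix from `field` and strip it only where it matches;
# names that do not start with their prefix are left unchanged.
prefix = df["field"] + "_"
df["name2"] = [
    n[len(p):] if n.startswith(p) else n
    for n, p in zip(df["name"], prefix)
]
```

Both give b, c, d, paris, tokyo_ghoul, xx for this data; they differ only when a name does not start with its field.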