I want to create two binary indicators by checking whether the characters in the first and third positions of column 'A' match the characters in the first and third positions of column 'B'.
Here is a sample data frame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['a%d', 'a%', 'i%'],
                   'B': ['and', 'as', 'if']})
     A    B
0  a%d  and
1   a%   as
2   i%   if
I would like the data frame to look like below:
     A    B  Match_1  Match_3
0  a%d  and        1        1
1   a%   as        1        0
2   i%   if        1        0
I tried the following string comparison, but the match_1 column contains only 0 values.
df['match_1'] = np.where(df['A'][0] == df['B'][0], 1, 0)
I am wondering if there is a function that is similar to the substr function found in SQL.
CodePudding user response:
You could use the pandas str accessor, which lets you index or slice each element of a column:
df['match_1'] = df['A'].str[0].eq(df['B'].str[0]).astype(int)
df['match_3'] = df['A'].str[2].eq(df['B'].str[2]).astype(int)
output:
     A    B  match_1  match_3
0  a%d  and        1        1
1   a%   as        1        0
2   i%   if        1        0
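As for why the original attempt returned only zeros: df['A'][0] selects the value in row 0 of the column, not the first character of every row, so the comparison yields one boolean that gets broadcast to all rows. A minimal sketch contrasting the two, using the sample frame from the question (the intermediate variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': ['a%d', 'a%', 'i%'],
                   'B': ['and', 'as', 'if']})

# Indexing the Series with [0] picks the row labelled 0, i.e. a whole string.
row0_a = df['A'][0]                    # 'a%d'
row0_equal = df['A'][0] == df['B'][0]  # 'a%d' == 'and', a single False

# The .str accessor indexes inside each string instead, row by row.
first_a = df['A'].str[0]               # first character of every row
third_a = df['A'].str[2]               # NaN where the string has fewer than 3 characters

match_1 = (df['A'].str[0] == df['B'].str[0]).astype(int)
match_3 = (df['A'].str[2] == df['B'].str[2]).astype(int)
```

Rows where the third character is missing come back as NaN from .str[2], and NaN never compares equal to anything, which is why those rows end up as 0.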
If you have many positions to test, you can use a loop:
for pos in (1, 3):
    df['match_%d' % pos] = df['A'].str[pos-1].eq(df['B'].str[pos-1]).astype(int)
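If you want something closer to SQL's substr, Series.str.slice takes a start and stop position. Below is a rough sketch of a 1-based wrapper; the substr helper is a hypothetical name, not a pandas function. One difference to watch for: slicing past the end of a string gives an empty string rather than NaN, so two short strings would compare as equal unless empty results are excluded.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a%d', 'a%', 'i%'],
                   'B': ['and', 'as', 'if']})

def substr(series, start, length=1):
    """Hypothetical helper: 1-based substring, loosely like SQL SUBSTR."""
    return series.str.slice(start - 1, start - 1 + length)

a3 = substr(df['A'], 3)   # empty string where the value is shorter than 3 characters
b3 = substr(df['B'], 3)

# Require a non-empty slice so that two missing characters do not count as a match.
match_3 = ((a3 == b3) & (a3 != '')).astype(int)
```

For single characters, .str[pos] as shown above is simpler; the slice form is mainly useful when comparing substrings longer than one character.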