I already started a similar topic, but this one adds a few essential new details. We have two columns: "333, 444, 555", and "333A, 444, 555B", and we need to get a column showing "A, n/a, B", i.e. the difference in values between the two.
one = ''
# collect the characters of column2 that do not occur in column1
for h in str(column2):
    if h not in str(column1):
        one += h
which gives a string of differences. But is there a way to delimit the result and place each piece in its corresponding row? Making
Col1 | Col2 | Col3 |
---|---|---|
333 | 333A | A |
444 | 444 | n/a |
555 | 555B | B |
? Thank you
CodePudding user response:
IIUC, use difflib:
from difflib import ndiff
# keep only the characters that ndiff marks as added ('+') in Col2
diff = lambda x: ''.join(c[-1] for c in ndiff(x['Col1'], x['Col2']) if c[0] == '+')
df['Col3'] = df.astype({'Col1': str, 'Col2': str}).apply(diff, axis=1)
print(df)
# Output
Col1 Col2 Col3
0 333 333A A
1 444 444
2 555 555B B
3 777 787C 8C
Using astype({'Col1': str, 'Col2': str}) is not mandatory if you already have strings.
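The question asked for "n/a" where the two values match, while the snippet above leaves that cell empty. A minimal sketch of one way to add that fallback, using plain lists so it runs without a DataFrame; the helper name `added_chars` and its `fill` parameter are made up for this illustration:

```python
from difflib import ndiff

def added_chars(old, new, fill="n/a"):
    """Return the characters ndiff marks as added in `new`, or `fill` if there are none."""
    added = "".join(c[-1] for c in ndiff(str(old), str(new)) if c[0] == "+")
    return added or fill

# Sample values taken from the output table above
col1 = [333, 444, 555, 777]
col2 = ["333A", "444", "555B", "787C"]

col3 = [added_chars(a, b) for a, b in zip(col1, col2)]
print(col3)  # ['A', 'n/a', 'B', '8C']
```

With a DataFrame, the same helper should slot in as `df['Col3'] = [added_chars(a, b) for a, b in zip(df['Col1'], df['Col2'])]`.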
Update
Try this version:
def diff(x):
    s1 = str(x['Col1'])
    s2 = str(x['Col2'])
    # keep only the characters that ndiff marks as added ('+') in s2
    l = [c[-1] for c in ndiff(s1, s2) if c[0] == '+']
    return ''.join(l)

df['Col3'] = df.apply(diff, axis=1)
Explanation:
Suppose the strings are s1 = '567' and s2 = '597C'. The expected result is '9C'.
# Without a comprehension
for c in ndiff(s1, s2):
    print(c)
# Output
  5 # character in both strings
- 6 # character in s1 only
+ 9 # character in s2 only
  7 # character in both strings
+ C # character in s2 only
c[0] is the first character (the sign: '+', '-' or ' ').
c[-1] is the last character (the current letter).
So with the comprehension, we want to extract the current character c[-1] only if the sign c[0] is '+'.
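To make the role of the sign more concrete, here is a small sketch that sorts every character by its ndiff sign instead of keeping only the added ones. The function name `split_diff` is hypothetical, not part of difflib:

```python
from difflib import ndiff

def split_diff(s1, s2):
    """Group the characters of an ndiff by sign: common (' '), removed ('-'), added ('+')."""
    common, removed, added = [], [], []
    buckets = {" ": common, "-": removed, "+": added}
    for line in ndiff(s1, s2):
        sign, char = line[0], line[-1]
        if sign in buckets:  # ignore any '?' hint lines ndiff may emit
            buckets[sign].append(char)
    return "".join(common), "".join(removed), "".join(added)

common, removed, added = split_diff("567", "597C")
print(common, removed, added)  # 57 6 9C
```

The `added` part is what the comprehension above returns; `removed` would be the equivalent for characters present only in the first string.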