Making a column with differences between 2 other columns

Time:03-14

I already started a similar topic, but a few essential new details have come up. We have two columns, "333, 444, 555" and "333A, 444, 555B", and we need to get a column showing "A, n/a, B", i.e. the difference in values between the two.

one = ''
# Collect the characters of column1 that do not appear anywhere in column2
for h in str(column1):
    if h not in str(column2):
        one += h

This gives us a single string of differences. But is there a way to split the result and place each difference in its corresponding row, producing:

Col1 Col2 Col3
333  333A A
444  444  n/a
555  555B B

Thank you.

Answer:

IIUC, use difflib:

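import pandas as pd
from difflib import ndiff

df = pd.DataFrame({'Col1': [333, 444, 555, 777],
                   'Col2': ['333A', '444', '555B', '787C']})

# Keep only the characters ndiff marks with '+' (in Col2 but not matched in Col1)
diff = lambda x: ''.join(c[-1] for c in ndiff(x['Col1'], x['Col2']) if c[0] == '+')
df['Col3'] = df.astype({'Col1': str, 'Col2': str}).apply(diff, axis=1)
print(df)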

# Output
   Col1  Col2 Col3
0   333  333A    A
1   444   444     
2   555  555B    B
3   777  787C   8C

Using astype({'Col1': str, 'Col2': str}) is not necessary if the columns already contain strings.
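The question asks for n/a where the two values are identical, while the output above leaves an empty string in that row. A minimal sketch, assuming pandas and the three sample rows from the question, that computes the same '+'-filtered difference and then replaces empty results with 'n/a':

```python
from difflib import ndiff

import pandas as pd

# Sample values from the question; Col1 is numeric, Col2 holds strings
df = pd.DataFrame({'Col1': [333, 444, 555], 'Col2': ['333A', '444', '555B']})


def diff(x):
    """Characters that ndiff marks as added ('+') when going from Col1 to Col2."""
    s1 = str(x['Col1'])
    s2 = str(x['Col2'])
    return ''.join(c[-1] for c in ndiff(s1, s2) if c[0] == '+')


# Row-wise difference, then 'n/a' wherever nothing was added
df['Col3'] = df.apply(diff, axis=1).replace('', 'n/a')
print(df)
```

Calling str() inside the function means the columns do not need an astype conversion first.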

Update

Try this version:

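def diff(x):
    # Convert both cells to strings so numeric values work too
    s1 = str(x['Col1'])
    s2 = str(x['Col2'])
    # Keep the characters ndiff tags with '+' (present in s2, not matched in s1)
    l = [c[-1] for c in ndiff(s1, s2) if c[0] == '+']
    return ''.join(l)

df['Col3'] = df.apply(diff, axis=1)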

Explanation:

Suppose the strings are s1 = '567' and s2 = '597C'. The expected result is '9C'.

# Without a comprehension
for c in ndiff(s1, s2):
    print(c)

# Output
  5  # character in both strings
- 6  # character in s1 only
+ 9  # character in s2 only
  7  # character in both strings
+ C  # character in s2 only
  • c[0] is the first character (the sign ' ', '-' or '+')
  • c[-1] is the last character (the current character)

So in the comprehension, we keep the current character c[-1] only when the sign c[0] is '+'.
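To make the sign/character split concrete, here is a small sketch that runs the explanation's example and also compares ndiff with a plain membership test in the spirit of the question's original loop. Both helper names, added_chars and not_in_chars, are hypothetical and only for illustration; the strings '12' and '121' are an assumed example chosen to show a repeated character.

```python
from difflib import ndiff


def added_chars(s1, s2):
    """Keep the characters ndiff tags with '+', i.e. in s2 but not matched in s1."""
    return ''.join(c[-1] for c in ndiff(s1, s2) if c[0] == '+')


def not_in_chars(s1, s2):
    """Hypothetical membership test: characters of s2 that never occur in s1."""
    return ''.join(h for h in s2 if h not in s1)


# Each ndiff item looks like '<sign> <char>': c[0] is the sign, c[-1] the character
tags = [(c[0], c[-1]) for c in ndiff('567', '597C')]
print(tags)

print(added_chars('567', '597C'))   # 9C
print(not_in_chars('567', '597C'))  # 9C

# The approaches differ when a character repeats: the trailing '1' is new,
# but the membership test skips it because '1' already occurs in '12'
print(added_chars('12', '121'))     # 1
print(not_in_chars('12', '121'))    # prints an empty line
```

Because ndiff aligns the two sequences position by position, it can report a character as added even when that character also appears elsewhere in the first string; a membership test cannot.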
