We have a dataframe and need to compare two of its columns. One has, e.g., "555, 333, 444", the other has "555A, 333, 444B". We need to get the difference, i.e. "A, NaN, B". None of the variants I found online worked when I tried them. These are columns, thus Series, so the split method doesn't apply. The replace method does work, but doesn't return the required difference. ndiff also returns something very thoughtful, but not the required difference.
Is there a general solution, for all kinds of data (numbers, texts)? Thank you.
CodePudding user response:
You can implement a diff function with Longest Common Substring.
Here is my old function for getting differences between strings (TypeScript).
Press run and you will see that it correctly identifies that A and B were added :)
This is the "general" solution. If you could provide more context there could be a more efficient method.
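For Python, a minimal sketch of the same idea could look like the one below. It swaps in the standard library's difflib.SequenceMatcher instead of a hand-written longest-common-substring routine, so it is not a port of the function mentioned above. The helper name added_text is made up for illustration, and it assumes both values are already strings (convert numbers with str() first).

```python
from difflib import SequenceMatcher


def added_text(old: str, new: str) -> str:
    """Return the parts of `new` that SequenceMatcher could not match in `old`.

    `added_text` is a hypothetical helper name, not part of any library.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    pieces = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" opcodes mark text that exists only in `new`.
        if tag in ("insert", "replace"):
            pieces.append(new[j1:j2])
    return "".join(pieces)


print(added_text("555", "555A"))  # A
print(added_text("333", "333"))   # prints an empty line: nothing was added
print(added_text("444", "444B"))  # B
```

An empty result means the two values are identical from the point of view of additions, so it can be mapped to NaN afterwards if that is the output you need.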
CodePudding user response:
You can just use loops: check whether each character of one string is present in the other, and collect the characters that are missing in a variable.
Here's a way to do it in Python:
x = 'abcd'
y = 'cdefg'
s = ''
t = ''
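for i in x:  # checking x with y
    if i not in y:
        s += i  # accumulate every character of x that is missing from y
for i in y:  # checking y with x
    if i not in x:
        t += i  # accumulate every character of y that is missing from x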
print(s) # ab
print(t) # efg
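To get from two strings to two columns, the same check can be run pair by pair. This is only a sketch: the plain lists col_a and col_b stand in for the two dataframe columns from the question, and extra_chars is a made-up helper name. With a real dataframe you could probably iterate over zip(df["a"], df["b"]) in the same way, where "a" and "b" are placeholder column names.

```python
import math


def extra_chars(base: str, other: str) -> str:
    """Characters of `other` that do not occur anywhere in `base`."""
    return "".join(ch for ch in other if ch not in base)


# Stand-ins for the two columns from the question (assumed to hold strings).
col_a = ["555", "333", "444"]
col_b = ["555A", "333", "444B"]

# An empty difference is replaced with NaN, as asked for in the question.
diff = [extra_chars(a, b) or math.nan for a, b in zip(col_a, col_b)]
print(diff)  # ['A', nan, 'B']
```

Keep in mind that this membership test ignores position and repetition: comparing "555" with "5555" reports no difference, because the character "5" already occurs in the first string.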