I'm trying to remove duplicate words within each row, but strings with length <= 2 and integers must be kept.
I have a sentence like this:
AIR OPTIX Air Optix plus HydraGlyde Lenti a Contatto Mensili, 6 Lenti, BC a 6 mm, DIA 14.2 mm, -0.75 Diopt
What I need to obtain is:
AIR OPTIX plus HydraGlyde Lenti a Contatto Mensili, 6 Lenti, BC a 6 mm, DIA 14.2 mm, -0.75 Diopt
What I get with the function below is:
AIR OPTIX plus HydraGlyde Lenti Contatto Mensili, 6 Lenti, BC mm, DIA 14.2 -0.75 Diopt
Missing: a, 6, mm.
I need to modify the function so that the duplicate removal ignores strings with len <= 2 and any integer; these values should remain as they are.
def uniqueList(row):
    words = str(row).split(" ")
    unique = words[0]
    for w in words:
        if w.lower() not in unique.lower():
            unique = unique + " " + w
    return unique

df["Correction_Value"] = df["Correction_Value"].apply(uniqueList)
CodePudding user response:
Here you go. Not the best code, but it gets the job done: it passes the test.
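def uniqueList(row):
    words = str(row).split(" ")
    unique = words[0]
    # Start from the second word; the first one is already in `unique`.
    for w in words[1:]:
        # Always keep integers.
        try:
            int(w)
            unique = unique + " " + w
            continue
        except ValueError:
            pass
        # Always keep any word containing "mm" (e.g. "mm,").
        if 'mm' in w:
            unique = unique + " " + w
            continue
        # Keep short words, and longer words not yet found in `unique` (substring check).
        if (w.lower() not in unique.lower()) or (len(w) <= 2):
            unique = unique + " " + w
    return unique

df["Correction_Value"] = df["Correction_Value"].apply(uniqueList)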
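A note on why the original function drops even the first `a`: `unique` is a single string, so `w.lower() not in unique.lower()` is a substring test, not a whole-word comparison. A minimal sketch of the difference (the variable names are only for illustration):

```python
text = "AIR OPTIX plus HydraGlyde Lenti"

# Substring test on a string: "a" is found inside "air", so it looks like a duplicate.
substring_hit = "a" in text.lower()

# Membership test on a set of whole words: "a" is not one of the words.
seen_words = {w.lower() for w in text.split()}
whole_word_hit = "a" in seen_words

print(substring_hit)   # True
print(whole_word_hit)  # False
```

The same substring check is still used in the function above, so a longer word that happens to be contained in an earlier one would also be treated as a duplicate.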
CodePudding user response:
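"I try to remove duplicates on rows but I need to have strings with length <= 2 and integer."
The code below does that. Note that "mm," is not the same token as "mm": with the trailing comma it has length 3, so it is not exempt and the second "mm," is removed as a duplicate.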
def rem_dups(value: str) -> str:
    def _is_int(x):
        try:
            int(x)
            return True
        except ValueError:
            return False

    result = []
    seen = set()  # lowercased words already emitted
    for word in value.split():
        if len(word) <= 2 or _is_int(word):
            # Short words and integers are always kept.
            result.append(word)
        else:
            # Compare case-insensitively so "Air" matches "AIR".
            key = word.lower()
            if key not in seen:
                result.append(word)
                seen.add(key)
    return ' '.join(result)

_value = 'AIR OPTIX Air Optix plus HydraGlyde Lenti a Contatto Mensili, 6 Lenti, BC a 6 mm, DIA 14.2 mm, -0.75 Diopt'
print(rem_dups(_value))
Output:
AIR OPTIX plus HydraGlyde Lenti a Contatto Mensili, 6 Lenti, BC a 6 mm, DIA 14.2 -0.75 Diopt
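If the second `mm,` should survive, as in the expected output from the question, one option is to decide the length exemption on the token with surrounding punctuation stripped, while still deduplicating on the full token. This is only a sketch under that assumption about the intended rule; `rem_dups_keep_units` is a name made up for this example.

```python
import string


def rem_dups_keep_units(value: str) -> str:
    def _is_int(token: str) -> bool:
        try:
            int(token)
            return True
        except ValueError:
            return False

    result = []
    seen = set()  # lowercased full tokens already emitted
    for word in value.split():
        # Strip leading/trailing punctuation only for the exemption check,
        # so "mm," is measured as "mm".
        core = word.strip(string.punctuation)
        if len(core) <= 2 or _is_int(core):
            result.append(word)
            continue
        # Deduplicate on the full token, so "Lenti" and "Lenti," stay distinct.
        key = word.lower()
        if key not in seen:
            seen.add(key)
            result.append(word)
    return " ".join(result)


sample = ('AIR OPTIX Air Optix plus HydraGlyde Lenti a Contatto Mensili, '
          '6 Lenti, BC a 6 mm, DIA 14.2 mm, -0.75 Diopt')
print(rem_dups_keep_units(sample))
# AIR OPTIX plus HydraGlyde Lenti a Contatto Mensili, 6 Lenti, BC a 6 mm, DIA 14.2 mm, -0.75 Diopt
```

It can be applied to the column the same way as in the question, with df["Correction_Value"].apply(rem_dups_keep_units).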