How to remove nan and repeated strings from a list of strings?

Time:10-20

I have a list of strings that contains nan values and some repeated strings, as follows:

classes = [nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan,
           nan, nan, ' L-glutamate ', nan,
           nan, nan, nan, ' L-lysine ', nan,
           nan, nan, nan, nan, nan, nan, ' dihydroxyacetone',
           nan, nan, nan, ' CoA ', ' CoA ', ' CoA ', ' CoA ',
           ' CoA ', nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
           ' hydrogen sulfide ', nan, nan, nan, nan, ' CoA ',
           ' CoA ', nan, nan, nan, ' formate ', nan, nan, nan]

I want to remove the nan values and the repeated strings. The expected result looks like:

[' L-glutamate ', ' L-lysine ', ' dihydroxyacetone', ' CoA ', ' hydrogen sulfide ', ' formate ']

I tried:

nan = float('nan')
ll = [[i for i in j if i == i] for j in classes]

and:

nan = float('nan')
ll = [[x for x in y if type(x) != float or not math.isnan(x)] for y in classes] 

Both of them raised the same error: TypeError: 'float' object is not iterable.

If possible, I would also like to strip the spaces at the start and end of every string while keeping the spaces inside it, as follows:

['L-glutamate', 'L-lysine', 'dihydroxyacetone', 'CoA', 'hydrogen sulfide', 'formate']

Could anyone tell me how to do this?

CodePudding user response:

You could try this: instead of checking nan == nan (which is False), you can check nan in (nan, ) (which is True).

from math import nan
classes = [nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan,
           nan, nan, ' L-glutamate ', nan,
           nan, nan, nan, ' L-lysine ', nan,
           nan, nan, nan, nan, nan, nan, ' dihydroxyacetone',
           nan, nan, nan, ' CoA ', ' CoA ', ' CoA ', ' CoA ',
           ' CoA ', nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
           nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
           ' hydrogen sulfide ', nan, nan, nan, nan, ' CoA ',
           ' CoA ', nan, nan, nan, ' formate ', nan, nan, nan]
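# Filter out nan, strip the surrounding spaces, then drop duplicates while keeping the order
print(list(dict.fromkeys(i.strip() for i in classes if i not in (nan, ))))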
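
The membership test works because `in` compares by identity before it compares by equality, so it only matches the very same nan object. A minimal sketch of that caveat, assuming the list might contain a NaN created elsewhere (for example by float('nan')); the helper name is_nan is hypothetical:

```python
from math import nan, isnan

def is_nan(value):
    """Return True for any float NaN, not only the math.nan object."""
    return isinstance(value, float) and isnan(value)

other_nan = float("nan")  # a distinct NaN object, not math.nan itself

same_object_found = nan in (nan,)         # True: matched by identity
other_object_found = other_nan in (nan,)  # False: different object, and NaN != NaN
detected_by_value = is_nan(other_nan)     # True: checks the value, not the object
```

If every nan in the list is the same object, as in the question, the identity-based check is enough.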

CodePudding user response:

Use dict.fromkeys to remove the duplicates from the list without changing the order. If the order is not important, you can also use set. After that, use a list comprehension to strip the leading/trailing spaces and filter out the nan.

from math import nan

d = dict.fromkeys(classes)  # keeps the first occurrence of each key, in order
lst = [x.strip() for x in d if x is not nan]

Or pop the nan first:

d = dict.fromkeys(classes)
d.pop(nan)
lst = [x.strip() for x in d]

Output

['L-glutamate', 'L-lysine', 'dihydroxyacetone', 'CoA', 'hydrogen sulfide', 'formate']
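
If the NaN values may come from another source (say, a pandas column, which is an assumption here), filtering on type avoids relying on object identity. A sketch with a hypothetical helper clean_labels, which strips before deduplicating so that ' CoA ' and 'CoA' collapse into one entry:

```python
def clean_labels(values):
    """Drop non-string entries (such as NaN), strip surrounding whitespace,
    and remove duplicates while keeping first-seen order."""
    stripped = (v.strip() for v in values if isinstance(v, str))
    return list(dict.fromkeys(stripped))

# Shortened sample in the spirit of the question's list
sample = [float("nan"), " L-glutamate ", " CoA ", "CoA", float("nan"),
          " hydrogen sulfide ", " CoA "]
cleaned = clean_labels(sample)
```

Only the leading and trailing spaces are removed, so 'hydrogen sulfide' keeps its inner space.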

CodePudding user response:

You can do this with filter and map. Note that set does not preserve the original order:

from math import nan
output = list(set(map(str.strip, filter(lambda x: x is not nan, classes))))
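
As for the TypeError in the question: classes is a flat list, so the nested comprehension tries to iterate over each element, and the first element is a float. A small sketch of the failure and of flat versions of both attempts, using a shortened sample list (the variable names are illustrative):

```python
import math

nan = float("nan")
classes = [nan, " L-glutamate ", nan, " CoA ", " CoA "]

# Nested form from the question: `for i in j` iterates over an element,
# and the first element is a float, hence the TypeError.
try:
    [[i for i in j if i == i] for j in classes]
except TypeError as exc:
    error_message = str(exc)

# Flat form of the first attempt: NaN is the only value not equal to itself.
without_nan = [x for x in classes if x == x]

# Flat form of the second attempt: explicit type and isnan check.
without_nan_explicit = [
    x for x in classes if not (isinstance(x, float) and math.isnan(x))
]
```

Both flat versions only remove the nan values; duplicates and surrounding spaces still need one of the approaches above.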