I have a list of strings that contains nan values and some repeated entries, as follows:
classes = [nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan,
nan, nan, ' L-glutamate ', nan,
nan, nan, nan, ' L-lysine ', nan,
nan, nan, nan, nan, nan, nan, ' dihydroxyacetone',
nan, nan, nan, ' CoA ', ' CoA ', ' CoA ', ' CoA ',
' CoA ', nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
' hydrogen sulfide ', nan, nan, nan, nan, ' CoA ',
' CoA ', nan, nan, nan, ' formate ', nan, nan, nan]
I want to remove the nan values and the repeated entries. The expected result looks like:
[' L-glutamate ', ' L-lysine ', ' dihydroxyacetone', ' CoA ', ' hydrogen sulfide ', ' formate ']
I tried:
nan = float('nan')
ll = [[i for i in j if i == i] for j in classes]
and:
nan = float('nan')
ll = [[x for x in y if type(x) != float or not math.isnan(x)] for y in classes]
Both of them raised an error:
TypeError: 'float' object is not iterable
If possible, I also want to remove the spaces at the start and end of every string while keeping the spaces inside it, as follows:
['L-glutamate', 'L-lysine', 'dihydroxyacetone', 'CoA', 'hydrogen sulfide', 'formate']
Could anyone tell me how to get this?
CodePudding user response:
You could try this. Instead of checking nan == nan, which is False, you can check nan in (nan, ), which is True:
from math import nan
classes = [nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan,
nan, nan, ' L-glutamate ', nan,
nan, nan, nan, ' L-lysine ', nan,
nan, nan, nan, nan, nan, nan, ' dihydroxyacetone',
nan, nan, nan, ' CoA ', ' CoA ', ' CoA ', ' CoA ',
' CoA ', nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
' hydrogen sulfide ', nan, nan, nan, nan, ' CoA ',
' CoA ', nan, nan, nan, ' formate ', nan, nan, nan]
# dict.fromkeys drops the repeated strings while keeping their first-seen order
print(list(dict.fromkeys(i.strip() for i in classes if i not in (nan, ))))
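Why the membership test behaves differently from == can be sketched as follows. Containment checks on tuples and lists compare by identity first and only then by equality, so the check only succeeds when it is the very same nan object. The names other_nan and the three result variables are made up for this illustration.

```python
from math import nan

# NaN never compares equal to itself, so an equality test cannot find it.
equal_to_itself = nan == nan

# Containment tests identity before equality, so the same object is found.
found_same_object = nan in (nan,)

# A separately created NaN is a different object and is not found.
other_nan = float("nan")
found_other_object = other_nan in (nan,)

print(equal_to_itself, found_same_object, found_other_object)
# False True False
```

So this approach is only safe if every nan in the list is the same object as the one you test against.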
CodePudding user response:
Use dict.fromkeys to remove the duplicates from the list without changing the order. If the order is not important, you can also use set. After that, use a list comprehension to strip the leading/trailing spaces and filter out the nan:
from math import nan
d = dict.fromkeys(classes)  # keeps the first occurrence of each value, in order
lst = [x.strip() for x in d if x is not nan]
Or pop the nan first:
d = dict.fromkeys(classes)
d.pop(nan, None)  # the default avoids a KeyError if the list has no nan
lst = [x.strip() for x in d]
Output
['L-glutamate', 'L-lysine', 'dihydroxyacetone', 'CoA', 'hydrogen sulfide', 'formate']
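If the nan values might not all be the same object (for example, if they came from another library), an identity check such as "is not nan" can miss some of them. A minimal sketch that avoids this, assuming every entry you want to keep is a str, is to filter on the type instead. It also uses a single flat comprehension, since the list is not nested; iterating over each element as if it were a list is what raised the TypeError in the question. The shortened list sample is made up for the illustration.

```python
nan = float("nan")

# Hypothetical shortened list; the second NaN is a distinct object on purpose.
sample = [nan, " L-glutamate ", float("nan"), " CoA ", " CoA ", " formate ", nan]

# Keep only the strings, strip the outer spaces, then drop duplicates
# while preserving first-seen order.
cleaned = list(dict.fromkeys(x.strip() for x in sample if isinstance(x, str)))

print(cleaned)
# ['L-glutamate', 'CoA', 'formate']
```

Stripping before deduplicating also merges entries that differ only in surrounding whitespace.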
CodePudding user response:
You can do this with filter and map. Note that set does not preserve the original order:
from math import nan
output = list(set(map(str.strip, filter(lambda x: x is not nan, classes))))
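If you need the original order, one possible variation is to swap set for dict.fromkeys. The sketch below also replaces the identity check with the math.isnan test from the question, applied to a flat list, so it catches any NaN float. The helper is_not_nan and the shortened list sample are hypothetical names for this example.

```python
import math

# Hypothetical shortened list with a repeated entry and an inner space.
sample = [float("nan"), " hydrogen sulfide ", " CoA ", float("nan"), " CoA ", " formate "]

def is_not_nan(x):
    """Return True for anything that is not a NaN float."""
    return not (isinstance(x, float) and math.isnan(x))

# filter drops the NaNs, map strips the outer spaces,
# dict.fromkeys removes duplicates while keeping first-seen order.
result = list(dict.fromkeys(map(str.strip, filter(is_not_nan, sample))))

print(result)
# ['hydrogen sulfide', 'CoA', 'formate']
```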