How to get a list of strings from a list-like string that includes nan?

Here is a toy example. I have a string like this:

import numpy as np
z = str([np.nan, "ab", "abc"])

Printed, it looks like "[nan, 'ab', 'abc']", but what I actually have to process is the string z = str([np.nan, "ab", "abc"]).

From z, I want to get a list of the strings, excluding nan:

zz = ["ab", "abc"]

To be clear: z is the input (a string that looks like a list) and zz is the desired output (a list).

There is no problem if z doesn't contain nan; in that case ast.literal_eval(z) does the job. With nan, however, I get an error about a malformed node or string.

Note: np.nan doesn't have to be first.
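
For reference, a minimal reproduction of the failure described above. To keep it free of third-party imports, float("nan") is used as a stand-in for np.nan (an assumption of this sketch; np.nan is an ordinary Python float, so it prints the same way).

```python
import ast

# float("nan") stands in for np.nan here; both print as nan
z = str([float("nan"), "ab", "abc"])
print(z)

error = None
try:
    ast.literal_eval(z)
except ValueError as exc:
    # literal_eval rejects the bare name nan because it is not a literal
    error = exc

print(type(error).__name__)
```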

CodePudding user response：

ast.literal_eval is recommended over eval precisely because it accepts only a very limited set of expressions. As stated in the docs: "Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis." The bare name nan is none of those (it is a name, not a literal), so it cannot be evaluated. There are a few ways to handle this.

Remove nan by operating on the string before evaluating it. This might be problematic if you need to avoid also removing nan from inside the actual strings.
NOT ADVISED - SECURITY RISKS - the standard eval can handle this if you define a nan variable in the namespace.
And finally, I think the best choice, but also the hardest to implement: as explained here, you take the source code of ast and reimplement literal_eval in such a way that it knows how to handle the nan name on its own.
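
A minimal sketch of that last option, assuming it is acceptable to go through ast.NodeTransformer instead of copying the ast source: parse the string, swap every bare nan name for a float("nan") constant, and only then hand the tree to ast.literal_eval. The names _NanToConstant and literal_eval_with_nan are made up for illustration and are not part of the standard library.

```python
import ast


class _NanToConstant(ast.NodeTransformer):
    """Hypothetical helper: replace the bare name `nan` with a float constant."""

    def visit_Name(self, node):
        if node.id == "nan":
            return ast.copy_location(ast.Constant(value=float("nan")), node)
        return node


def literal_eval_with_nan(text):
    # Parse as a single expression, rewrite the nan names, then let
    # literal_eval validate and convert everything else as usual.
    tree = ast.parse(text, mode="eval")
    tree = _NanToConstant().visit(tree)
    return ast.literal_eval(tree)


z = "[nan, 'ab', 'abc', 'x nan, y', nan]"
parsed = literal_eval_with_nan(z)

# Keep only the strings, which drops the float nan entries
zz = [item for item in parsed if isinstance(item, str)]
print(zz)  # ['ab', 'abc', 'x nan, y']
```

Because only Name nodes are rewritten, a nan inside a quoted string is left untouched, and nan may appear at any position in the list. Anything else that literal_eval would reject is still rejected.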

CodePudding user response：

As I understand it, your goal is to parse CSV or similar.

If you want a trade-off solution that should work in most cases, you can use a regex to get rid of the "nan". It will fail on strings that contain the substring "nan," (nan followed by a comma), but this seems to be a reasonably unlikely edge case. Because the pattern requires a comma after nan, a nan in the last position of the list is not removed either. Worth exploring with your real data.

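import re
from ast import literal_eval
import numpy as np

z = str([np.nan, "ab", np.nan, "nan,", "abc", "x nan , y", "x nan y"])

# Remove every "nan," token (plus surrounding whitespace) before parsing
literal_eval(re.sub(r'\bnan\s*,\s*', '', z))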

output: ['ab', '', 'abc', 'x y', 'x nan y']

CodePudding user response：

What about:

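# Map the bare name nan to the string 'nan' so that eval can resolve it
parsed = eval(z, {'nan': 'nan'})
# Then drop those placeholders (this also drops any genuine 'nan' strings)
[i for i in parsed if i != 'nan']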

Note that eval executes arbitrary code, so only use this on input you trust.

CodePudding user response：

There are many solutions. One of them, assuming z is already a list rather than a string:

import numpy as np

z = [np.nan, 'string', 'another_one']
string_list = []

for item in z:
    # Keep only the items that are instances of str
    if isinstance(item, str):
        string_list.append(item)

CodePudding user response：

Something like this :

import numpy as np
z = [item for item in [np.nan, "ab", "abc"] if isinstance(item, str)]
print(z)

CodePudding user response：

Use the filter() function:

list(filter(lambda f: isinstance(f, str), z))