How to get list of strings from list-like string that includes nan?

Time:05-11

Here is a toy example. I have a string like this:

import numpy as np
z = str([np.nan, "ab", "abc"])

Printed, it looks like "[nan, 'ab', 'abc']", but what I have to process is z = str([np.nan, "ab", "abc"]).

From z, I want to get a list of the strings, excluding nan:

zz = ["ab", "abc"]

To be clear: z is the input (a string that looks like a list), and zz is the desired output (a list).

There is no problem if z doesn't contain nan; in that case ast.literal_eval(z) does the job. With nan, however, I get an error about a malformed node or string.

Note: np.nan doesn't have to be first.
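For reference, the failure described above can be reproduced with a minimal sketch like the following. It uses float("nan") in place of np.nan (np.nan is a plain Python float, so the string form is the same), and the variable names z_clean and failed are only illustrative.

```python
import ast

# Same text as str([np.nan, "ab", "abc"]); float("nan") stands in for np.nan
z = str([float("nan"), "ab", "abc"])
print(z)

# Without nan, literal_eval parses the list-like string as expected
z_clean = str(["ab", "abc"])
print(ast.literal_eval(z_clean))

# With nan, literal_eval rejects the bare name and raises ValueError
try:
    ast.literal_eval(z)
    failed = False
except ValueError as exc:
    failed = True
    print("literal_eval failed:", type(exc).__name__)
```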

CodePudding user response:

ast.literal_eval is recommended over eval precisely because it accepts only a very limited set of expressions. As stated in the docs: "Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis." A bare nan is none of those, so it cannot be evaluated. There are a few ways to handle this.

  • Remove nan by operating on the string before evaluating it. This might be problematic if you also want to avoid removing nan from inside the actual strings.
  • NOT ADVISED - SECURITY RISKS - standard eval can handle this if you define a nan variable in the namespace.
  • And finally, what I think is the best choice but also the hardest to implement: take the source code of ast.literal_eval and reimplement it in such a way that it knows how to handle a bare nan on its own.
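A lighter variant of the last option, offered only as a sketch: rather than copying the source of literal_eval, parse the string into an AST, swap every bare nan name for a float constant, and pass the modified tree to ast.literal_eval, which also accepts a node. The names _NanToConstant and parse_list_with_nan are hypothetical helpers, not part of the ast module.

```python
import ast
import math


class _NanToConstant(ast.NodeTransformer):
    """Replace the bare name `nan` with a float('nan') constant."""

    def visit_Name(self, node):
        if node.id == "nan":
            return ast.copy_location(ast.Constant(value=float("nan")), node)
        return node  # any other name is left alone and rejected by literal_eval


def parse_list_with_nan(text):
    # Parse as a single expression, patch the nan names, then evaluate safely
    tree = ast.parse(text, mode="eval")
    tree = _NanToConstant().visit(tree)
    values = ast.literal_eval(tree)
    # Drop float NaN entries; strings that merely contain "nan" are untouched
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


print(parse_list_with_nan("[nan, 'ab', 'abc']"))
print(parse_list_with_nan("['x nan, y', 'nan', nan]"))
```

Because only the name node is rewritten, text inside the quoted strings is never modified, and anything other than literals still raises ValueError.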

CodePudding user response:

As I understand it, your goal is to parse CSV or something similar.

If you want a trade-off solution that should work in most cases, you can use a regex to get rid of the "nan". It will give wrong results for strings that contain the substring "nan," (nan followed by a comma), but this seems to be a reasonably unlikely edge case. It is worth exploring with your real data.

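import re
from ast import literal_eval
import numpy as np

z = str([np.nan, "ab", np.nan, "nan,", "abc", "x nan , y", "x nan y"])

# remove every "nan" that is followed by a comma, together with the comma
literal_eval(re.sub(r'\bnan\s*,\s*', '', z))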

output: ['ab', '', 'abc', 'x y', 'x nan y']
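One gap in the pattern above is that it only removes a nan that is followed by a comma, so a nan in the last position (or a list containing only nan) still makes literal_eval fail. A possible variant, sketched here under the same caveat about strings that contain "nan," themselves, replaces each bare nan with None and filters afterwards. The function name drop_nan_regex is hypothetical.

```python
import re
from ast import literal_eval


def drop_nan_regex(text):
    # Turn a bare nan that is followed by a comma or the closing bracket into None
    cleaned = re.sub(r"\bnan\b(?=\s*[,\]])", "None", text)
    # literal_eval understands None, so the placeholders can be filtered out
    return [item for item in literal_eval(cleaned) if item is not None]


print(drop_nan_regex("[nan, 'ab', nan, 'abc', nan]"))
print(drop_nan_regex("[nan]"))
```

Note that this variant also drops genuine None entries, and a string such as 'nan,' is still altered, so it remains a trade-off rather than a complete fix.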

CodePudding user response:

What about:

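eval(z, {'nan': 'nan'})  # maps the bare name nan to the string 'nan'; if you can tolerate that, then:
[i for i in eval(z, {'nan': 'nan'}) if i != 'nan']

This has security implications: eval executes arbitrary code, so use it only on input you trust. It also drops any genuine 'nan' string from the result.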

CodePudding user response:

There are many solutions; one of them is:

nan = float('nan')
# note: z here is an already parsed list, not the string from the question
z = [nan, 'string', 'another_one']
string_list = []

for item in z:
    # keep only the items that are instances of str
    if isinstance(item, str):
        string_list.append(item)

CodePudding user response:

Something like this:

import numpy as np 
z = [item for item in [np.nan, "ab", "abc" ] if type(item) == str]
print(z)

CodePudding user response:

Use the filter() function (z here is an already parsed list):

list(filter(lambda f: type(f)==str, z))