Evaluate numeric list in pandas column as string list

I have the following dataframe, loaded with pd.read_csv:

| Name     | DOB                     | Country      | Assigned_ID |
| John Doe | [1929/01/17,1910/03/25] | [ML,IND,BY]  | [597212972,12345] |

I am trying to parse the DOB, Country and Assigned_ID columns into lists so that I can explode them later, like this:

df_data = df_data.explode(["DOB"]).reset_index(drop=True)

So, I tried to convert using literal_eval:

df_data['DOB'] = df_data['DOB'].apply(literal_eval)

which gave me the error below:

[1929/01/17,1910/03/25]
      ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

Then I tried converting the values to str first, like this:

df_data['DOB'] = df_data['DOB'].apply(lambda x: literal_eval(str(x)))

which failed again with the same error. What am I missing? Can someone guide me on this?

Expected Output:

| Name     | DOB                         | Country            | Assigned_ID |
| John Doe | ['1929/01/17','1910/03/25'] | ['ML','IND','BY']  | ['597212972','12345'] |

CodePudding user response:

I think the problem is that your lists contain unquoted values, so literal_eval cannot read them as strings. It tries to parse 1929/01/17 as an arithmetic expression and rejects 01, because integer literals may not have leading zeros. Wrapping the value in str() changes nothing, since the cell is already a string. For literal_eval to work, [1929/01/17,1910/03/25] would have to be ['1929/01/17','1910/03/25']. You can either use a substitution to add the missing quotes and then call literal_eval, or simply convert the string into a list of strings manually, like this:

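from ast import literal_eval

df['DOB'] = df['DOB'].apply(lambda x: x[1:-1].split(','))
df['Country'] = df['Country'].apply(lambda x: x[1:-1].split(','))
# Assign the result back, otherwise the column stays unchanged
df['Assigned_ID'] = df['Assigned_ID'].apply(literal_eval)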

print(df['DOB'][0][1])

Output:

1910/03/25

This removes the brackets from the string and splits the rest at each comma, which gives a list of strings. For the Assigned_ID column you can use literal_eval, because a list of numbers needs no quotes to be evaluated. Note that literal_eval returns integers there; if you want strings as in your expected output, use the same split approach for that column too.
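
To tie both approaches together, here is a minimal sketch under a few assumptions: the sample row is rebuilt by hand instead of being read from your CSV, and split_bracketed and quote_then_eval are hypothetical helper names, not pandas functions. The first does the strip-and-split described above and also trims whitespace around each item; the second illustrates the substitution route by quoting every item before calling literal_eval. All three columns are split here so that the IDs stay strings, as in the expected output.

```python
import re
from ast import literal_eval

import pandas as pd

# Sample row mirroring the question; in practice this comes from pd.read_csv.
df = pd.DataFrame(
    {
        "Name": ["John Doe"],
        "DOB": ["[1929/01/17,1910/03/25]"],
        "Country": ["[ML,IND,BY]"],
        "Assigned_ID": ["[597212972,12345]"],
    }
)


def split_bracketed(cell):
    """Hypothetical helper: turn '[a,b,c]' into ['a', 'b', 'c']."""
    inner = cell.strip()[1:-1]  # drop the surrounding brackets
    if not inner.strip():
        return []  # '[]' becomes an empty list
    return [item.strip() for item in inner.split(",")]


def quote_then_eval(cell):
    """Hypothetical helper: quote each item, then let literal_eval parse it."""
    # Every run of characters that is not a bracket or comma is one item;
    # repr() wraps it in quotes and escapes anything that needs escaping.
    quoted = re.sub(r"[^\[\],]+", lambda m: repr(m.group().strip()), cell)
    return literal_eval(quoted)


# Parse all three columns with the split-based helper.
for col in ["DOB", "Country", "Assigned_ID"]:
    df[col] = df[col].apply(split_bracketed)

# Explode one column, as in the question.
exploded = df.explode("DOB").reset_index(drop=True)
print(exploded[["Name", "DOB"]])
```

In this sample row Country has three items while DOB has two, so exploding several columns in a single call would likely fail because the element counts do not match; exploding one column at a time avoids that.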