I have the following dataframe from pd.read_csv:
| Name | DOB | Country | Assigned_ID |
| John Doe | [1929/01/17,1910/03/25] | [ML,IND,BY] | [597212972,12345] |
I am trying to evaluate the columns DOB, Country, and Assigned_ID as lists so that I can explode them later like this:
df_data = df_data.explode(["DOB"]).reset_index(drop=True)
So I tried to convert the column using literal_eval:
df_data['DOB'] = df_data['DOB'].apply(literal_eval)
which gave me the error below:
[1929/01/17,1910/03/25]
^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
Then I tried converting the values to str like this:
df_data['DOB'] = df_data['DOB'].apply(lambda x: literal_eval(str(x)))
which failed again with the same error. What am I missing? Can someone guide me on this?
Expected Output:
| Name | DOB | Country | Assigned_ID |
| John Doe | ['1929/01/17','1910/03/25'] | ['ML','IND','BY'] | ['597212972','12345'] |
CodePudding user response:
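I think the problem is that your lists contain strings, but literal_eval cannot recognize them as strings because the quotes are missing. Without quotes, 1929/01/17 is parsed as an arithmetic expression, and 01 is an integer literal with a leading zero, which Python 3 rejects; that is the SyntaxError you see. Wrapping the value in str() does not help, because the cell is already a string and its contents are unchanged. For example, [1929/01/17,1910/03/25] should be ['1929/01/17','1910/03/25'] for it to work. You can either use substitution to add the missing quotes and then call literal_eval, or simply convert the string into a list of strings manually, like this: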
from ast import literal_eval

df['DOB'] = df['DOB'].apply(lambda x: x[1:-1].split(','))
df['Country'] = df['Country'].apply(lambda x: x[1:-1].split(','))
df['Assigned_ID'] = df['Assigned_ID'].apply(literal_eval)
print(df['DOB'][0][1])
Output:
1910/03/25
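What we are doing is removing the brackets from the string and splitting the remaining string at the commas. The result is a list of strings holding your values. For the column Assigned_ID you can use literal_eval, as a list of numbers needs no quotes to be evaluated. Note that literal_eval gives you integers there; if you want strings as in your expected output, use the same split as for the other two columns.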
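The substitution approach mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the sample dataframe is rebuilt by hand from the row in the question, quote_items is a hypothetical helper name, and every cell is assumed to be a bracketed string whose items contain no commas or brackets (NaN cells from read_csv would need to be handled first).

```python
import re
from ast import literal_eval

import pandas as pd

# Hypothetical sample mirroring the single row shown in the question.
df = pd.DataFrame({
    "Name": ["John Doe"],
    "DOB": ["[1929/01/17,1910/03/25]"],
    "Country": ["[ML,IND,BY]"],
    "Assigned_ID": ["[597212972,12345]"],
})


def quote_items(cell: str) -> str:
    """Wrap each comma-separated item inside the brackets in quotes."""
    # [^\[\],]+ matches a run of characters that are neither brackets nor commas,
    # i.e. one list item; repr() adds the quotes literal_eval expects.
    return re.sub(r"[^\[\],]+", lambda m: repr(m.group(0).strip()), cell)


# After quoting, literal_eval can parse every column as a list of strings.
for col in ["DOB", "Country", "Assigned_ID"]:
    df[col] = df[col].apply(lambda s: literal_eval(quote_items(s)))

# The columns now hold real lists, so explode works as in the question.
exploded = df.explode("DOB").reset_index(drop=True)
print(exploded)
```

Because every item is quoted, Assigned_ID also ends up as a list of strings, which matches the expected output in the question. The lists have different lengths per column, so each column would be exploded separately rather than together.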