How to convert string representation list with mixed values to a list?

Time:09-27

How can I convert a string representation of a list that contains both string and numeric values into an actual list, given that the string values inside the list are not quoted?

import pandas as pd

df = pd.DataFrame({'col_1': ['[2, A]', '[5, BC]']})

print(df)

     col_1
0   [2, A]
1  [5, BC]

col_1    [2, A]
Name: 0, dtype: object

My aim is to use the list in another function, so I tried to convert the string with eval() or ast.literal_eval(). However, both only work if I first add quotes around the string values, i.e. "A" and "BC".
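For reference, a minimal reproduction of that failure in plain Python (no pandas needed): eval() looks up A as a variable name, while ast.literal_eval() rejects bare names because they are not literals. The helper name failure is only for illustration.

```python
import ast

def failure(parser, text):
    """Hypothetical helper: return the exception class raised by parser(text), or None on success."""
    try:
        parser(text)
    except Exception as exc:  # broad on purpose: we only report the exception type
        return type(exc)
    return None

s = '[2, A]'
print(failure(eval, s))              # <class 'NameError'>: A is looked up as a variable
print(failure(ast.literal_eval, s))  # <class 'ValueError'>: a bare name is not a literal
print(ast.literal_eval('[2, "A"]'))  # [2, 'A'] once the value is quoted
```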

Answer:

You can first use a regex to add quotes around the potential strings (here I matched runs of letters and underscores), then apply literal_eval (for some reason I get an error with pd.eval):

from ast import literal_eval
# Wrap each run of letters/underscores in double quotes, then parse the result as a Python literal
df['col_1'].str.replace(r'([a-zA-Z_]+)', r'"\1"', regex=True).apply(literal_eval)

Output (each cell is now a list; pandas prints the list items without quotes):

0     [2, A]
1    [5, BC]
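One caveat with a plain letter pattern: it also quotes the e inside a float such as 1e5, producing 1"e"5, which literal_eval() cannot parse. A sketch that only quotes identifier-like tokens (a letter or underscore at a word boundary, followed by word characters) avoids that. parse_mixed_list is a hypothetical helper name, and it assumes the string items contain no spaces, commas, brackets, or quotes.

```python
import re
from ast import literal_eval

import pandas as pd

# \b before [A-Za-z_] means the "e" in 1e5 is never matched: it is not at a word boundary
BARE_WORD = re.compile(r'\b([A-Za-z_]\w*)\b')

def parse_mixed_list(text: str) -> list:
    """Hypothetical helper: '[2, A]' -> [2, 'A'].

    Assumes string items contain no spaces, commas, brackets, or quotes.
    """
    return literal_eval(BARE_WORD.sub(r'"\1"', text))

df = pd.DataFrame({'col_1': ['[2, A]', '[5, BC]', '[1e5, A1]']})
df['parsed'] = df['col_1'].apply(parse_mixed_list)
print(df['parsed'].tolist())  # [[2, 'A'], [5, 'BC'], [100000.0, 'A1']]
```

Note that tokens such as True or None would also be quoted and come back as strings, which may or may not be what you want.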

Answer:

The values are already strings, so if the data always follows the same format, you can extract the second element directly:

# Take the text after the comma and strip the closing bracket and surrounding spaces
df['col_2'] = df['col_1'].apply(lambda x: x.split(',')[1].strip(' ]'))
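Along the same lines, if every cell is known to follow the pattern [integer, word], Series.str.extract() can pull both parts out in one pass. This is a sketch under that assumption: cells that do not match become NaN, which would make the astype(int) call fail. The column names num and label are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['[2, A]', '[5, BC]']})

# Group 0: an optionally signed integer; group 1: a word (letters, digits, underscore)
parts = df['col_1'].str.extract(r'\[\s*(-?\d+)\s*,\s*(\w+)\s*\]')

df['num'] = parts[0].astype(int)  # extracted groups are strings, so cast explicitly
df['label'] = parts[1]

print(df['num'].tolist())    # [2, 5]
print(df['label'].tolist())  # ['A', 'BC']
```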

Answer:

If you want the output to be a list:

import pandas as pd

df = pd.DataFrame({'col_1': ['[2, A]', '[5, BC]']})
print(df)

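a = df["col_1"].to_list()
# Drop "[" from the number and the leading space and "]" from the text, then cast the number to int
actual_list = [[int(i.split(",")[0][1:]), i.split(",")[1][1:-1]] for i in a]
print(actual_list)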

Output:

[[2, 'A'], [5, 'BC']]
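The comprehension above assumes exactly two items with the number first. A more general sketch, assuming only that items are comma-separated and contain no commas themselves: each item is tried as an int, then a float, and otherwise kept as a stripped string. to_value and parse_bracketed are hypothetical names; note that float() also accepts strings such as nan or inf, so those would become floats.

```python
import pandas as pd

def to_value(token: str):
    """Hypothetical helper: convert a token to int or float when possible, else return it as text."""
    token = token.strip()
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token

def parse_bracketed(text: str) -> list:
    """Hypothetical helper: '[2, A]' -> [2, 'A'], for any number of flat items."""
    inner = text.strip().strip('[]')  # drop the surrounding brackets
    if not inner.strip():
        return []
    return [to_value(tok) for tok in inner.split(',')]

df = pd.DataFrame({'col_1': ['[2, A]', '[5, BC]']})
actual_list = [parse_bracketed(s) for s in df['col_1']]
print(actual_list)                     # [[2, 'A'], [5, 'BC']]
print(parse_bracketed('[1.5, x, 3]'))  # [1.5, 'x', 3]
```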

Answer:

If you just need to convert the string representation into a list of strings, you can combine str.strip() with str.split():

df['col_1'].str.strip('[]').str.split(r',\s*')

Result:

print(df['col_1'].str.strip('[]').str.split(r',\s*').to_dict())

{0: ['2', 'A'], 1: ['5', 'BC']}

If you also want to convert the numeric strings to numbers, you can apply pd.to_numeric() to each element (note that errors='ignore' is deprecated as of pandas 2.2 and emits a FutureWarning there):

df['col_1'].str.strip('[]').str.split(r',\s*').apply(lambda x: [pd.to_numeric(y, errors='ignore') for y in x])

Result:

print(df['col_1'].str.strip('[]').str.split(r',\s*').apply(lambda x: [pd.to_numeric(y, errors='ignore') for y in x]).to_dict())

{0: [2, 'A'], 1: [5, 'BC']}           # 2, 5 are numbers instead of strings
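Because errors='ignore' is deprecated in pandas 2.2 and later, a forward-compatible sketch catches the exception explicitly instead. to_number is a hypothetical helper name; it relies on pd.to_numeric() raising ValueError for a non-numeric string such as 'A'. The numbers come back as NumPy scalars (e.g. numpy.int64), which compare equal to plain Python ints but may print differently depending on your NumPy version.

```python
import pandas as pd

def to_number(value: str):
    """Hypothetical helper: return pd.to_numeric(value), or the original string if it is not numeric."""
    try:
        return pd.to_numeric(value)
    except ValueError:
        return value

df = pd.DataFrame({'col_1': ['[2, A]', '[5, BC]']})
result = (
    df['col_1']
    .str.strip('[]')
    .str.split(r',\s*')  # a multi-character pattern is treated as a regex
    .apply(lambda items: [to_number(y) for y in items])
)
print(result.to_dict())
```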