How to transform a .csv string list representation to list?

Time:11-26

Suppose a .csv file with a single column, shown below:

  • title is the name of the column
  • [senior innovation manager] is the first row

Note: both strings (header and row) appear in the file exactly as written here.

title    
[senior innovation manager]

The idea is to convert this string representation of a list into an actual Python list:

import ast
import pandas as pd
import numpy as np

# read the file
df = pd.read_csv(file_path, sep=',', na_values='NA', encoding='latin-1')

# convert first row to actual python list
df['title'][0]=ast.literal_eval(df['title'][0])

# inspect if ast.literal_eval() converted to actual list:
print(df['title'][0])
print(type(df['title'][0]))

However, running the above code raises the following error:

Traceback (most recent call last):
  File "file_path", line 76, in <module>        
    df['title'][0]=ast.literal_eval(df['title'][0])
  File "C:\Users\id\Anaconda3\lib\ast.py", line 46, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "C:\Users\id\Anaconda3\lib\ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    [senior innovation manager]

What's the nature of this error?

Is it possible to convert this string representation of a list into an actual Python list?

CodePudding user response:

I don't see any advantage to treating this as a CSV file or using pandas. You could simply read the second line of the file and strip off the surrounding brackets by slicing from the second character up to, but not including, the last one. In Python slice syntax, that's [1:-1]. Note that the result is a list holding the whole title as a single string.

with open(file_path) as fileobj:
    # skip title
    fileobj.readline()
    # get data
    title_list = [fileobj.readline().strip()[1:-1]]
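
To check what this produces without a file on disk, here is a minimal sketch that uses io.StringIO as a stand-in for the opened file, with the sample text taken from the question. The helper name read_title_list is hypothetical and only wraps the same steps.

```python
import io

def read_title_list(fileobj):
    """Skip the header line, then return the bracketed row as a one-element list."""
    fileobj.readline()                # skip the 'title' header line
    row = fileobj.readline().strip()  # '[senior innovation manager]'
    return [row[1:-1]]                # drop the surrounding brackets

# Assumption: the file contains exactly the two lines shown in the question.
sample = io.StringIO("title\n[senior innovation manager]\n")
title_list = read_title_list(sample)
print(title_list)
print(type(title_list))
```

If you want one list element per word instead, calling .split() on the sliced string would do that.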

CodePudding user response:

In order to use literal_eval, your string must be written exactly as it would be written in code. That is, the string values in your list must be quoted and separated by commas, so your string should look something like this: ['senior', 'innovation', 'manager']
If you're set on using this method, you could try replacing the spaces in your string with ', ' and then adding a quote right after the opening bracket and another right before the closing bracket.
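
A rough sketch of that idea, assuming the row looks exactly like the one in the question and that no word contains a quote character. The variable names are illustrative only.

```python
import ast

raw = "[senior innovation manager]"

# Unquoted words are not valid Python literals, so parsing fails with SyntaxError.
try:
    ast.literal_eval(raw)
except SyntaxError as exc:
    print("literal_eval failed:", type(exc).__name__)

# Rebuild the string so that every word is quoted and comma-separated:
# "[senior innovation manager]" becomes "['senior', 'innovation', 'manager']"
inner = raw.strip()[1:-1]
quoted = "['" + "', '".join(inner.split()) + "']"

words = ast.literal_eval(quoted)
print(words)
print(type(words))
```

Since inner.split() already yields the list of words, the round trip through literal_eval is only worth it if you specifically need that function.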
