Home > OS >  Read text files with multiple lists with spacings and commas exist between elements in the lists int
Read text files with multiple lists with spacings and commas exist between elements in the lists int

Time:06-29

I have a text file called tropical.txt that have multiple lists and each list is separated by a new line. Notice the comma is surrounded by spaces.

 space here and space here
         | |
['papaya' , 'mangosteen' , 'banana']
[]
['coconut' , 'mango']
['mangosteen' , 'papaya']

I tried the following code

import pandas as pd

df = pd.read_csv('tropical.txt', sep='\n', header=None, engine = 'python')
df

which gives me

ValueError: Specified \n as separator or delimiter. This forces the python engine which does not accept a line terminator. Hence it is not allowed to use the line terminator as separator.

If I were to just do

import pandas as pd

df = pd.read_csv('tropical.txt', header= None, engine = 'python')
df

The output isn't what I wanted

         0           1             2
0   ['papaya'   'mangosteen'    'banana']
1   []               None        None
2   ['coconut'      'mango']     None
3   ['mangosteen'   'papaya']    None


I am expecting


                        0   
0   [papaya,mangosteen,banana]
1   []  
2   [coconut,mango] 
3   [mangosteen,papaya]


Any suggestion?

CodePudding user response:

You can use read_csv, by specifying a separator which will not occur in the lines (e.g. \0) (so that each line will be read as a whole) and ast.literal_eval as a converter for the values:

import ast

pd.read_csv('tropical.txt', header=None, sep='\0', names=['fruits'], converters={ 'fruits' : ast.literal_eval })

Output:

                         fruits
0  [papaya, mangosteen, banana]
1                            []
2              [coconut, mango]
3          [mangosteen, papaya]
  • Related