Read individual text files with new lines at the end of the files but don't want commas to appear

Time:06-15

Whenever my text files end with empty new lines, the lists I output in a pandas DataFrame have extra commas at the end. How can I make sure that, even when the text files contain trailing empty lines, the lists in the DataFrame don't end with these commas?

from pathlib import Path
import pandas as pd

files = Path('data/').glob('*')

df = list()

for file in files:
    df.append(file.read_text().replace('\n', ','))  # the file is opened and closed

df = pd.DataFrame(df, columns = ['fruits'])
df = df['fruits'].str.split(',').to_frame()
df

TEXT FILE 1

apple
banana
orange
      <- empty new line here
      <- empty new line here 

TEXT FILE 2

kiwi
mango
grapes
berry
coconut

Current Output

    fruits

0   [kiwi, mango, grapes, berry, coconut]
1   [apple, banana, orange, ,]
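The stray entries come from the trailing newlines: each `\n` becomes a comma, and `str.split(',')` returns an empty string for every comma with nothing after it. A minimal reproduction on an in-memory string (the sample text is an assumption modelled on TEXT FILE 1, not read from the actual files):

```python
# Sample content modelled on TEXT FILE 1: three fruits followed by blank lines.
text = "apple\nbanana\norange\n\n\n"

# The question's approach: newlines -> commas, then split on commas.
joined = text.replace('\n', ',')
parts = joined.split(',')

print(joined)  # apple,banana,orange,,,
print(parts)   # ['apple', 'banana', 'orange', '', '', '']
```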

Expected Output

    fruits

0   [kiwi, mango, grapes, berry, coconut]
1   [apple, banana, orange]

Is there an efficient way to get the expected output without opening each text file and removing the trailing empty lines manually? Thank you.
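If you want to keep the structure of the code in the question, one option (a swapped-in fix, not taken from the answer below) is to strip leading and trailing whitespace from each file's text before converting newlines to commas. This sketch writes two throwaway files to a temporary directory so it runs on its own; the file names and contents are hypothetical. Note that `strip()` only trims the ends of the text, so a blank line in the middle of a file would still produce an empty entry.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Create two sample files in a temporary directory (hypothetical names/content).
tmp = Path(tempfile.mkdtemp())
(tmp / 'file1.txt').write_text('apple\nbanana\norange\n\n\n')
(tmp / 'file2.txt').write_text('kiwi\nmango\ngrapes\nberry\ncoconut\n')

rows = []
for file in sorted(tmp.glob('*.txt')):
    # strip() removes the trailing newlines before they can become commas
    rows.append(file.read_text().strip().replace('\n', ','))

df = pd.DataFrame(rows, columns=['fruits'])
df = df['fruits'].str.split(',').to_frame()
print(df)
```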

Answer:

Here are two ways to do that:

(1) Use f.read().splitlines(), which returns a list with one element per line. Each trailing empty line becomes an empty string, which you remove with filter(None, ...).

import pandas as pd
from pathlib import Path

files = Path('data/').glob('*.txt')

all_files = []
for file in files:
    with open(file, 'r') as f:
        # one element per line; filter(None, ...) drops the empty strings
        data = list(filter(None, f.read().splitlines()))
    all_files.append(data)

df = pd.DataFrame({
    'fruits': all_files
})
print(df)

                                  fruits
0                [apple, banana, orange]
1  [kiwi, mango, grapes, berry, coconut]
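To see what `splitlines()` and `filter(None, ...)` each contribute, here is the same step on an in-memory string (the sample text is an assumption). `splitlines()` does not emit an empty element for the final line break, but each additional blank line yields `''`; `filter(None, ...)` then drops every empty string, including blank lines in the middle of a file. A line containing only spaces is not empty, so it survives that filter; testing `line.strip()` instead also removes it.

```python
text = "apple\nbanana\norange\n\n\n"

lines = text.splitlines()
print(lines)  # ['apple', 'banana', 'orange', '', '']

# filter(None, ...) keeps only truthy items, i.e. non-empty strings
fruits = list(filter(None, lines))
print(fruits)  # ['apple', 'banana', 'orange']

# A whitespace-only line is truthy, so filter(None, ...) keeps it
kept = list(filter(None, "kiwi\n   \n".splitlines()))
print(kept)  # ['kiwi', '   ']

# Testing line.strip() drops whitespace-only lines as well
cleaned = [ln for ln in "kiwi\n   \n".splitlines() if ln.strip()]
print(cleaned)  # ['kiwi']
```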

Or (2) use pd.read_csv:

import pandas as pd
from pathlib import Path

files = Path('data/').glob('*.txt')

all_files = []
for file in files:
    # header=None: the files have no header row; blank lines are skipped by default
    data = pd.read_csv(file, header=None)[0].tolist()
    all_files.append(data)

df = pd.DataFrame({
    'fruits': all_files
})
print(df)  # same output as (1)
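Option (2) works because `pd.read_csv` skips blank lines by default (`skip_blank_lines=True`), and `header=None` stops the first fruit from being used as the column name, so the single column is labelled `0`. A self-contained check, using `io.StringIO` in place of a file (the sample text is an assumption):

```python
import io

import pandas as pd

# In-memory stand-in for TEXT FILE 1, including trailing blank lines.
text = "apple\nbanana\norange\n\n\n"

# header=None: no header row; blank lines are dropped by skip_blank_lines=True
fruits = pd.read_csv(io.StringIO(text), header=None)[0].tolist()
print(fruits)  # ['apple', 'banana', 'orange']
```

Because the text is parsed as CSV, a line containing a comma is treated as more than one field, and values such as `NA` are converted to `NaN` by default; option (1) avoids both.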