Whenever my text files end with empty lines, the lists that I output in a pandas DataFrame have empty entries at the end, which show up as trailing commas. How can I make sure that, even when the text files end with empty lines, the lists in the DataFrame don't have these trailing commas?
from pathlib import Path
import pandas as pd
files = Path('data/').glob('*')
df = list()
for file in files:
    df.append(file.read_text().replace('\n', ','))  # the file is opened and closed
df = pd.DataFrame(df, columns = ['fruits'])
df = df['fruits'].str.split(',').to_frame()
df
TEXT FILE 1
apple
banana
orange
<- empty new line here
<- empty new line here
TEXT FILE 2
kiwi
mango
grapes
berry
coconut
Current Output
fruits
0 [kiwi, mango, grapes, berry, coconut]
1 [apple, banana, orange, ,]
Expected Output
fruits
0 [kiwi, mango, grapes, berry, coconut]
1 [apple, banana, orange]
Is there an efficient way to get the expected output without going into the individual text files and removing the trailing empty lines manually? Thank you.
Answer:
Here are two ways to do that:
(1) Use f.read().splitlines(), which makes a list with one element per line. Each empty line at the end of the file becomes an empty string, which you then remove by filtering.
import pandas as pd
from pathlib import Path
files = Path('data/').glob('*.txt')
all_files=[]
for file in files:
    with open(file, 'r') as f:
        data = list(filter(None, f.read().splitlines()))
    all_files.append(data)
df = pd.DataFrame({
'fruits' : all_files
})
print(df)
fruits
0 [apple, banana, orange]
1 [kiwi, mango, grapes, berry, coconut]
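To see why the filtering step in (1) is needed, here is a small sketch that runs the same two calls on an in-memory string instead of a file. The string is an assumption that mimics TEXT FILE 1 with two trailing empty lines.

```python
# Simulated contents of TEXT FILE 1, including two trailing empty lines.
text = "apple\nbanana\norange\n\n\n"

# splitlines() yields one item per line; the empty lines become empty strings.
lines = text.splitlines()
print(lines)   # ['apple', 'banana', 'orange', '', '']

# filter(None, ...) drops falsy items, which here are the empty strings.
fruits = list(filter(None, lines))
print(fruits)  # ['apple', 'banana', 'orange']
```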
or (2) use pd.read_csv, which skips blank lines by default (skip_blank_lines=True):
import pandas as pd
from pathlib import Path
files = Path('data/').glob('*.txt')
all_files=[]
for file in files:
    data = pd.read_csv(file, header=None)[0].tolist()
    all_files.append(data)
df = pd.DataFrame({
'fruits' : all_files
})
#same output as (1)
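One caveat: filter(None, ...) only removes lines that are completely empty, so a line containing just spaces or tabs would survive. Below is a minimal sketch of a variant that also drops whitespace-only lines. The helper name read_nonblank_lines, the file names, and the temporary directory are hypothetical and exist only to make the example runnable without a real data/ folder.

```python
import tempfile
from pathlib import Path


def read_nonblank_lines(path):
    """Return the stripped lines of a text file, skipping empty or whitespace-only ones.

    Hypothetical helper, not part of pandas or pathlib.
    """
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


# Build a throwaway directory that mimics the two files from the question.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    # The last line of this file contains only spaces.
    (data_dir / "file1.txt").write_text("apple\nbanana\norange\n\n  \n")
    (data_dir / "file2.txt").write_text("kiwi\nmango\ngrapes\nberry\ncoconut")

    # sorted() gives a deterministic order; glob() alone does not guarantee one.
    all_files = [read_nonblank_lines(p) for p in sorted(data_dir.glob("*.txt"))]

print(all_files)
```

The resulting all_files list can be passed to pd.DataFrame({'fruits': all_files}) in the same way as in the two options above.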