I have more than 500 CSV files with an identical format: 6 columns (count,height,link,title,titles,width) and multiple rows. From these 500+ CSV files I would like to keep only the "link" column (with all the URLs). At the end, I would like to save the combined result as a .txt file containing all these links.
Original sample CSV file:
count,height,link,title,titles,width
1,142,https://url.jpg,,,338
..
...
....
Desired .txt file:
https://url.jpg
https://url.jpg
https://url.jpg
https://url.jpg
https://url.jpg
Does anyone here have a solution to do that?
Many thanks.
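If pandas is not a requirement, a minimal sketch using only the standard library `csv` module could look like the following. It assumes every file sits directly in one folder, has a `.csv` extension, is UTF-8 encoded, and has a header row containing `link`; the function name `extract_links` and the folder name `csv_folder` are hypothetical.

```python
import csv
import glob
import os


def extract_links(folder, out_path):
    """Write the 'link' value of every row of every *.csv in folder to out_path.

    Assumes UTF-8 files with a header row. out_path should not be a .csv file
    inside folder, or it would be picked up by the glob pattern.
    Returns the number of links written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    link = row.get("link")
                    if link:  # skip empty cells and files without a 'link' column
                        out.write(link + "\n")
                        count += 1
    return count
```

Usage: `extract_links("csv_folder", "links.txt")` writes one URL per line. Streaming row by row keeps memory use flat no matter how many files there are.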
User response:
Here is the error I get, @sudoerAli:
Traceback (most recent call last):
File "/Users/xxx/PycharmProjects/csv/main.py", line 7, in <module>
df = pd.read_csv(filename, usecols=['link'])
File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/common.py", line 789, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'xxx.csv'
Process finished with exit code 1
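The traceback suggests that line 7 of main.py calls `pd.read_csv(filename, ...)` with the bare name returned by `os.listdir()`, so pandas most likely looked for `xxx.csv` in the current working directory instead of in the CSV folder. A small sketch of building full paths first (the helper name `csv_paths` is hypothetical):

```python
import os


def csv_paths(folder):
    """Return the full paths of the .csv files in folder, sorted by name.

    os.listdir() yields bare names such as 'xxx.csv'; those only resolve if the
    script happens to run from inside folder, so join them with the folder path.
    """
    folder = os.path.abspath(folder)
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".csv")  # ignore other files in the folder
    )
```

Each returned path can then be passed to `pd.read_csv(path, usecols=['link'])` regardless of where the script is started from.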
User response:
You can use pandas to read your CSV files as DataFrames, take the link column from each file, and then write the result to a text file. Try this:
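from os import listdir
from os.path import abspath, join
import pandas as pd
path = 'csv_folder'  # hypothetical placeholder: replace with the path of your folder
abs_path = abspath(path)
frames = []
for filename in listdir(abs_path):
    if filename.lower().endswith('.csv'):  # skip non-CSV files (e.g. .DS_Store on macOS)
        # join() passes the full path; a bare file name is looked up in the current working directory
        frames.append(pd.read_csv(join(abs_path, filename), usecols=['link']))
# concatenating once is faster than growing a DataFrame inside the loop
result_df = pd.concat(frames, ignore_index=True)
# one link per line: no header row, no index column
result_df.to_csv('result_df.txt', header=False, index=False, sep=' ')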
Note: the column names (at least the link column) must be present in the header row of every file; otherwise usecols=['link'] makes read_csv raise a ValueError.
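If some files might lack a header or the `link` column, one option is to check the first row before calling `read_csv` and skip those files instead of letting the whole run fail. A minimal sketch, assuming UTF-8 files; the helper name `has_column` is hypothetical, and the match is exact, just as `usecols` is:

```python
import csv


def has_column(path, name="link"):
    """Return True if the first row (header) of the CSV at path contains name."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])  # an empty file yields no header
    return name in header
```

In the loop above, the `read_csv` call would then run only when `has_column(join(abs_path, filename))` is true, and the skipped file names could be collected for a manual look afterwards.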