Delete multiple data from several CSV files in one click

I have more than 500 CSV files with an identical format: 6 columns (count,height,link,title,titles,width) and multiple rows. I would like to keep only the "link" column (with all the URLs) from these 500+ CSV files. In the end, I would like to save the result as a single .txt file containing all of these links.

Original sample CSV file:

count,height,link,title,titles,width
1,142,https://url.jpg,,,338
..
...
....

Desired .txt file:

https://url.jpg
https://url.jpg
https://url.jpg
https://url.jpg
https://url.jpg

Does anyone have a solution for this?

Many thanks.

User response:

Here is the error I get, @sudoerAli:

Traceback (most recent call last):
  File "/Users/xxx/PycharmProjects/csv/main.py", line 7, in <module>
    df = pd.read_csv(filename, usecols=['link'])
  File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1217, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/Users/xxx/PycharmProjects/csv/venv/lib/python3.9/site-packages/pandas/io/common.py", line 789, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'xxx.csv'

Process finished with exit code 1
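The traceback shows pd.read_csv(filename, usecols=['link']) receiving a bare file name ('xxx.csv'). If filename comes from os.listdir(), it is only a name, not a path, so pandas looks for it in the current working directory rather than in the CSV folder. A minimal sketch of one way to avoid this, assuming the CSV files sit directly inside one folder (the helper name csv_paths is hypothetical), is to collect full paths up front with pathlib:

```python
from pathlib import Path


def csv_paths(folder):
    """Return absolute paths of the .csv files directly inside `folder` (hypothetical helper)."""
    base = Path(folder).resolve()
    # glob() yields paths that already include the folder, unlike os.listdir(),
    # which returns bare file names; non-.csv files are not matched at all
    return sorted(p for p in base.glob('*.csv') if p.is_file())
```

Each returned path can then be passed directly to pd.read_csv(p, usecols=['link']), which accepts pathlib.Path objects, so the working directory no longer matters.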

User response:

You can use pandas to read each CSV file as a DataFrame, keep only the link column from each one, and then write the combined result to a text file. Try this:

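from os.path import abspath, join
from os import listdir
import pandas as pd

path = 'csv_folder'  # placeholder: replace with the folder that holds your CSV files
abs_path = abspath(path)

frames = []
for filename in sorted(listdir(abs_path)):
  # skip anything that is not a CSV file (e.g. .DS_Store or the output file)
  if not filename.lower().endswith('.csv'):
    continue
  # listdir() returns bare names, so join them with the folder path before reading
  df = pd.read_csv(join(abs_path, filename), usecols=['link'])
  frames.append(df)

# concatenate once at the end instead of growing a DataFrame inside the loop
result_df = pd.concat(frames, ignore_index=True)
# one link per line, without a header row or index column
result_df.to_csv('result_df.txt', header=False, index=False)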

Note: the column names (at least the link column) must be present in the header row of every file; otherwise usecols=['link'] makes read_csv raise a ValueError.
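If installing pandas is not an option, the same extraction can be sketched with Python's standard csv module. This is a sketch under a few assumptions not stated in the question: the files are UTF-8, each has a header row, and empty link cells should be skipped rather than written as blank lines; the function name extract_links is hypothetical. It streams rows straight into the output file instead of holding all 500+ files in memory, and it reports (rather than crashes on) any file whose header lacks a link column:

```python
import csv
from pathlib import Path


def extract_links(folder, out_file, column='link'):
    """Write every non-empty value of `column` from the .csv files in `folder`
    to `out_file`, one per line. Returns the number of links written."""
    count = 0
    with open(out_file, 'w', encoding='utf-8') as out:
        for path in sorted(Path(folder).glob('*.csv')):
            with open(path, newline='', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                # skip (and report) files whose header row has no `column`
                if reader.fieldnames is None or column not in reader.fieldnames:
                    print(f'skipping {path.name}: no {column!r} column')
                    continue
                for row in reader:
                    link = row[column]
                    # short rows yield None; empty cells yield ''; skip both
                    if link:
                        out.write(link + '\n')
                        count += 1
    return count
```

For example, extract_links('csv_folder', 'links.txt'), with csv_folder standing in for your own folder, writes one URL per line to links.txt and returns how many were written.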