I've created a program so that I can split the column into 2 columns by specifying the location of the file but I would like to do this for an entire folder so that the process can be a bit faster.
Here is the code for my program:
import pandas as pd
import os
// reading csv file from url
filepath = "C:/Users/username/folder1/folder2/folder3/filename.csv"
file_encoding = 'cp1252'
data = pd.read_csv(filepath , header=None , encoding=file_encoding)
data.columns =['Last Name', 'First Name']
// new data frame with split value columns
new = data["Name"].str.split(",", n = 1, expand = True)
// making separate first name column from new data frame
data["Last Name"]= new[0]
// making separate last name column from new data frame
data["First Name"]= new[1]
// new data frame with split value columns
new = data["SName"].str.split(",", n = 1, expand = True)
// making separate first name column from new data frame
data["S Last Name"]= new[0]
// making separate last name column from new data frame
data["S First Name"]= new[1]
// Saving File name as its path
filename = os.path.basename(filepath) ".xlsx"
data.to_excel(filename, index=False)
data
I created an automated text to column program so that I don't have to manually do for hundreds of data entries. But I have done it for a specified file. I would like to take this code a bit further by allowing it to take a folder of .csv and do the conversions.
Thanks for helping
CodePudding user response:
You can place your code in a function:
import pandas as pd
import os
def splitter(folder_path):
for file in os.listdir(folder_path):
if file.endswith(".csv"):
# reading csv file from url
filepath = os.path.join(folder_path, file)
file_encoding = 'cp1252'
data = pd.read_csv(filepath , header=None , encoding=file_encoding)
data.columns =['Last Name', 'First Name']
# new data frame with split value columns
new = data["Name"].str.split(",", n = 1, expand = True)
# making separate first name column from new data frame
data["Last Name"]= new[0]
# making separate last name column from new data frame
data["First Name"]= new[1]
# new data frame with split value columns
new = data["SName"].str.split(",", n = 1, expand = True)
# making separate first name column from new data frame
data["S Last Name"]= new[0]
# making separate last name column from new data frame
data["S First Name"]= new[1]
# Saving File name as its path
filename = os.path.basename(filepath) ".xlsx"
data.to_excel(filename, index=False)
print(file, " is done.")
# Example
fpath = "C:/Users/username/folder1/folder2/folder3"
splitter(fpath)
CodePudding user response:
Use glob library.
from glob import glob
files = glob('C:/Users/username/folder1/folder2/folder3/*')
for file in files:
# ... your function/code