I'm trying to append an extra column heading to a CSV file, and to also add a column value for each row in the CSV file.
I am able to partially get this right, but my code adds an additional header row as well. The TIMESTAMP column does not exist in the CSV file - this is the new header and column values I am trying to insert.
import os
import pandas as pd
from datetime import datetime
# Get the list of all files and directories
path = "//wherever"
dir_list = os.listdir(path)
fields = ['Field1', 'Field2', 'Field3','TIMESTAMP']
for i in dir_list:
file = path "//" i
time_raw = i.split("_")
time_compressed = time_raw[1]
d = datetime.strptime(time_compressed, "%Y%m%dT%H%MZ")
df = pd.read_csv(file, encoding="cp1252", names=fields, low_memory=True)
df['TIMESTAMP'] = d
df.to_csv(file)
CodePudding user response:
The reason why you're getting this additional header is because you're using names
inside your pd.read_csv
call. If you want to read only the columns from the fields
list, change the parameter names
to usecols
.
For example:
import os
import pandas as pd
from datetime import datetime
# Get the list of all files and directories
path = "//wherever"
dir_list = os.listdir(path)
fields = ['Field1', 'Field2', 'Field3']
for filename in dir_list:
filepath = f"{path}{os.path.sep}{filename}"
time_compressed = filename. split("_")[1]
d = datetime.strptime(time_compressed, "%Y%m%dT%H%MZ")
df = pd.read_csv(filepath, encoding="cp1252", usecols=fields, low_memory=True)
df["TIMESTAMP"] = d
df.to_csv(filepath, index=False)