I have multiple files naming like starting with Quarter'Year prefix like "Q121", "Q222" and all of these files contains data like below, Identifier stays the same but price and power columns are chainging. What would be the best way to aggregate files and be able to plot column like price across data range e.g from Q1'20 to Q2'22.
I was thinking about merging different files based on Identifier and then change column names to reflect Q120 etc...
data = [['A1', 10,50], ['A2', 15,60], ['A3', 14,55]]
df = pd.DataFrame(data, columns=['Identifier', 'Price','Power'])
CodePudding user response:
from glob import glob
import os
import pandas as pd
files = glob.glob("target_folder/*.csv")
dfs = []
for file_path in files:
file_name = os.path.basename(file_path)
temp = pd.read_csv(file_path, names=["Price","Power"], header=None)
temp["Identifier"] = file_name.replace(".csv", "")
dfs.append(temp)
df = pd.concat(dfs)
# change Q122 -> 22Q1, so the plot can order full list nicely
df["Identifier"] = df["Identifier"].apply(lambda x: "".join([x[2],x[3],x[0],x[1]]))