Dynamic creation of pandas DataFrames

The end goal is to read multiple .csv files into multiple DataFrames with specific names. I want to refer to each DataFrame by the name of its city for further analysis and manipulate them separately, so it is important to achieve that and not keep them in a dictionary. But what ends up happening is that the last item in the dict gets assigned to every variable, so I get differently named DataFrames that all hold the same data.

import pandas as pd

lst0 = ['/User1/Research/comp_dataset/yutas_tg/Annapolis__MD_ALL.csv',
        '/User1/Research/comp_dataset/yutas_tg/Apalachicola__FL_ALL.csv',
        '/User1/Research/comp_dataset/yutas_tg/Atlantic_City__NJ_ALL.csv']
names_3 = ['annapolis','apalachicola','atlantic_city']

d = {}
for fname in lst0:
    d[fname] = pd.read_csv(fname)

for nm in names_3:
    for fname in lst0:
        globals()[nm] = d[fname]

What am I doing wrong? Thank you!
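
For reference, the behavior described above can be reproduced without any CSV files. The sketch below is only an illustration: the file names and string values are hypothetical stand-ins for the real paths and DataFrames. It contrasts the nested loops from the question with pairing each name to its own file.

```python
# Stand-ins for the loaded DataFrames (hypothetical values, no files needed).
files = ["a.csv", "b.csv", "c.csv"]
names = ["annapolis", "apalachicola", "atlantic_city"]
loaded = {f: f"data from {f}" for f in files}

# Nested loops as in the question: for each name, the inner loop rebinds it
# once per file, so only the final assignment (the last file) survives.
nested = {}
for nm in names:
    for f in files:
        nested[nm] = loaded[f]

# Pairing each name with its own file via zip gives one assignment per name.
paired = {nm: loaded[f] for nm, f in zip(names, files)}

print(nested)
print(paired)
```

With the nested loops every name maps to the last file's data, while the zip version keeps one distinct value per name.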

CodePudding user response：

Your variable naming makes no sense to me. Please name them something relevant to the values they hold.

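As to your problem: the inner loop runs over every file for each name, so each name is rebound until it holds the last DataFrame. Pair each path with its city instead: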

import pandas as pd

paths = [
    "/User1/Research/comp_dataset/yutas_tg/Annapolis__MD_ALL.csv",
    "/User1/Research/comp_dataset/yutas_tg/Apalachicola__FL_ALL.csv",
    "/User1/Research/comp_dataset/yutas_tg/Atlantic_City__NJ_ALL.csv",
]
cities = ["annapolis", "apalachicola", "atlantic_city"]

# Create one dataframe per CSV file
d = {
    city: pd.read_csv(path) for path, city in zip(paths, cities)
}

# Join the frames together, adding the new `city` column
df = (
    pd.concat(d.values(), keys=d.keys(), names=["city", None])
    .reset_index(level=0)
    .reset_index(drop=True)
)
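
A combined frame like this can still be addressed city by city. The following is a minimal sketch, assuming made-up in-memory CSV text in place of the real files; the `date` and `level` columns are hypothetical and not taken from the actual data.

```python
import io

import pandas as pd

# Hypothetical in-memory CSVs standing in for the real files.
raw = {
    "annapolis": "date,level\n2020-01-01,1.0\n2020-01-02,1.5\n",
    "apalachicola": "date,level\n2020-01-01,2.0\n",
}
d = {city: pd.read_csv(io.StringIO(text)) for city, text in raw.items()}

# Same concatenation as above: the dict keys become a `city` column.
df = (
    pd.concat(d.values(), keys=d.keys(), names=["city", None])
    .reset_index(level=0)
    .reset_index(drop=True)
)

# Select one city's rows from the combined frame.
annapolis = df[df["city"] == "annapolis"]

# Or compute something per city in a single pass.
mean_level = df.groupby("city")["level"].mean()
```

Keeping everything in one frame means per-city analysis becomes a filter or a `groupby` rather than a separate variable per file.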

CodePudding user response：

OK, I figured it out. I combined what Code Different suggested but skipped the concatenation part, and I finally get variables (which are DataFrames) named after the cities.

import pandas as pd

paths = [
    "/User1/Research/comp_dataset/yutas_tg/Annapolis__MD_ALL.csv",
    "/User1/Research/comp_dataset/yutas_tg/Apalachicola__FL_ALL.csv",
    "/User1/Research/comp_dataset/yutas_tg/Atlantic_City__NJ_ALL.csv",
]
cities = ["annapolis", "apalachicola", "atlantic_city"]

# Create one dataframe per CSV file
d = {
    city: pd.read_csv(path) for path, city in zip(paths, cities)
}

# Bind each DataFrame to a module-level variable named after its city
for k in d:
    exec(f"{k} = d['{k}']")
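
The same result may be reachable without building code strings. This is a sketch of two possible alternatives, not a tested replacement for the code above; the small DataFrames and the `*_df` names are hypothetical stand-ins for the frames returned by `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-ins for the frames that pd.read_csv would return.
d = {
    "annapolis": pd.DataFrame({"level": [1.0, 1.5]}),
    "apalachicola": pd.DataFrame({"level": [2.0]}),
    "atlantic_city": pd.DataFrame({"level": [3.0]}),
}

# Option 1: bind every key as a module-level name in one call.
globals().update(d)

# Option 2: unpack explicitly, so the names are visible to readers and linters.
annapolis_df, apalachicola_df, atlantic_city_df = (d[city] for city in d)
```

Note that `exec` and `globals()` both create module-level names only; inside a function, neither produces local variables, so the explicit unpacking is the safer choice there.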