Rename dataframe columns prior to looping through a list of URLs

Time:04-19

I wrote code that loops through a list of linked XML files (urls_list), flattens each file and appends the rows. I would like to rename the columns, so I defined a list of column names in cols. The rows seem to be appended correctly to df, but I can't figure out how to rename the columns.

Here is the code so far:

import pandas as pd
import pandas_read_xml as pdx

urls_list = ['https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/027/058/058com.xml',
             'https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/084/007/007com.xml',
             'https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/032/062/062com.xml']

cols = ['type','annee','code_region','code_region_3','libelle_region','code_departement','code_min_departement','code_departement_3','libelle_departement','code_commune','libelle_commune','numero_tour',
        'nombre_inscrits','nombre_abstention','rapport_inscrits_abstention','nombre_votants','rapport_inscrits_votants','nombre_votes_blancs','rapport_inscrits_vote_blanc','rapport_votant_vote_blanc',
        'nombre_votes_nuls','rapport_inscrits_votes_nuls','rapport_votant_votes_nuls','nombre_exprimes','rapport_inscrits_exprimes','rapport_votant_exprimes','numero_panneau_candidat','nom','prenom','civilite',
        'nombre_de_voix','rapport_exprimes','rapport_inscrits']
df = []

for i in urls_list:
  data = pdx.read_xml(i)
  df.append(pdx.fully_flatten(data))

df_all = pd.DataFrame(df, columns=cols)

Answer:

pandas has a method for this: DataFrame.rename.

Code sample from the documentation:

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6
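A few details of rename that matter when applying it to a real DataFrame are easy to miss. The sketch below reuses the small frame from the documentation sample. It shows that rename returns a new DataFrame, so the result has to be assigned back, that columns absent from the mapping keep their names, and that passing errors="raise" turns a key that is not an existing column into a KeyError instead of silently ignoring it.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# rename() returns a new DataFrame; df itself keeps its original labels
renamed = df.rename(columns={"A": "a", "B": "c"})
print(list(df.columns))       # ['A', 'B']
print(list(renamed.columns))  # ['a', 'c']

# Columns that are not in the mapping are left untouched
partial = df.rename(columns={"A": "a"})
print(list(partial.columns))  # ['a', 'B']

# With errors="raise", a key that is not an existing column raises KeyError
try:
    df.rename(columns={"Z": "z"}, errors="raise")
except KeyError as exc:
    print("KeyError:", exc)
```

The default is errors="ignore", which is why a misspelled old name in the mapping usually produces no error at all.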

Answer:

To change the column names after the rows have been appended, create a dictionary that maps each existing column name to the desired new name and pass it to rename.

Applying the answer above to the original code gives:

import pandas as pd
import pandas_read_xml as pdx

urls_list = ['https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/027/058/058com.xml',
             'https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/084/007/007com.xml',
             'https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/032/062/062com.xml']

dfs = []

# Download and flatten each XML file, keeping one DataFrame per URL
for i in urls_list:
  data = pdx.read_xml(i)
  dataframe = pdx.fully_flatten(data)
  dfs.append(dataframe)

# Stack the per-file DataFrames into a single DataFrame
df = pd.concat(dfs, ignore_index=True)

# Map each existing column name to its new name (placeholder names shown here)
df = df.rename(columns={'A':'a','B':'b','C':'c'})
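Since the question already has the new names as an ordered list (cols), the mapping does not have to be typed by hand. The sketch below builds it with dict(zip(df.columns, cols)), and also shows the equivalent positional assignment to df.columns. To keep it runnable offline, it assumes two small hypothetical DataFrames in place of the frames returned by pdx.fully_flatten, and a shortened, hypothetical cols list; with the real files, cols must have exactly as many entries as the concatenated DataFrame has columns, in the same order. Also note that pd.concat aligns frames by column name, so if the files flatten to different columns the result contains their union, filled with NaN where a column is missing.

```python
import pandas as pd

# Hypothetical stand-ins for the frames produced by pdx.fully_flatten(...);
# the real flattened XML files have many more columns.
dfs = [
    pd.DataFrame({"x1": ["Paris"], "x2": [100]}),
    pd.DataFrame({"x1": ["Lyon"], "x2": [80]}),
]
# Shortened, hypothetical version of the question's cols list
cols = ["libelle_commune", "nombre_inscrits"]

df = pd.concat(dfs, ignore_index=True)

# The new names must line up one-to-one with the existing columns
if len(cols) != len(df.columns):
    raise ValueError(f"expected {len(df.columns)} names, got {len(cols)}")

# Option 1: build the old -> new mapping from the two ordered lists
df_renamed = df.rename(columns=dict(zip(df.columns, cols)))

# Option 2: assign the list of names positionally on a copy
df_assigned = df.copy()
df_assigned.columns = cols

print(df_renamed)
```

The zip-based mapping assumes the existing column names are unique; if two flattened columns shared a name, the dictionary would collapse them, whereas assigning to df.columns works purely by position.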