Home > other >  Looking in rows of a excel table with Python
Looking in rows of a excel table with Python

Time:09-26

i have in Python a programm that loops in an excel sheet to get all the values from a column1 (using pandas). It loops so i can put the value as an xml file (with lxml) :

raw_data = pd.read_excel('myfile.xls', sheet_name='name')
for row in raw_data.iterrows():
    column_values = et.SubElement(root, 'values')

    value = et.SubElement(column_values, 'value')
    value.text = str(row[1]['Column1']
 

The problem i am having is that it loops until the last row with values. So if i have Column1 with 6 rows of data but on the same sheet i have an other column with 10 rows, it will loop 4 rows more. How can i prevent that and only make sure it loops for all the values in my column ?

thank you

EDIT my raw_data sheet looks like this :

name   id   sport    id
jane   01   football 04
mark   02   rugby    05
alonzo 03   cricket  06
            running  07
            highjump 08
            baseball 09

what i have after running my code is :

        <value id="01">
            <name>jane</name>
        </value>
        <value id="02">
            <name>mark</name>
        </value>
        <value id="03">
            <name>alonzo</name>
        </value>
        <value id="nan">
            <name>nan</name>
        </value>
        <value id="nan">
            <name>nan</name>
        </value>
        <value id="nan">
            <name>nan</name>
        </value>

and what i want is to get rid of the column without any values :

        <value id="01">
            <name>jane</name>
        </value>
        <value id="02">
            <name>mark</name>
        </value>
        <value id="03">
            <name>alonzo</name>
        </value>

fyi : my 'name' and 'id' column are in a table called 'nametable' thank you

CodePudding user response:

Few ways to go about it.

  1. just have a little logic in your code that skips nulls/nans.
  2. Just drop those nulls rows right off the bat

Code:

from email.contentmanager import raw_data_manager
import pandas as pd
from lxml import etree as et

#global raw_data

raw_data = pd.read_excel('tryout2.xlsm', sheet_name="Feuil1")

values = et.Element('values')
for row in raw_data[['Name']].dropna().iterrows():
    column_values = et.SubElement(values, 'value')

    value = et.SubElement(column_values, 'value')
    value.text = str(row[1]['Name'])

tree = et.ElementTree(values)
et.indent(tree, space="\t", level=0)
tree.write('new.xml', encoding="utf-8")

Output:

<values>
    <value>
        <value>Jane</value>
    </value>
    <value>
        <value>Mark</value>
    </value>
    <value>
        <value>Alonzo</value>
    </value>
</values>
  • Related