i have in Python a programm that loops in an excel sheet to get all the values from a column1 (using pandas). It loops so i can put the value as an xml file (with lxml) :
raw_data = pd.read_excel('myfile.xls', sheet_name='name')
for row in raw_data.iterrows():
column_values = et.SubElement(root, 'values')
value = et.SubElement(column_values, 'value')
value.text = str(row[1]['Column1']
The problem i am having is that it loops until the last row with values. So if i have Column1 with 6 rows of data but on the same sheet i have an other column with 10 rows, it will loop 4 rows more. How can i prevent that and only make sure it loops for all the values in my column ?
thank you
EDIT my raw_data sheet looks like this :
name id sport id
jane 01 football 04
mark 02 rugby 05
alonzo 03 cricket 06
running 07
highjump 08
baseball 09
what i have after running my code is :
<value id="01">
<name>jane</name>
</value>
<value id="02">
<name>mark</name>
</value>
<value id="03">
<name>alonzo</name>
</value>
<value id="nan">
<name>nan</name>
</value>
<value id="nan">
<name>nan</name>
</value>
<value id="nan">
<name>nan</name>
</value>
and what i want is to get rid of the column without any values :
<value id="01">
<name>jane</name>
</value>
<value id="02">
<name>mark</name>
</value>
<value id="03">
<name>alonzo</name>
</value>
fyi : my 'name' and 'id' column are in a table called 'nametable' thank you
CodePudding user response:
Few ways to go about it.
- just have a little logic in your code that skips nulls/nans.
- Just drop those nulls rows right off the bat
Code:
from email.contentmanager import raw_data_manager
import pandas as pd
from lxml import etree as et
#global raw_data
raw_data = pd.read_excel('tryout2.xlsm', sheet_name="Feuil1")
values = et.Element('values')
for row in raw_data[['Name']].dropna().iterrows():
column_values = et.SubElement(values, 'value')
value = et.SubElement(column_values, 'value')
value.text = str(row[1]['Name'])
tree = et.ElementTree(values)
et.indent(tree, space="\t", level=0)
tree.write('new.xml', encoding="utf-8")
Output:
<values>
<value>
<value>Jane</value>
</value>
<value>
<value>Mark</value>
</value>
<value>
<value>Alonzo</value>
</value>
</values>