I have a pandas dataframe that looks like this :
Date Min Max C
01.01.2003 01.01.2003 Nan NaT
02.01.2003 Nan Nan NaT
03.01.2003 Nan Nan NaT
04.01.2003 Nan 04.01.2003 NaT
06.01.2003 06.01.2003 Nan NaT
07.01.2003 Nan Nan NaT
08.01.2003 Nan 08.01.1993 NaT
09.01.2003 Nan Nan 09.01.2003
14.01.2003 14.01.2003 Nan NaT
15.01.2003 Nan Nan NaT
16.01.2003 Nan 16.01.2003 NaT
29.01.2003 Nan Nan 29.01.2003
And I want to get a XML data that looks like this
-<Or>
-<Date Source="test" test2="test3">
<Min>01.01.2003</Min>
<Max>01.04.2003</Max>
</Date>
-<Date Source="test" test2="test3">
<Min>06.01.2003</Min>
<Max>08.01.2003</Max>
</Date>
-<Date Source="test" test2="test3">
<Min>14.01.2003</Min>
<Max>16.01.2003</Max>
</Date>
-<Date Source="="test" test2="test3">
09.01.2003 29.01.2023
<Dates></Dates>
</Date>
This is the code :
data = gfg.Element("Or")
for idx, row in data.iterrows():
element1 = gfg.SubElement(data, "Test")
element2 = gfg.SubElement(data, "Test2")
s_elem1 = gfg.SubElement(element1, 'Min')
s_elem2 = gfg.SubElement(element1, 'Max')
s_elem1.text=row['Min']
s_elem2.text=row['Max']
b_xml=gfg.tostring(data)
Because it loops over row which then there will be max or min would be empty/blank space. what should I modified so I can get like the example above ? thank
CodePudding user response:
As commented, consider aggregating data before exporting to XML:
Data
from io import StringIO
import pandas as pd
txt = '''\
Date Min Max C
01.01.2003 01.01.2003 NaT NaT
02.01.2003 NaT NaT NaT
03.01.2003 NaT NaT NaT
04.01.2003 NaT 04.01.2003 NaT
06.01.2003 06.01.2003 NaT NaT
07.01.2003 NaT NaT NaT
08.01.2003 NaT 08.01.1993 NaT
09.01.2003 NaT NaT 09.01.2003
14.01.2003 14.01.2003 NaT NaT
15.01.2003 NaT NaT NaT
16.01.2003 NaT 16.01.2003 NaT
29.01.2003 NaT NaT 29.01.2003
'''
with StringIO(txt) as f:
dates_df = pd.read_csv(f, sep="\s ", parse_dates=["Min", "Max", "C"], dayfirst=True)
Cleaning / Grouping / Aggregation
agg_dates_df = (
dates_df.dropna(how="all", axis="rows", subset=["Min", "Max"])
.drop(["Date", "C"], axis="columns")
.assign(
Group = lambda df: pd.notnull(df["Min"]).astype('int').cumsum()
)
.groupby(["Group"])[["Min", "Max"]].max()
.apply(lambda col: col.dt.strftime("%d.%m.%Y"), axis="columns")
.assign(Source = "test", test2 = "test3")
)
print(agg_dates_df)
# Min Max Source test2
# Group
# 1 01.01.2003 04.01.2003 test test3
# 2 06.01.2003 08.01.1993 test test3
# 3 14.01.2003 16.01.2003 test test3
XML
output = dates_df.to_xml(
index = False,
root_name = "Or",
row_name = "Date",
attr_cols = ["Source", "test2"],
elem_cols = ["Min", "Max"]
)
print(output)
# <?xml version='1.0' encoding='utf-8'?>
# <Or>
# <Date Source="test" test2="test3">
# <Min>01.01.2003</Min>
# <Max>04.01.2003</Max>
# </Date>
# <Date Source="test" test2="test3">
# <Min>06.01.2003</Min>
# <Max>08.01.1993</Max>
# </Date>
# <Date Source="test" test2="test3">
# <Min>14.01.2003</Min>
# <Max>16.01.2003</Max>
# </Date>
# </Or>