I have this code. I need to group by CustomerName and then sum the filegroups.
def consolidated_df():
df = breakdown_df()
df.pivot_table(index='CustomerName', columns='FileGroup', aggfunc="sum")
return df
breakdown_df()
looks like
ID CustomerName FileGroup Size Size(Bytes)
1 CustomerA Database 99.8 M 104667648
1 CustomerA Database 99.8 M 104667648
1 CustomerA Backup 99.8 M 104667648
1 CustomerA Backup 99.8 M 104667648
1 CustomerA Site 99.8 M 104667648
1 CustomerA Site 99.8 M 104667648
2 CustomerB Database 99.8 M 104667648
2 CustomerB Database 99.8 M 104667648
2 CustomerB Backup 99.8 M 104667648
2 CustomerB Backup 99.8 M 104667648
2 CustomerB Site 99.8 M 104667648
2 CustomerB Site 99.8 M 104667648
I am trying to roll it up into
ID CustomerName DatabaseSize DatabaseSizeBytes BackupSize BackupSizeBytes SiteSize SiteSizeByte TotalSize
1 CustomerA [Total Size] [Total Size Bytes] [TotalSize] [Total Size Bites] [Total Site Size] [Total Site Bites] [Total Bytes for everything]
2 CustomerB [Total Size] [Total Size Bytes] [TotalSize] [Total Size Bites] [Total Site Size] [Total Site Bites] [Total Bytes for everything]
I'm not so worried about actually summing Size
because I can convert the bites. I just can't seem to get my pivot to work and unsure where I am going wrong.
CodePudding user response:
If you don't explicitly set values
, it'll try to use all remaining columns...
out = df.pivot_table(index='CustomerName', columns='FileGroup', values='Size(Bytes)', aggfunc='sum')
print(out)
Output:
FileGroup Backup Database Site
CustomerName
CustomerA 209335296 209335296 209335296
CustomerB 209335296 209335296 209335296
You can also have margins if desired:
df.pivot_table(index='CustomerName',
columns='FileGroup',
values='Size(Bytes)',
aggfunc='sum',
margins=True,
margins_name='TotalSize').drop('TotalSize')
# Output:
FileGroup Backup Database Site TotalSize
CustomerName
CustomerA 209335296 209335296 209335296 628005888
CustomerB 209335296 209335296 209335296 628005888
CodePudding user response:
With help of another StackOverflow answer:
# https://stackoverflow.com/a/23773174/10035985
def sizeof_fmt(num, use_kibibyte=True):
base, suffix = [(1000.0, "B"), (1024.0, "iB")][use_kibibyte]
for x in ["B", *map(lambda x: x suffix, list("kMGTP"))]:
if -base < num < base:
return "%3.1f %s" % (num, x)
num /= base
return "%3.1f %s" % (num, x)
x = df.pivot_table(
index=["ID", "CustomerName"],
columns=["FileGroup"],
values="Size(Bytes)",
aggfunc="sum",
)
columns_to_convert = list(x.columns)
x = pd.concat([x, x.add_suffix("SizeBytes")], axis=1)
x["TotalSize"] = x[columns_to_convert].sum(axis=1).apply(sizeof_fmt)
for c in columns_to_convert:
x[c] = x[c].apply(sizeof_fmt)
x.columns.name, x.index.name = None, None
print(x.reset_index())
Prints:
ID CustomerName Backup Database Site BackupSizeBytes DatabaseSizeBytes SiteSizeBytes TotalSize
0 1 CustomerA 199.6 MiB 199.6 MiB 199.6 MiB 209335296 209335296 209335296 598.9 MiB
1 2 CustomerB 199.6 MiB 199.6 MiB 199.6 MiB 209335296 209335296 209335296 598.9 MiB