Home > Back-end >  Compare values from a subcategory and aggregate the max of each subcategory by category
Compare values from a subcategory and aggregate the max of each subcategory by category

Time:08-10

I have a pandas dataframe with three columns. The first one indicates the location, the second shows the status of the hosts inside the location and the third indicates how many hosts are in location per status. I need to determine if a location is "Enabled" or "Disabled" considering the count of hosts by status. For example, if a location has only hosts of status "Enabled", this location is "Enabled" (the same for "Disabled"). If a location has more hosts "Enabled" than "Disabled", this location is "Enabled" (the same for "Disabled"). If the number of hosts "Enabled" is equal to "Disabled", the locations is "Disabled".

So, i need to transform this:

Location Status Host
A Enabled 3
B Enabled 6
C Disabled 2
C Enabled 2
D Enabled 1
E Disabled 1
E Enabled 2
F Disabled 1
F Enabled 11
G Enabled 40
H Enabled 6
I Disabled 4
J Disabled 4
J Enabled 1
L Disabled 5

In that:

Location Status
A Enabled
B Enabled
C Disabled
D Enabled
E Enabled
F Enabled
G Enabled
H Enabled
I Disabled
J Disabled
L Disabled

Any help would be appreciate.

CodePudding user response:

here is one way to do it

# get count of hosts by location and status, then unstack and fill NaN with 0
df2=df.groupby(['Location', 'Status'])['Host'].sum().unstack(1,0).reset_index() 

# use np.where to identify the location status
df2['Location_Status'] = np.where((df2['Enabled'] > df2['Disabled']), 'Enabled', 'Disabled')
df2[['Location','Location_Status']]


Status  Location    Location_Status
0             A     Enabled
1             B     Enabled
2             C     Disabled
3             D     Enabled
4             E     Enabled
5             F     Enabled
6             G     Enabled
7             H     Enabled
8             I     Disabled
9             J     Disabled
10            L     Disabled
  • Related