I have a pandas dataframe with three columns. The first one indicates the location, the second shows the status of the hosts inside the location and the third indicates how many hosts are in location per status. I need to determine if a location is "Enabled" or "Disabled" considering the count of hosts by status. For example, if a location has only hosts of status "Enabled", this location is "Enabled" (the same for "Disabled"). If a location has more hosts "Enabled" than "Disabled", this location is "Enabled" (the same for "Disabled"). If the number of hosts "Enabled" is equal to "Disabled", the locations is "Disabled".
So, i need to transform this:
Location | Status | Host |
---|---|---|
A | Enabled | 3 |
B | Enabled | 6 |
C | Disabled | 2 |
C | Enabled | 2 |
D | Enabled | 1 |
E | Disabled | 1 |
E | Enabled | 2 |
F | Disabled | 1 |
F | Enabled | 11 |
G | Enabled | 40 |
H | Enabled | 6 |
I | Disabled | 4 |
J | Disabled | 4 |
J | Enabled | 1 |
L | Disabled | 5 |
In that:
Location | Status |
---|---|
A | Enabled |
B | Enabled |
C | Disabled |
D | Enabled |
E | Enabled |
F | Enabled |
G | Enabled |
H | Enabled |
I | Disabled |
J | Disabled |
L | Disabled |
Any help would be appreciate.
CodePudding user response:
here is one way to do it
# get count of hosts by location and status, then unstack and fill NaN with 0
df2=df.groupby(['Location', 'Status'])['Host'].sum().unstack(1,0).reset_index()
# use np.where to identify the location status
df2['Location_Status'] = np.where((df2['Enabled'] > df2['Disabled']), 'Enabled', 'Disabled')
df2[['Location','Location_Status']]
Status Location Location_Status
0 A Enabled
1 B Enabled
2 C Disabled
3 D Enabled
4 E Enabled
5 F Enabled
6 G Enabled
7 H Enabled
8 I Disabled
9 J Disabled
10 L Disabled