I am trying to loop through all items in a column.
If it meets the condition in the column then check to see, If another condition is met in a different column.
The data types in the two columns are different, one is an integer & the other a string that I am checking for.
I am not sure where I have gone wrong & all information I've found online so far is about creating new columns based on data from a column.
Here is a simplified version of the code I am using.
import pandas as pd
servercount = 0
othercount = 0
df = pd.DataFrame({
'operatingsystem': ['Windows 7', 'Windows Server 2012', 'Windows Server 2012 R',
'macOSX', 'Linux Server'],
'login': [9, 12, 5, 11, 19],
})
for i in df['login']:
if i > 10:
if i in df['operatingsystem'].str.lower().str.contains("server"):
servercount = 1
else:
othercount = 1
print("Servers:", servercount, "\nOthers:", othercount)
Currently this returns 0 on the servercount & 3 on the othercount meaning that it is not detecting the word server in the operatingsystem column.
CodePudding user response:
If you only want the final number (as in total servercount and othercount), then consider simply passing the logics to your dataframe. You are iterating over i
only with is df['Login']
, nowhere are you using operatingsystem
values.
servercount = ((df['login'] > 10) & \
(df['operatingsystem'].str.contains("server",
case=False,
regex=True))).sum()
Then, by difference you can calculate othercount
:
othercount = len(df) - servercount
CodePudding user response:
Create mask wich chain both masks toghether by &
bot bitwise AND
and for count use Series.value_counts
, last select True
and False
values:
m = (df['login'] > 10) & df['operatingsystem'].str.contains("server", case=False)
print (m.value_counts())
False 3
True 2
dtype: int64
out = m.value_counts()
print("Servers:", out[True], "\nOthers:", out[False])
Servers: 2
Others: 3
EDIT: Because need 3 counts are created 3 mask separately, ~
is for invert mask (same like mask == False
):
m1 = df['login'] > 10
m2 = df['operatingsystem'].str.contains("server", case=False)
print("Servers greater 10:", (m1 & m2).sum(),
"\nServers less 10:", (m1 & ~m2).sum(),
"\nOthers:", (~m1).sum())
Servers greater 10: 2
Servers less 10: 1
Others: 2