Home > Software engineering >  Execute based on values of different data types in different columns of a data frame | Pandas
Execute based on values of different data types in different columns of a data frame | Pandas

Time:11-14

I am trying to loop through all items in a column.

If it meets the condition in the column then check to see, If another condition is met in a different column.

The data types in the two columns are different, one is an integer & the other a string that I am checking for.

I am not sure where I have gone wrong & all information I've found online so far is about creating new columns based on data from a column.

Here is a simplified version of the code I am using.

import pandas as pd

servercount = 0
othercount = 0

df = pd.DataFrame({
    'operatingsystem': ['Windows 7', 'Windows Server 2012', 'Windows Server 2012 R', 
                'macOSX', 'Linux Server'],
    'login': [9, 12, 5, 11, 19],
})


for i in df['login']:
    if i > 10:
        if i in df['operatingsystem'].str.lower().str.contains("server"):
            servercount  = 1
        else:
            othercount  = 1

print("Servers:", servercount, "\nOthers:", othercount)

Currently this returns 0 on the servercount & 3 on the othercount meaning that it is not detecting the word server in the operatingsystem column.

CodePudding user response:

If you only want the final number (as in total servercount and othercount), then consider simply passing the logics to your dataframe. You are iterating over i only with is df['Login'], nowhere are you using operatingsystem values.

servercount = ((df['login'] > 10) & \
           (df['operatingsystem'].str.contains("server",
                                               case=False,
                                               regex=True))).sum()

Then, by difference you can calculate othercount:

othercount = len(df) - servercount

CodePudding user response:

Create mask wich chain both masks toghether by & bot bitwise AND and for count use Series.value_counts, last select True and False values:

m = (df['login'] > 10) & df['operatingsystem'].str.contains("server", case=False)
print (m.value_counts())
False    3
True     2
dtype: int64


out = m.value_counts()

print("Servers:", out[True], "\nOthers:", out[False])
Servers: 2 
Others: 3

EDIT: Because need 3 counts are created 3 mask separately, ~ is for invert mask (same like mask == False):

m1 = df['login'] > 10 
m2 =  df['operatingsystem'].str.contains("server", case=False)

print("Servers greater 10:", (m1 & m2).sum(), 
      "\nServers less 10:", (m1 & ~m2).sum(), 
      "\nOthers:", (~m1).sum())
Servers greater 10: 2 
Servers less 10: 1 
Others: 2
  • Related