I have this DF.
import pandas as pd
import scipy.optimize as sco
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
data = [['ATTICA',1,2,590,680],['ATTICA',1,2,800,1080],['AVON',14,2,950,1250],['AVON',15,3,500,870],['AVON',20,4,1350,1700]]
df = pd.DataFrame(data, columns=['cities','min_workers','max_workers','min_minutes','max_minutes'])
df
df['Non_HT_Outages'] = (df['min_workers'] < 15).groupby(df['cities']).transform('count')
df['HT_Outages'] = (df['min_workers'] >= 15).groupby(df['cities']).transform('count')
df
I am trying to count items in a column named 'min_workers' and if <15, put into column 'Non-HT' but if >=15, put into column 'HT'. My counts seem to be off.
CodePudding user response:
If you change count
to sum
in the transforms it will work.
This is because (df['min_workers'] < 15).groupby(df['cities'])
creates a boolean list for each city, with True
entries when the condition is met and False
entries when it is not.
count
gives the count of all the entries, both true and false. sum
just counts the true ones (treating the trues and falses as 1s and 0s, and summing them).