Dataframe count different values in column b, groupby column a

Time:10-01

Currently I'm trying to count different values in a dataframe. Example: the *.csv I import looks like:

market availability
dach available
dach available
nl offline
fr available
nl offline
fr in_call
dach available
fr in_call

So I need to count the number of each availability status, per market. Currently I have:

import pandas as pd

def dach():
    dfa = pd.read_csv("./_data/users/availabilities.csv", encoding='ISO-8859-1')

    df_dach = dfa.groupby(dfa.market).get_group("DACH")

    dach_available = len(df_dach[df_dach['availability'] == "available"])
    dach_in_call = len(df_dach[df_dach['availability'] == "in_call"])
    dach_offline = len(df_dach[df_dach['availability'] == "offline"])
    dach_do_not_disturb = len(df_dach[df_dach['availability'] == "do_not_disturb"])
    dach_after_call_work = len(df_dach[df_dach['availability'] == "after_call_work"])

But written this way, for 10 different markets, it is hideous to see in my files. Is there any way I could have one function that takes an input parameter and gets the info for all the different markets?

So that I get the individual counts of the 5 statuses, per market.

CodePudding user response:

I think you should just be able to do:

dfa = pd.read_csv("./_data/users/availabilities.csv", encoding='ISO-8859-1')
dfa.groupby('market').value_counts()
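A minimal sketch of what that produces, assuming pandas 1.4 or later (where `DataFrameGroupBy.value_counts` was added) and using an inline DataFrame as a stand-in for the CSV:

```python
import pandas as pd

# Stand-in for availabilities.csv, built from the sample data in the question.
dfa = pd.DataFrame({
    "market": ["dach", "dach", "nl", "fr", "nl", "fr", "dach", "fr"],
    "availability": ["available", "available", "offline", "available",
                     "offline", "in_call", "available", "in_call"],
})

# One count per (market, availability) pair, returned as a Series
# with a (market, availability) MultiIndex.
counts = dfa.groupby("market").value_counts()
print(counts)
```

Statuses that never occur in a market simply do not appear in the result, so this is the most compact form but does not show explicit zeros.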

CodePudding user response:

What about using a Counter?

from collections import Counter

for market, group in df.groupby('market'):
    print(market)
    print(Counter(group.availability))
    print()

dach
Counter({'available': 3})

fr
Counter({'in_call': 2, 'available': 1})

nl
Counter({'offline': 2})

Or value_counts()

for market, group in df.groupby('market'):
    print(market)
    print(group.availability.value_counts())
    print()

dach
available    3
Name: availability, dtype: int64

fr
in_call      2
available    1
Name: availability, dtype: int64

nl
offline    2
Name: availability, dtype: int64

For a function, I would do something like:

from collections import Counter

def market_status(market, df):
    d = Counter(
        {
            "available": 0,
            "in_call": 0,
            "offline": 0,
            "do_not_disturb": 0,
            "after_call_work": 0
        }
    )
    d.update(Counter(df.loc[df.market == market].availability))
    return d


print(market_status('fr', df))

Counter({'in_call': 2, 'available': 1, 'offline': 0, 'do_not_disturb': 0, 'after_call_work': 0})
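If you would rather stay entirely in pandas, the same "all 5 statuses, with zeros" result can be sketched with `value_counts` plus `unstack` and `reindex` (the `STATUSES` list and the inline DataFrame are assumptions standing in for the real CSV):

```python
import pandas as pd

# The five statuses the question asks for, including ones absent from the data.
STATUSES = ["available", "in_call", "offline", "do_not_disturb", "after_call_work"]

# Stand-in for availabilities.csv, built from the sample data in the question.
df = pd.DataFrame({
    "market": ["dach", "dach", "nl", "fr", "nl", "fr", "dach", "fr"],
    "availability": ["available", "available", "offline", "available",
                     "offline", "in_call", "available", "in_call"],
})

# One row per market, one column per status; fill_value=0 puts explicit
# zeros where a (market, status) combination never occurs.
table = (
    df.groupby("market")["availability"]
      .value_counts()
      .unstack(fill_value=0)
      .reindex(columns=STATUSES, fill_value=0)
)
print(table)
```

`table.loc['fr']` then gives the per-market breakdown that `market_status('fr', df)` produces, with all markets computed in one pass.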

CodePudding user response:

This will give you the grouping I think you are looking for:

dfa = pd.read_csv("groupbycolumn.csv", encoding='ISO-8859-1')     
dfa.groupby(["Market", "Availability"])["Availability"].count()

Output:

Market  Availability
A       Available       1
        dont dist       2
        in_call         1
B       Available       1
        dont dist       1
        in_call         3
C       Available       1
        dont dist       2
H       Available       1
        in_call         3
J       in_call         1
Name: Availability, dtype: int64

This is a sample csv file I created to give an example of how the output will look.
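The same grouping can also be sketched with `pd.crosstab`, which returns a Market × Availability table rather than a MultiIndexed Series (the inline DataFrame below is a hypothetical stand-in for "groupbycolumn.csv", whose contents are not shown in the post):

```python
import pandas as pd

# Hypothetical stand-in for groupbycolumn.csv.
dfa = pd.DataFrame({
    "Market": ["A", "A", "B", "B"],
    "Availability": ["Available", "in_call", "in_call", "in_call"],
})

# Cross-tabulation: rows are markets, columns are statuses, cells are counts,
# with zeros where a combination is absent.
ct = pd.crosstab(dfa["Market"], dfa["Availability"])
print(ct)
```

Unlike the `groupby(...).count()` form, absent combinations show up as explicit zeros instead of being dropped.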
