How to count countries per unit of time in pandas

Time:12-06

I have the following problem. I have this data:

import pandas as pd

data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia","Czechia","USA","Czechia"],
    }
)

I need to count how many times each country occurs in each five-minute interval, that is, from 2021-11-30 00:00:00 to 2021-11-30 00:04:59, from 2021-11-30 00:05:00 to 2021-11-30 00:09:59, and so on. Each country should become its own column. The desired output is:

output = pd.DataFrame(
    {
        "Czechia": [2,0,1],
        "USA": [1,0,0],
    }
)

How can I do this, please?

CodePudding user response:

Group with pd.Grouper and GroupBy.size, reshape with Series.unstack, then add the empty intervals with DataFrame.asfreq:

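data["time"] = pd.to_datetime(data["time"])

# Bin rows into 5-minute intervals and count rows per (interval, country),
# pivot countries into columns, then add the empty intervals filled with 0.
df = (data.groupby([pd.Grouper(freq='5min', key='time'), 'country'])
          .size()
          .unstack(fill_value=0)
          .asfreq('5min', fill_value=0))
print(df)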
country              Czechia  USA
time                             
2021-11-30 00:00:00        2    1
2021-11-30 00:05:00        0    0
2021-11-30 00:10:00        1    0
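If you need this for more than one interval length, the same steps can be wrapped in a small helper. This is only a sketch: the function name count_per_interval and its parameters freq, time_col and key_col are made up for illustration and are not part of pandas. It copies the input so the original string column is left untouched.

```python
import pandas as pd


def count_per_interval(df, freq="5min", time_col="time", key_col="country"):
    """Hypothetical helper: count key_col values per freq-sized time bin."""
    df = df.copy()  # do not modify the caller's DataFrame
    df[time_col] = pd.to_datetime(df[time_col])
    return (
        df.groupby([pd.Grouper(freq=freq, key=time_col), key_col])
        .size()                        # rows per (bin, key) pair
        .unstack(fill_value=0)         # one column per key value
        .asfreq(freq, fill_value=0)    # add bins that had no rows at all
    )


data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia", "Czechia", "USA", "Czechia"],
    }
)

counts = count_per_interval(data)
print(counts)
```

Passing fill_value=0 to both unstack and asfreq is what keeps the counts as integers, since no NaN is ever introduced.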

CodePudding user response:

Another way is to parse the timestamps with pandas.to_datetime, floor them to five minutes with Series.dt.floor, then group:

data["time"] = pd.to_datetime(data["time"]).dt.floor("5min")
new_df = data.groupby("time")["country"].value_counts().unstack().asfreq("5min").fillna(0)
print(new_df)

Output:

country              Czechia  USA
time                             
2021-11-30 00:00:00      2.0  1.0
2021-11-30 00:05:00      0.0  0.0
2021-11-30 00:10:00      1.0  0.0
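The counts come out as floats here because unstack and asfreq insert NaN for the missing combinations before fillna(0) runs. If you want integers, as in the desired output, one possible variation (a sketch, not the only way) is to pass fill_value=0 to both calls so NaN never appears:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "time": [
            "2021-11-30 00:01:08",
            "2021-11-30 00:01:10",
            "2021-11-30 00:03:08",
            "2021-11-30 00:10:08",
        ],
        "country": ["Czechia", "Czechia", "USA", "Czechia"],
    }
)

# Floor every timestamp to the start of its 5-minute interval.
data["time"] = pd.to_datetime(data["time"]).dt.floor("5min")

new_df = (
    data.groupby("time")["country"]
    .value_counts()
    .unstack(fill_value=0)          # missing (time, country) pairs become 0
    .asfreq("5min", fill_value=0)   # empty intervals become rows of 0
)
print(new_df.dtypes)
```

Appending .astype(int) after fillna(0) in the original one-liner should give the same result.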