Home > Net >  pandas dataframe groupby get count by multi column condition
pandas dataframe groupby get count by multi column condition

Time:04-28

I have a excel like this:

year month day time a b c
2020 1 1 00:00 12 23 12
2020 1 1 01:00 31 0 13
2020 1 1 ... ... ... 14
2020 1 1 23:00 ... ... 23
2020 ... ... ... ... ... 24
2020 12 31 23:00 32 0 35
2021 1 1 01:00 31 0 46
2021 1 1 ... ... ... 56
2021 1 1 23:00 ... ... 45
2021 ... ... ... ... ... 34
2021 12 31 23:00 32 0 45

I wanna get data: the sum of c for the row that a >= 30 and b==0 group by year notice that as long as there's time that fulfill a >= 30 and b==0,we can count this day

I wanna use pandas dataframe to implement this, can anyone help? I'm quite new to python

CodePudding user response:

First filter rows by condition in boolean indexing and then aggregate sum for column c:

df['datetime'] = pd.to_datetime(df[['year','month','day']])
df1 = df[(df.a>=30) & (df.b==0)].groupby('datetime', as_index=False)['c'].sum()

CodePudding user response:

First of all, one will need to read the data.

If by "excel" you mean a .xlsx file, considering that the file is named data.xlsx, read the file into a dataframe with

df1 = pd.read_excel('data.xlsx')

If by "excel" you mean a .csv file, assuming the file is named data.csv, read the file into a dataframe with

df1 = pd.read_csv('data.csv')

Once you have the Excel data into a dataframe, now it is time to run queries on top of that.

In order to get a dataframe with only the rows where a>= 30 and b== 0, do the following

df2 = df1[(df1['a'] >= 30) & (df1['b'] == 0)]

Now, considering that you want to order by year, month and day (as you mentioned), one can generate a new column in the dataframe

df3['date'] = pd.to_datetime(df2[['year', 'month', 'day']])

Then, using the new dataframe, that was filtered above and to which we added a new column, one can group by variable/column date and sum the values of the column c with

df4['sum_c'] = df3.groupby('date')['c'].sum()

The final dataframe would be df4, and to print it

print(df4)

Note: To make it easier for you to understand, anytime one is doing a major change, one is creating a new dataframe.

  • Related