Home > Enterprise >  Creating Quarters column with months columns
Creating Quarters column with months columns

Time:08-26

So using python in Jupyter notebook, I have this data df with a column named as "Month"(more than 100,000 rows) having individual numbers upto 12. I want to create another column in that same data set named as "Quarters" so that it can display Quarters for those respective months.

I extracted month from "review_time" Column using ".dt.strftime('%m')"

I am sorry if the provided information was not enough. New to stack flow.

So I extracted month from column name :"date". I created a variable a and then added that variable a to the main table.

a = df['review_time'].dt.strftime('%m')

df.insert(2, "month",a, True)

this is the output for month.info() column

<class 'pandas.core.series.Series'>
Int64Index: 965093 entries, 1 to 989508
Series name: month
Non-Null Count   Dtype 
--------------   ----- 
965093 non-null  object
dtypes: object(1)
memory usage: 14.7  MB

CodePudding user response:

You could use pandas.cut

Example with a generic dataframe:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame({'Month': [1,2,3,4,5,6,7,8,9,10,11,12]})
df['Quarter'] = pd.cut(df['Month'], [0,3,6,9,12], labels = [1,2,3,4])
print(df)

This prints:

    Month Quarter
0       1       1
1       2       1
2       3       1
3       4       2
4       5       2
5       6       2
6       7       3
7       8       3
8       9       3
9      10       4
10     11       4
11     12       4

CodePudding user response:

An alternative is to calculate the quarter from the month number. qtr = ( month -1 ) // 3 1

import numpy as np
import pandas as pd 
from datetime import datetime

# lo and hi used to generate random dates in 2022
lo = datetime( 2022,  1,  1 ).toordinal() 
hi = datetime( 2022, 12, 31 ).toordinal()

np.random.seed( 1234 )
dates = [ datetime.fromordinal( np.random.randint( lo, hi )) for _ in range( 20 )]

df = pd.DataFrame( { 'Date': dates } )

df['Qtr'] = ( df['Date'].dt.month - 1 ) // 3   1

print( df )

Result

         Date  Qtr
0  2022-10-31    4
1  2022-07-31    3
2  2022-10-22    4
3  2022-02-23    1
4  2022-07-24    3
5  2022-06-02    2
6  2022-05-24    2
7  2022-06-27    2
8  2022-10-07    4
9  2022-08-22    3
10 2022-06-04    2
11 2022-01-31    1
12 2022-06-21    2
13 2022-06-08    2
14 2022-08-25    3
15 2022-10-10    4
16 2022-05-01    2
17 2022-11-22    4
18 2022-12-03    4
19 2022-09-04    3
  • Related