Create a new column to indicate dataframe split at least n-rows groups in Python [duplicate]

Time:09-24

Given a dataframe df as follows:

import pandas as pd

df = pd.DataFrame({'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05', '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'], 'Sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'], 'Value': [11, 8, 10, 15, 110, 60, 100, 40]})

Out:

         Date   Sym  Value
0  2015-05-08  aapl     11
1  2015-05-07  aapl      8
2  2015-05-06  aapl     10
3  2015-05-05  aapl     15
4  2015-05-08  aaww    110
5  2015-05-07  aaww     60
6  2015-05-06  aaww    100
7  2015-05-05  aaww     40

I would like to create a new column Group that labels the rows with consecutive integers starting from 1. Each group should have 3 rows, except the last group, which may have fewer than 3 rows.

The expected result looks like this:

         Date   Sym  Value  Group
0  2015-05-08  aapl     11      1
1  2015-05-07  aapl      8      1
2  2015-05-06  aapl     10      1
3  2015-05-05  aapl     15      2
4  2015-05-08  aaww    110      2
5  2015-05-07  aaww     60      2
6  2015-05-06  aaww    100      3
7  2015-05-05  aaww     40      3

How could I achieve that with Pandas or Numpy? Thanks.

My attempt so far:

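import numpy as np

n = 3
# use a loop variable other than df so the original frame is not overwritten
for g, chunk in df.groupby(np.arange(len(df)) // n):
    print(chunk.shape)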
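The grouping key in the attempt above is already the chunk label you are after, only 0-based. A minimal sketch that prints the key and the size of each chunk, rebuilding the question's frame so it runs on its own (the variable names key, label and chunk are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
                            '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
                   'Sym': ['aapl'] * 4 + ['aaww'] * 4,
                   'Value': [11, 8, 10, 15, 110, 60, 100, 40]})
n = 3

# Integer-dividing each row position by n gives the 0-based chunk label
key = np.arange(len(df)) // n
print(key)  # [0 0 0 1 1 1 2 2]

# Each chunk has n rows except possibly the last one
for label, chunk in df.groupby(key):
    print(label, chunk.shape)
```

Since the key is 0, 0, 0, 1, 1, 1, 2, 2, storing it in a column and adding 1 gives exactly the requested numbering.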

Answer:

You are close: assign the grouping key from your groupby call, np.arange(len(df)) // n, to a new column and add 1 so the numbering starts at 1:

n = 3
df['Group'] = np.arange(len(df)) // n + 1
print(df)
         Date   Sym  Value  Group
0  2015-05-08  aapl     11      1
1  2015-05-07  aapl      8      1
2  2015-05-06  aapl     10      1
3  2015-05-05  aapl     15      2
4  2015-05-08  aaww    110      2
5  2015-05-07  aaww     60      2
6  2015-05-06  aaww    100      3
7  2015-05-05  aaww     40      3
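If you want to keep using groupby, GroupBy.ngroup() numbers the groups 0, 1, 2, ... in order of their keys (with the default sort=True), so adding 1 gives the same column. The sketch below also wraps the arange approach in a small helper; add_chunk_group is a hypothetical name, not a pandas function. It works on row positions rather than index labels, so it should also behave correctly when the index is not the default 0..len-1 RangeIndex.

```python
import numpy as np
import pandas as pd


def add_chunk_group(frame: pd.DataFrame, n: int, col: str = 'Group') -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with a 1-based label for
    consecutive chunks of n rows (the last chunk may be shorter)."""
    if n < 1:
        raise ValueError('n must be a positive integer')
    out = frame.copy()
    # Row positions 0..len-1, not index labels, so any index works
    out[col] = np.arange(len(out)) // n + 1
    return out


df = pd.DataFrame({'Date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
                            '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
                   'Sym': ['aapl'] * 4 + ['aaww'] * 4,
                   'Value': [11, 8, 10, 15, 110, 60, 100, 40]})

# Same labels via groupby: ngroup() numbers the groups 0, 1, 2, ... by sorted key
via_ngroup = df.groupby(np.arange(len(df)) // 3).ngroup() + 1

result = add_chunk_group(df, 3)
print(result)
```

Returning a copy keeps the original frame unchanged; assign the result back (df = add_chunk_group(df, 3)) if you want to modify df itself.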