Mark specific rows in a pandas DataFrame

I have a pandas DataFrame like the one below:

no.             col1             col2
 1              abc              123
 2              bcd              234
 3              cde              345

I would like to create a new column and mark the first 4 rows with 1, rows 5 to 7 with 2, and the remaining rows with 3. Expected output:

no.             col1             col2           new_col
 1              abc              123               1
 2              bcd              234               1
 3              cde              345               1
 .........

I am sure it can be done fairly easily, but I have not been able to do it. Any help would be appreciated.

CodePudding user response：

You can use np.repeat (this assumes the DataFrame has at least 7 rows, otherwise the last count is negative):

import numpy as np

df['new_col'] = np.repeat([1, 2, 3], [4, 3, len(df)-7])

Output:

>>> df
    no. col1  col2  new_col
0     1  abc   123        1
1     1  abc   123        1
2     1  abc   123        1
3     1  abc   123        1
4     1  abc   123        2
5     1  abc   123        2
6     1  abc   123        2
7     1  abc   123        3
8     1  abc   123        3
9     1  abc   123        3
10    2  bcd   234        3
11    2  bcd   234        3
12    2  bcd   234        3
13    2  bcd   234        3
14    2  bcd   234        3
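
For frames that might be shorter than 7 rows, here is a minimal sketch of the same np.repeat idea with a guard. The 10-row DataFrame and the names counts and labels are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row frame standing in for the asker's data.
df = pd.DataFrame({"col1": list("abcdefghij"), "col2": range(10)})

# Group sizes from the question: 4 rows, then 3 rows, then the rest.
# max(..., 0) keeps the last count from going negative on short frames.
counts = [4, 3, max(len(df) - 7, 0)]

# Slicing to len(df) trims the labels if the frame has fewer than 7 rows.
labels = np.repeat([1, 2, 3], counts)[: len(df)]

df["new_col"] = labels
print(df)
```

With only 5 rows, the slice would leave four 1s followed by a single 2 instead of raising an error.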

CodePudding user response：

You can use numpy.where together with df.index to assign the values to the new column. This relies on the default 0-based integer index.

df['new_col'] = np.where(df.index < 4, 1,
                        np.where((df.index >= 4) & (df.index < 7), 2, 3))

Here is a complete example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ['abc', 'bcd', 'cde', 'efg', 'ghi','jkl','mno','pqr','stu','vwx'],
                   'col2': [123,234,345,456,567,678,789,890,901,120]})

df['new_col'] = np.where(df.index < 4, 1,
                        np.where((df.index >= 4) & (df.index < 7), 2, 3))

print(df)

Output:

  col1  col2  new_col
0  abc   123        1
1  bcd   234        1
2  cde   345        1
3  efg   456        1
4  ghi   567        2
5  jkl   678        2
6  mno   789        2
7  pqr   890        3
8  stu   901        3
9  vwx   120        3
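
If the index is not the default 0..n-1 range (for example after filtering or sorting), comparing against df.index will no longer match row positions. As a possible variation, and not what the answer above does, the sketch below swaps in np.select on positional row numbers. The names pos and conditions and the shuffled index are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index to show the difference.
df = pd.DataFrame(
    {"col1": list("abcdefghij")},
    index=[10, 3, 7, 1, 0, 5, 2, 9, 8, 4],
)

# Row positions 0..n-1, independent of the index labels.
pos = np.arange(len(df))

# Conditions are checked in order and the first match wins,
# so the second condition effectively means 4 <= pos < 7.
conditions = [pos < 4, pos < 7]
df["new_col"] = np.select(conditions, [1, 2], default=3)

print(df)
```

This keeps the conditions flat rather than nesting one np.where inside another, which may be easier to extend if more groups are needed.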

CodePudding user response：

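Use pd.cut:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': range(10)})

# Bin the 1-based row numbers: (-1, 4] -> 1, (4, 7] -> 2, (7, len(df)] -> 3
df['new_col'] = pd.cut(np.arange(1, len(df) + 1), [-1, 4, 7, len(df)], labels=[1, 2, 3])
print(df)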
   col1 new_col
0     0       1
1     1       1
2     2       1
3     3       1
4     4       2
5     5       2
6     6       2
7     7       3
8     8       3
9     9       3

CodePudding user response：

You can use numpy.where for this in the following way:

import numpy as np
# between() includes both endpoints by default, so no inclusive argument is needed
df['new_col'] = np.where(df['no.'] <= 4, 1,
                         np.where(df['no.'].between(5, 7), 2, 3))

Another option is using pd.cut for this:

df['new_col'] = pd.cut(df['no.'], [0, 4, 7, df['no.'].max()], labels=[1, 2, 3])

This creates the same column, except that the values have the category dtype. Note that the bin edges must be strictly increasing, so this requires df['no.'].max() to be greater than 7.
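
If plain integers are preferred over the category dtype, one option is to cast the result afterwards. This is a minimal sketch on a made-up 10-row 'no.' column; the variable before is only there to show the dtype prior to the cast.

```python
import pandas as pd

# Hypothetical frame with a 1-based 'no.' column, as in the question.
df = pd.DataFrame({"no.": range(1, 11)})

df["new_col"] = pd.cut(df["no."], [0, 4, 7, df["no."].max()], labels=[1, 2, 3])
before = df["new_col"].dtype  # categorical at this point

# Cast the categorical labels to integers. This works here because
# every row falls inside a bin, so there are no missing values.
df["new_col"] = df["new_col"].astype(int)

print(before, df["new_col"].dtype)
```

The cast would fail if any value fell outside the bins, since pd.cut marks those rows as missing.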

CodePudding user response：

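# Label rows by position in consecutive groups of 3 (1, 1, 1, 2, 2, 2, ...).
# Note: this gives fixed-size groups, not the 4 / 3 / rest split from the question.
iter_index = iter(range(len(df.index)))
df["result"] = 0
df["result"] = df["result"].map(lambda x: next(iter_index) // 3 + 1)
df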