I have a pandas DataFrame like the one below:
no. col1 col2
1 abc 123
2 bcd 234
3 cde 345
I would like to create a new column that marks the first 4 rows with 1, rows 5 to 7 with 2, and all remaining rows with 3. Expected output:
no. col1 col2 new_col
1 abc 123 1
2 bcd 234 1
3 cde 345 1
.........
I'm sure this can be done fairly easily, but I haven't been able to work it out. Any help would be appreciated.
Answer:
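You can use np.repeat, which repeats each label the given number of times (this assumes the DataFrame has at least 7 rows; otherwise the last count is negative and NumPy raises a ValueError):
import numpy as np
# 4 rows of 1, 3 rows of 2, and all remaining rows of 3
df['new_col'] = np.repeat([1, 2, 3], [4, 3, len(df)-7])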
Output (for a sample DataFrame with 15 rows):
>>> df
no. col1 col2 new_col
0 1 abc 123 1
1 1 abc 123 1
2 1 abc 123 1
3 1 abc 123 1
4 1 abc 123 2
5 1 abc 123 2
6 1 abc 123 2
7 1 abc 123 3
8 1 abc 123 3
9 1 abc 123 3
10 2 bcd 234 3
11 2 bcd 234 3
12 2 bcd 234 3
13 2 bcd 234 3
14 2 bcd 234 3
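The np.repeat call above needs at least 7 rows: on a shorter frame (such as the 3-row example in the question) len(df)-7 is negative. A minimal sketch that clips the repeat counts first; group_labels is a hypothetical helper name, and the default sizes (4, 3) simply mirror the question:

```python
import numpy as np
import pandas as pd


def group_labels(n_rows, head_sizes=(4, 3)):
    """Hypothetical helper: label the first head_sizes[0] rows 1, the next
    head_sizes[1] rows 2, and so on; all leftover rows get the last label.
    Counts are clipped so frames shorter than sum(head_sizes) still work."""
    counts = []
    remaining = n_rows
    for size in head_sizes:
        take = min(size, remaining)  # never take more rows than are left
        counts.append(take)
        remaining -= take
    counts.append(remaining)  # everything left over goes in the final group
    labels = np.arange(1, len(counts) + 1)
    return np.repeat(labels, counts)


df = pd.DataFrame({'col1': ['abc', 'bcd', 'cde'], 'col2': [123, 234, 345]})
df['new_col'] = group_labels(len(df))
print(df['new_col'].tolist())  # [1, 1, 1]
```

On a 10-row frame the same call yields four 1s, three 2s and three 3s.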
Answer:
You can use numpy.where together with df.index to assign the values to the new column (this relies on the default RangeIndex 0, 1, 2, ...):
df['new_col'] = np.where(df.index < 4, 1,
                         np.where((df.index >= 4) & (df.index < 7), 2, 3))
Here is a complete example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1': ['abc', 'bcd', 'cde', 'efg', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx'],
                   'col2': [123, 234, 345, 456, 567, 678, 789, 890, 901, 120]})
df['new_col'] = np.where(df.index < 4, 1,
                         np.where((df.index >= 4) & (df.index < 7), 2, 3))
print(df)
Output:
col1 col2 new_col
0 abc 123 1
1 bcd 234 1
2 cde 345 1
3 efg 456 1
4 ghi 567 2
5 jkl 678 2
6 mno 789 2
7 pqr 890 3
8 stu 901 3
9 vwx 120 3
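Because this compares against df.index, it mislabels rows when the index is not 0, 1, 2, ... (for example after filtering or set_index). A sketch that swaps the nested np.where for np.select and uses positional row numbers instead; the shifted sample index (100 to 109) is an assumption made only to illustrate the point:

```python
import numpy as np
import pandas as pd

# Sample frame whose index does NOT start at 0
df = pd.DataFrame({'col1': list('abcdefghij')}, index=range(100, 110))

pos = np.arange(len(df))         # 0-based row positions, independent of df.index
conditions = [pos < 4, pos < 7]  # np.select uses the first condition that is True
df['new_col'] = np.select(conditions, [1, 2], default=3)
print(df['new_col'].tolist())
```

With this index, the df.index version would label every row 3, while the positional version still gives four 1s, three 2s and three 3s.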
Answer:
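Use pd.cut:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': range(10)})
# bin the 1-based row numbers into (-1, 4], (4, 7] and (7, len(df)]
df['new_col'] = pd.cut(np.arange(1, len(df) + 1), [-1, 4, 7, len(df)], labels=[1, 2, 3])
print(df)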
col1 new_col
0 0 1
1 1 1
2 2 1
3 3 1
4 4 2
5 5 2
6 6 2
7 7 3
8 8 3
9 9 3
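Using len(df) as the last bin edge ties the bins to the frame length: with 7 rows or fewer the edges are no longer strictly increasing and pd.cut raises a ValueError. A sketch that uses np.inf as the open-ended upper edge instead and converts the categorical result to plain integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': range(10)})
row_no = np.arange(1, len(df) + 1)  # 1-based row numbers

# Bins (0, 4], (4, 7], (7, inf): the last bin absorbs any number of extra rows
labels = pd.cut(row_no, [0, 4, 7, np.inf], labels=[1, 2, 3])
df['new_col'] = labels.astype(int)  # categorical -> integer column
print(df['new_col'].tolist())
```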
Answer:
You can use numpy.where for this as follows:
import numpy as np
# pandas >= 1.3 expects inclusive="both" (boolean values are deprecated and removed in 2.0)
df['new_col'] = np.where(df['no.'] <= 4, 1, np.where(df['no.'].between(5, 7, inclusive="both"), 2, 3))
Another option is pd.cut:
df['new_col'] = pd.cut(df['no.'], [0, 4, 7, df['no.'].max()], labels=[1, 2, 3])
This produces the same values, but the new column has the category dtype; use .astype(int) if you need plain integers.
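To check that both approaches agree on the 'no.' column, here is a small sketch; the 1-to-10 sample values are assumed, and the string form of inclusive assumes pandas 1.3 or later:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'no.': range(1, 11)})  # 1-based row numbers, as in the question

by_where = np.where(df['no.'] <= 4, 1,
                    np.where(df['no.'].between(5, 7, inclusive="both"), 2, 3))
by_cut = pd.cut(df['no.'], [0, 4, 7, df['no.'].max()], labels=[1, 2, 3])

print(by_cut.dtype)  # category
# Converting the categorical labels to integers makes the two results comparable
by_cut_int = by_cut.astype(int)
print((by_cut_int.to_numpy() == by_where).all())  # True
```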
Answer:
iter_index = iter(range(len(df.index)))
df["result"] = 0
# assigns a new number every 3 rows: 1, 1, 1, 2, 2, 2, 3, ...
df["result"] = df["result"].map(lambda x: next(iter_index)//3 + 1)
df
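The iterator-plus-map pattern can be written as one vectorized expression on row positions. Note that it numbers fixed groups of three rather than the 4/3/remaining split from the question. For that split, one option (swapped in here, not part of the answer above) is np.searchsorted against the boundary positions 4 and 7:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': range(10)})
pos = np.arange(len(df))  # 0-based row positions

# Vectorized equivalent of the iterator/map approach: a new number every 3 rows
df['result'] = pos // 3 + 1

# Requested split: count how many boundaries (4 and 7) each position has reached
boundaries = [4, 7]
df['new_col'] = np.searchsorted(boundaries, pos, side='right') + 1
print(df)
```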