Given a sequence of numbers, I want to turn it into intervals, store those intervals in a dictionary keyed by the sequence values, and then map a DataFrame column to the interval labels, as in the DataFrame below.
Query 1: How can I create the interval dictionary from the sequence?
given_sequence = [-100, 2, 5, 8, 10, 15]  # given sequence
idict = {-100: '<2',
2:'2-5',
5:'5-8',
8:'8-10',
10:'10-15',
15:'>15'}
Query 2: How can I map a column to these labels, as in the screenshot below, using the dictionary and the map function?
import pandas as pd
import random
# create an Empty DataFrame object
df = pd.DataFrame()
df['col1'] = [random.randint(1, 10) for i in range(0, 10)]
df['mapped'] = df['col1'].map(idict)
Expected output: [screenshot]
CodePudding user response:
Your desired outcome can be obtained using pd.cut, which bins a continuous variable into a categorical one.
# right=False makes each bin closed on the left: [2, 5), [5, 8), ...
df['mapped'] = pd.cut(df['col1'], right=False, bins=given_sequence + [float('inf')],
                      labels=['<2', '2-5', '5-8', '8-10', '10-15', '>15'])
Output:
   col1 mapped
0     6    5-8
1     1     <2
2    10  10-15
3     7    5-8
4     7    5-8
5     8   8-10
6     4    2-5
7     7    5-8
8    10  10-15
9     6    5-8
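The labels passed to pd.cut above are written out by hand. For Query 1, they could also be derived from the sequence itself. This is only a minimal sketch: `make_interval_dict` is a hypothetical helper name, and it assumes the sequence is sorted, has at least two values, and that the first and last entries should be labeled '<second value>' and '>last value' as in the question.

```python
given_sequence = [-100, 2, 5, 8, 10, 15]

def make_interval_dict(seq):
    """Map each left edge in a sorted sequence to an interval label.

    Hypothetical helper: the first edge is labeled '<next value>',
    the last edge '>value', and every other edge 'left-right'.
    """
    labels = {}
    last = len(seq) - 1
    for i, left in enumerate(seq):
        if i == 0:
            labels[left] = f"<{seq[1]}"
        elif i == last:
            labels[left] = f">{left}"
        else:
            labels[left] = f"{left}-{seq[i + 1]}"
    return labels

idict = make_interval_dict(given_sequence)
print(idict)
```

The resulting dictionary has the same keys and labels as the idict in the question, so its values can be passed as labels instead of the hand-typed list.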
CodePudding user response:
If the dictionary keys are sorted, as in the sample data, pass the keys as bins and the values as labels to pd.cut:
import numpy as np

# If necessary, sort the dict by its keys first:
# idict = {k: v for k, v in sorted(idict.items(), key=lambda item: item[0])}
df['mapped'] = pd.cut(df['col1'],
                      right=False,
                      bins=[*idict.keys(), np.inf],  # left edges plus an open upper bound
                      labels=list(idict.values()))
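For reference, the approach can be run end to end on fixed data. This sketch assumes the idict from the question and, instead of random numbers, uses the col1 values from the output shown earlier in the thread so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Dictionary from the question: left bin edge -> label
idict = {-100: '<2', 2: '2-5', 5: '5-8', 8: '8-10', 10: '10-15', 15: '>15'}

# Fixed sample values (assumption: copied from the earlier output, not random)
df = pd.DataFrame({'col1': [6, 1, 10, 7, 7, 8, 4, 7, 10, 6]})

# Keys become left-closed bin edges; np.inf closes the last bin
df['mapped'] = pd.cut(df['col1'],
                      right=False,
                      bins=[*idict.keys(), np.inf],
                      labels=list(idict.values()))

# Plain strings are easier to compare than a categorical column
mapped = df['mapped'].astype(str).tolist()
print(df)
```

Because right=False makes every bin closed on the left, a value of exactly 15 falls in the '>15' bin, and any value below the first edge (-100) falls outside all bins and becomes NaN.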