I have a df with an ID column that needs to be deconstructed based on specific values.
ID |
---|
401 |
402 |
403 |
404 |
For deconstruction ID = 401, 405, 409 (i.e. increments of 4) corresponds to value "A" and ID = 402, 406, 410 corresponds to value "B"
I want to create a column "VALUE" that store values A, B based on the deconstruction of ID column with the 4-step increments.
ID | VALUE |
---|---|
401 | A |
402 | B |
403 | C |
404 | D |
405 | A |
406 | B |
I tried some simple while and if logic but they just do not work. Not worth providing my code here.
Does anyone know how we can do this in python?
Thank you
CodePudding user response:
One simple way is to define a lookup table and then use the function apply()
on the DataFrame. Here's a simple snippet:
import pandas as pd
d = {0: "A", 1: "B", 2: "C", 3: "D"}
df = pd.DataFrame({'ID': range(20)})
df['VALUE'] = df['ID'].apply(lambda x: d[x % 4])
CodePudding user response:
You may utilize modulo operation (%
) combine with numpy.select.
condition_list = [df['ID'] % 4 == 1, df['ID'] % 4 == 2, df['ID'] % 4 == 3]
choice_list = ['A', 'B', 'C']
default_value = 'D'
df['VALUE'] = np.select(condition_list, choice_list, default_value)
Alternatively, use modulo operator combine with pandas.Series.map.
df['VALUE'] = (df['ID'] % 4).map({0: 'D', 1: 'A', 2: 'B', 3: 'C'})
CodePudding user response:
Simple solution if the ID column is continuous and starts from 401:
df['VALUE'] = np.tile(['A', 'B', 'C', 'D'], len(df) // 4 1)[:len(df)]
Alternatively you can create a repeating sequence by modulo (%) division then add 65 to offset the sequence to represent the ascii uppercase characters
df['VALUE'] = df['ID'].sub(1).mod(4).add(65).map(chr)
# OR df['VALUE'] = (65 (df['ID'] - 1) % 4).astype('uint32').view('U1')
Result
ID VALUE
0 401 A
1 402 B
2 403 C
3 404 D
4 405 A
5 406 B