I have a dataframe such as:
Groups Names Values
G1 SP1 1
G1 SP1 5
G1 SP1 -2
G1 SP1 30
G1 SP1 50
G1 SP1 50
G1 SP1 -1
G1 SP1 2
G1 SP2 2
G1 SP2 20
G1 SP2 1
G2 SP3 30
G2 SP3 9
G2 SP3 3
G3 SP3 2
and I would like to add a new_group
column for each Groups-Names
where I found consecutive Values < 10
I should then get:
Groups Names Values new_groups
G1 SP1 1 NG1
G1 SP1 5 NG1
G1 SP1 -2 NG1
G1 SP1 30 NG2
G1 SP1 50 NG3
G1 SP1 50 NG4
G1 SP1 -1 NG5
G1 SP1 2 NG5
G1 SP2 2 NG5
G1 SP2 20 NG6
G1 SP2 1 NG7
G2 SP3 30 NG8
G2 SP3 9 NG9
G2 SP3 3 NG9
G3 SP3 2 NG10
so for instance, since I get Values < 10 for the first 3 rows, I assign the first group: NG1
Then, I have a value > 10 (which is 30), so I assign the second group: NG2
Then, I get value > 10
in row5
, then I assign a new group : NG3
Then, I get again a value > 10
in row6
, then I assign a new group: NG4
and so on...
Here is the dataframe in dict format if it can help;
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G1', 7: 'G1', 8: 'G1', 9: 'G1', 10: 'G1', 11: 'G2', 12: 'G2', 13: 'G2',14:'G3'}, 'Names': {0: 'SP1', 1: 'SP1', 2: 'SP1', 3: 'SP1', 4: 'SP1', 5: 'SP1', 6: 'SP1', 7: 'SP1', 8: 'SP2', 9: 'SP2', 10: 'SP2', 11: 'SP3', 12: 'SP3', 13: 'SP3', 14 : 'SP3'}, 'Values': {0: 1, 1: 5, 2: -2, 3: 30, 4: 50, 5: 50, 6: -1, 7: 2, 8: 2, 9: 20, 10: 1, 11: 30, 12: 9, 13: 3, 14: 2}}
CodePudding user response:
I can't find any better way to do it than just using the python function and then map it with pandas. This is not a very efficient way, but this'll do the job!
#import
import pandas as pd
# Global Variable to know wether prv. one was <10
var = False
# Var. to hold prv. Grp no.
prv_grp = 0
# Function
def func(val):
# Acessing the Variables, global as it's outside the func. scope
global var
global prv_grp
if var: # If prv. val was <10
if val < 10: # If it is still <10
return "NG" str(prv_grp) # Returning the value
else: # If not
var = False # To remember that this is not <10
prv_grp = 1 # Increasing the grp
return "NG" str(prv_grp) # Returning the value
else: # If prv. value was not <10
if val < 10: # But it is now
var = True #To remember that this is <10
prv_grp = 1 # Increasing the grp
return "NG" str(prv_grp) # Returning the value
# Your data
x = {
'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G1', 7: 'G1', 8: 'G1', 9: 'G1', 10: 'G1', 11: 'G2', 12: 'G2', 13: 'G2'},
'Names': {0: 'SP1', 1: 'SP1', 2: 'SP1', 3: 'SP1', 4: 'SP1', 5: 'SP1', 6: 'SP1', 7: 'SP1', 8: 'SP2', 9: 'SP2', 10: 'SP2', 11: 'SP3', 12: 'SP3', 13: 'SP3'},
'Values': {0: 1, 1: 5, 2: -2, 3: 30, 4: 50, 5: 50, 6: -1, 7: 2, 8: 2, 9: 20, 10: 1, 11: 30, 12: 9, 13: 3}
}
# Converting to dataframe
df = pd.DataFrame(x)
# Mapping the new_group column with an output of func. taking the Values column as input
df['new_groups'] = df['Values'].map(func)
Output:
print(df)
Groups Names Values new_groups
0 G1 SP1 1 NG1
1 G1 SP1 5 NG1
2 G1 SP1 -2 NG1
3 G1 SP1 30 NG2
4 G1 SP1 50 NG3
5 G1 SP1 50 NG4
6 G1 SP1 -1 NG5
7 G1 SP1 2 NG5
8 G1 SP2 2 NG5
9 G1 SP2 20 NG6
10 G1 SP2 1 NG7
11 G2 SP3 30 NG8
12 G2 SP3 9 NG9
13 G2 SP3 3 NG9
EDIT: Added Argument from Groups col. per op request from comment.
# For understanding the code refer above code comments!
import pandas as pd
var = False
prv_grp = 0
grp_name = "" # For storing prv. or current group name
def func(grp, val):
global grp_name
global var
global prv_grp
if grp == grp_name: # if group hasn't changed
if var:
if val < 10:
return "NG" str(prv_grp)
else:
var = False
prv_grp = 1
return "NG" str(prv_grp)
else:
if val < 10:
var = True
prv_grp = 1
return "NG" str(prv_grp)
else: # If group name had changed
grp_name = grp # Stroing the new Group name
if val < 10:
var = True
prv_grp = 1
else:
prv_grp = 1
var = False
return "NG" str(prv_grp)
x = {
'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G1', 7: 'G1', 8: 'G1', 9: 'G1', 10: 'G1', 11: 'G2', 12: 'G2', 13: 'G2', 14: 'G3'},
'Names': {0: 'SP1', 1: 'SP1', 2: 'SP1', 3: 'SP1', 4: 'SP1', 5: 'SP1', 6: 'SP1', 7: 'SP1', 8: 'SP2', 9: 'SP2', 10: 'SP2', 11: 'SP3', 12: 'SP3', 13: 'SP3', 14: 'SP3'},
'Values': {0: 1, 1: 5, 2: -2, 3: 30, 4: 50, 5: 50, 6: -1, 7: 2, 8: 2, 9: 20, 10: 1, 11: 30, 12: 9, 13: 3, 14: 2}
}
df = pd.DataFrame(x)
#df['new_groups'] = df['Values'].map(func)
# Here we are using lambda that send the value from Values to val arg & Groups to grp arg in func()
df['new_groups'] = df.apply(lambda x: func(grp = x.Groups, val = x.Values), axis=1)
Output:
print(df)
Groups Names Values new_groups
0 G1 SP1 1 NG1
1 G1 SP1 5 NG1
2 G1 SP1 -2 NG1
3 G1 SP1 30 NG2
4 G1 SP1 50 NG3
5 G1 SP1 50 NG4
6 G1 SP1 -1 NG5
7 G1 SP1 2 NG5
8 G1 SP2 2 NG5
9 G1 SP2 20 NG6
10 G1 SP2 1 NG7
11 G2 SP3 30 NG8
12 G2 SP3 9 NG9
13 G2 SP3 3 NG9
14 G3 SP3 2 NG10
Peace!
CodePudding user response:
Relatively short approach basing on accessing the previous values of Values
column (using pandas.DataFrame.shift
) and compound boolean mask.
True
values of new_column
indicate positions where group index must be increased by 1, False
- those positions that fall into previous group.
In [222]: df['new_group'] = ((df.Values > 10) | df.Values.shift().isna() | (df.Values.shift() > 10))
In [223]: group_idx = {'idx': 0} # group indexer
In [224]: def set_groups(val, idx):
...: if val:
...: idx['idx'] = 1
...: return 'NG{}'.format(idx['idx'])
...:
In [225]: df['new_group'] = df['new_group'].apply(set_groups, idx=group_idx)
In [226]: df
Out[226]:
Groups Names Values new_group
0 G1 SP1 1 NG1
1 G1 SP1 5 NG1
2 G1 SP1 -2 NG1
3 G1 SP1 30 NG2
4 G1 SP1 50 NG3
5 G1 SP1 50 NG4
6 G1 SP1 -1 NG5
7 G1 SP1 2 NG5
8 G1 SP2 2 NG5
9 G1 SP2 20 NG6
10 G1 SP2 1 NG7
11 G2 SP3 30 NG8
12 G2 SP3 9 NG9
13 G2 SP3 3 NG9
14 G3 SP3 2 NG9