How can I place a new value on the 3rd data point of each sub-group?

Time:12-10

I have this data frame:

import pandas as pd

d = {'col1': [1, 2, 0, 55, 12, 1, 3, 1, 56, 13],
     'col2': [3, 4, 44, 34, 46, 2, 3, 43, 35, 47],
     'col3': ['A', 'A', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B']}
df = pd.DataFrame(data=d)
df

  col1  col2    col3
0   1   3       A
1   2   4       A
2   0   44      A
3   55  34      B
4   12  46      B
5   1   2       A
6   3   3       B
7   1   43      B
8   56  35      B
9   13  47      B

The goal is to end up with a data frame that looks like this:

df
   col1  col2 col3 label
0     1     3    A   NaN
1     2     4    A   NaN
2     0    44    A     1
3    55    34    B   NaN
4    12    46    B   NaN
5     1     2    A   NaN
6     3     3    B   NaN
7     1    43    B   NaN
8    56    35    B     1
9    13    47    B   NaN

I get the label column as follows.

First, I count the length of each run of consecutive identical values in col3:

s = df['col3'].ne(df['col3'].shift()).cumsum()

df['count'] = s.map(s.value_counts())

So I get this:

   col1  col2 col3  count
0     1     3    A      3
1     2     4    A      3
2     0    44    A      3
3    55    34    B      2
4    12    46    B      2
5     1     2    A      1
6     3     3    B      4
7     1    43    B      4
8    56    35    B      4
9    13    47    B      4
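To make the run-counting step concrete, here is a minimal sketch that breaks it into its intermediate pieces on the col3 values alone. The names starts, run_id and run_size are only illustrative and do not appear in the original code.

```python
import pandas as pd

col3 = pd.Series(['A', 'A', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B'])

# True wherever the value differs from the previous row
# (the first row compares against NaN, so it is always True)
starts = col3.ne(col3.shift())

# A running total of the run starts gives each consecutive run its own id
run_id = starts.cumsum()

# Look up how many rows share each run id: the size of the run a row belongs to
run_size = run_id.map(run_id.value_counts())

print(run_id.tolist())    # [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]
print(run_size.tolist())  # [3, 3, 3, 2, 2, 1, 4, 4, 4, 4]
```

Here run_id plays the role of s, and run_size matches the count column.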

The target: I would like to create a new column label by going through the count column. Whenever a sub-group has count >= 3, the 3rd row of that sub-group should receive 1. The sub-groups here are the consecutive runs AAA, BB, A and BBBB, so the result should be:

df
   col1  col2 col3 label
0     1     3    A   NaN
1     2     4    A   NaN
2     0    44    A     1
3    55    34    B   NaN
4    12    46    B   NaN
5     1     2    A   NaN
6     3     3    B   NaN
7     1    43    B   NaN
8    56    35    B     1
9    13    47    B   NaN

CodePudding user response:

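I think you need cumcount. It numbers the rows inside each group starting from 0, so the 3rd row of every run is the one where it equals 2. Here s is the run-id series from your question, and the new column is called new rather than label: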

df.loc[s.groupby(s).cumcount()==2,'new']=1
df
Out[235]: 
   col1  col2 col3  new
0     1     3    A  NaN
1     2     4    A  NaN
2     0    44    A  1.0
3    55    34    B  NaN
4    12    46    B  NaN
5     1     2    A  NaN
6     3     3    B  NaN
7     1    43    B  NaN
8    56    35    B  1.0
9    13    47    B  NaN
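For anyone who wants to run this end to end, here is a self-contained sketch of the same cumcount idea that writes to a column named label as in the question. The helper mark_nth_in_run and its parameters n and label are hypothetical names introduced here, not part of pandas or of the answer above.

```python
import pandas as pd

def mark_nth_in_run(df, col, n=3, label='label'):
    """Hypothetical helper: put 1 on the n-th row of each consecutive run in `col`."""
    out = df.copy()
    # Run id: increases by one each time the value in `col` changes
    run_id = out[col].ne(out[col].shift()).cumsum()
    # Zero-based position of each row inside its run
    pos = out.groupby(run_id).cumcount()
    # Only runs with at least n rows have a row at position n - 1
    out.loc[pos == n - 1, label] = 1
    return out

d = {'col1': [1, 2, 0, 55, 12, 1, 3, 1, 56, 13],
     'col2': [3, 4, 44, 34, 46, 2, 3, 43, 35, 47],
     'col3': ['A', 'A', 'A', 'B', 'B', 'A', 'B', 'B', 'B', 'B']}
df = pd.DataFrame(data=d)

result = mark_nth_in_run(df, 'col3')
print(result)
```

Note that no separate count >= 3 check is needed: a run shorter than 3 rows never reaches position 2, so it is skipped automatically. Rows 2 and 8 get 1 and every other row stays NaN.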