Home > Back-end >  How can I create a sequence of repeating numbers based on another column in python?
How can I create a sequence of repeating numbers based on another column in python?

Time:10-23

I have this dataset

Country Date ValueA
USA 2020-02-01 325
USA 2020-02-02 444
USA 2020-02-03 125
USA 2020-02-04 456
USA 2020-02-05 325
USA 2020-02-06 465
USA 2020-02-07 548
USA 2020-02-08 696
FRA 2020-02-01 357
FRA 2020-02-02 123
FRA 2020-02-03 125
FRA 2020-02-04 987
FRA 2020-02-05 343
FRA 2020-02-06 874
FRA 2020-02-07 658
FRA 2020-02-08 413

I want to create a sequence for each country to later group by country and every 4 days. I'm looking for something like this

Country Date ValueA every_4
USA 2020-02-01 325 1
USA 2020-02-02 444 1
USA 2020-02-03 125 1
USA 2020-02-04 456 1
USA 2020-02-05 325 2
USA 2020-02-06 465 2
USA 2020-02-07 548 2
USA 2020-02-08 696 2
FRA 2020-02-01 357 1
FRA 2020-02-02 123 1
FRA 2020-02-03 125 1
FRA 2020-02-04 987 1
FRA 2020-02-05 343 2
FRA 2020-02-06 874 2
FRA 2020-02-07 658 2
FRA 2020-02-08 413 2

In R i belive it would be something like

df = df %>% groupby(Country) %>% mutate(Every_4 = rep(seq(),4))

Could someone please help? Thanks a lot!!

CodePudding user response:

Use:

df['every_4'] = df.groupby('Country').transform(lambda x: x.groupby(x.index//4).ngroup() 1)['ValueA']

Output:

   Country        Date  ValueA  every_4
0      USA  2020-02-01     325        1
1      USA  2020-02-02     444        1
2      USA  2020-02-03     125        1
3      USA  2020-02-04     456        1
4      USA  2020-02-05     325        2
5      USA  2020-02-06     465        2
6      USA  2020-02-07     548        2
7      USA  2020-02-08     696        2
8      FRA  2020-02-01     357        1
9      FRA  2020-02-02     123        1
10     FRA  2020-02-03     125        1
11     FRA  2020-02-04     987        1
12     FRA  2020-02-05     343        2
13     FRA  2020-02-06     874        2
14     FRA  2020-02-07     658        2
15     FRA  2020-02-08     413        2

CodePudding user response:

Use groupby_cumcount:

df['every_4'] = df.groupby('Country').cumcount().floordiv(4).add(1)
print(df)

# Output:
   Country        Date  ValueA  every_4
0      USA  2020-02-01     325        1
1      USA  2020-02-02     444        1
2      USA  2020-02-03     125        1
3      USA  2020-02-04     456        1
4      USA  2020-02-05     325        2
5      USA  2020-02-06     465        2
6      USA  2020-02-07     548        2
7      USA  2020-02-08     696        2
8      FRA  2020-02-01     357        1
9      FRA  2020-02-02     123        1
10     FRA  2020-02-03     125        1
11     FRA  2020-02-04     987        1
12     FRA  2020-02-05     343        2
13     FRA  2020-02-06     874        2
14     FRA  2020-02-07     658        2
15     FRA  2020-02-08     413        2
  • Related