Home > Blockchain >  How can I reorder one column in a way that the same serie of dates is repeated?
How can I reorder one column in a way that the same serie of dates is repeated?

Time:04-22

Here's the thing.

I'm building a dataframe so my students can use it in an exercise. This is what I was looking for:

id    date        n
0     2022-01-01  10
0     2022-01-02  30
0     2022-01-03  40
.
.
.
1     2022-01-01  0
1     2022-01-02  5
1     2022-01-03  16
.
.
.
2     2022-01-01  99
2     2022-02-01  100
2     2022-03-01  3300

It's a pretty simple idea. I have a panel data in which the first column is the customer id, the second one is the purchase date and n is the amount of things purchased. However, I couldn't get the second column in a fine order.

This is the farthest I got:

ids = pd.Series(range(100)).repeat(10).reset_index(drop=True)

dates = pd.date_range(start = pd.Timestamp('2022-01-01'),end = pd.Timestamp('2022-01-10')).repeat(100)

purchases = pd.Series(np.random.randint(low=0, high=500, size=100*10))

df = pd.DataFrame(data={'ids':ids, 'dates':dates, 'purchases': purchases}, index=None)

As you can see, I couldn't get it right. This is the result:

ids   dates        n
0     2022-01-01  10
0     2022-01-01  30
0     2022-01-01  40
.
.
.

Any ideas on how can I change the column dates in order to start with 2022-01-01, but in the second row we need to get 2022-01-02. So I get the correct order once and repeat.

This looks like something really easy, but I couldn't find the answer.

CodePudding user response:

This'll do what it seems like you want:

start_date = pd.Timestamp("2022-01-01")

purchase_low = 0
purchase_high = 500

num_ids = 100
num_days = 10

ids = np.repeat(np.arange(num_ids), num_days)
dates = np.tile(pd.date_range("2022-01-01", periods=num_days), num_ids)
purchases = np.random.randint(purchase_low, purchase_high, num_ids * num_days)

df = pd.DataFrame({"ids": ids, "dates": dates, "purchases": purchases})

Output:

     ids      dates  purchases
0      0 2022-01-01        193
1      0 2022-01-02        362
2      0 2022-01-03        218
3      0 2022-01-04        395
4      0 2022-01-05        172
..   ...        ...        ...
995   99 2022-01-06        454
996   99 2022-01-07        342
997   99 2022-01-08        356
998   99 2022-01-09        423
999   99 2022-01-10        104

[1000 rows x 3 columns]

The issue with your approach was that .repeat() repeats each element n times, whereas what you needed to do with the dates was repeat each sequence of dates n times. Hence why I use .repeat for ids, but .tile for dates.

  • Related