Home > database >  How to make a new index column if id is being reset to 1 everyday and it has to be connected with ot
How to make a new index column if id is being reset to 1 everyday and it has to be connected with ot

Time:01-04

I have a data that looks like this

tx_id tx_date product_id
1 1/4/2023 3
1 1/4/2023 5
2 1/4/2023 3
3 1/4/2023 1
4 1/4/2023 1
4 1/4/2023 5
1 1/5/2023 5
1 1/5/2023 1
2 1/5/2023 4
2 1/5/2023 6
3 1/5/2023 1

Quick explanation for the columns
tx_id is a transaction id
tx_date is a transaction date
product_id is a product id

The behavior of the data here is that the tx_id being reset to 1 if the day changed

I was expecting to update the data to be like this by using pyspark / pandas
Take a quick look at the new id number 4 and 5

id tx_id tx_date product_id
1 1 1/4/2023 3
1 1 1/4/2023 5
2 2 1/4/2023 3
3 3 1/4/2023 1
4 4 1/4/2023 1
4 4 1/4/2023 5
5 1 1/5/2023 5
5 1 1/5/2023 1
6 2 1/5/2023 4
6 2 1/5/2023 6
7 3 1/5/2023 1

A quick explanation of what I was expecting is how do I make a new id column with index based on the tx_id, tx_date, and product_id

CodePudding user response:

If uniqueness of the index is your main concern, resetting the entire dataframe index can be helpful.

df=df.reset_index(drop=True)

CodePudding user response:

This would work (in pyspark):

df.withColumn("new_id", F.dense_rank().over(Window.orderBy(col("tx_date"), col("tx_id")))).show()

Output

  • Related