I have data that looks like this:
tx_id | tx_date | product_id |
---|---|---|
1 | 1/4/2023 | 3 |
1 | 1/4/2023 | 5 |
2 | 1/4/2023 | 3 |
3 | 1/4/2023 | 1 |
4 | 1/4/2023 | 1 |
4 | 1/4/2023 | 5 |
1 | 1/5/2023 | 5 |
1 | 1/5/2023 | 1 |
2 | 1/5/2023 | 4 |
2 | 1/5/2023 | 6 |
3 | 1/5/2023 | 1 |
A quick explanation of the columns:
- tx_id is a transaction id
- tx_date is a transaction date
- product_id is a product id

The behavior of the data here is that tx_id is reset to 1 when the day changes.
I want to update the data to look like the table below, using PySpark or pandas.
Take a quick look at the new id numbers 4 and 5:
id | tx_id | tx_date | product_id |
---|---|---|---|
1 | 1 | 1/4/2023 | 3 |
1 | 1 | 1/4/2023 | 5 |
2 | 2 | 1/4/2023 | 3 |
3 | 3 | 1/4/2023 | 1 |
4 | 4 | 1/4/2023 | 1 |
4 | 4 | 1/4/2023 | 5 |
5 | 1 | 1/5/2023 | 5 |
5 | 1 | 1/5/2023 | 1 |
6 | 2 | 1/5/2023 | 4 |
6 | 2 | 1/5/2023 | 6 |
7 | 3 | 1/5/2023 | 1 |
To put it briefly: how do I make a new id column with an index based on tx_id, tx_date, and product_id?
CodePudding user response:
If uniqueness of the index is your main concern, resetting the entire dataframe index can be helpful.
df = df.reset_index(drop=True)
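For context, here is a minimal, runnable pandas sketch of that call (the sample frame below is a hypothetical subset of the question's data):

```python
import pandas as pd

# Hypothetical subset of the question's data
df = pd.DataFrame(
    {
        "tx_id": [1, 1, 2, 1, 2],
        "tx_date": ["1/4/2023", "1/4/2023", "1/4/2023", "1/5/2023", "1/5/2023"],
        "product_id": [3, 5, 3, 5, 4],
    }
)

# reset_index(drop=True) discards the old index and renumbers the rows 0..n-1.
# It guarantees a unique index, but it does not by itself group rows that share
# a (tx_date, tx_id) pair the way the expected output does.
df = df.reset_index(drop=True)
print(df)
```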
CodePudding user response:
This would work in PySpark (it assumes the usual imports for functions as F, col, and Window):
df.withColumn("new_id", F.dense_rank().over(Window.orderBy(col("tx_date"), col("tx_id")))).show()
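For completeness, here is a self-contained sketch of that approach; the SparkSession setup and the M/d/yyyy parsing of tx_date are assumptions based on the sample data. Parsing tx_date into a real date keeps the window ordering chronological rather than lexicographic:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical recreation of the question's data
df = spark.createDataFrame(
    [
        (1, "1/4/2023", 3), (1, "1/4/2023", 5), (2, "1/4/2023", 3),
        (3, "1/4/2023", 1), (4, "1/4/2023", 1), (4, "1/4/2023", 5),
        (1, "1/5/2023", 5), (1, "1/5/2023", 1), (2, "1/5/2023", 4),
        (2, "1/5/2023", 6), (3, "1/5/2023", 1),
    ],
    ["tx_id", "tx_date", "product_id"],
)

# A window ordered by (date, tx_id); with no partitionBy, all rows go through a
# single partition, which is fine for small data but worth knowing at scale.
w = Window.orderBy(F.to_date("tx_date", "M/d/yyyy"), F.col("tx_id"))

# dense_rank gives every distinct (tx_date, tx_id) pair the same id and keeps
# counting across days instead of resetting, matching the expected output.
df.withColumn("id", F.dense_rank().over(w)).show()
```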