Problems using generated large random integers as ids in Rails?

Time:11-10

In a Rails app, for business reasons, I don't want to leak how many objects I have or how much that count changes over a period of time.

I think globally unique ids are overkill in this case, as I just need locally unique ids for each table. As ids in a default Rails PostgreSQL app are already bigints, I thought about generating very large (but not astronomically large) integers to use as ids:

class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class

  before_create :set_large_random_id

  private

  def set_large_random_id
    self.id = rand(1..999_999_999_999)
  end
end

(I am OK with my app failing because of a unique id collision once in a trillion times.)
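
As a rough check on that figure: one in a trillion is about the chance that a single new id hits one specific existing id, while the chance of any collision grows with the number of rows already stored. Below is a minimal sketch of the standard birthday approximation, assuming ids are drawn uniformly from 1..999_999_999_999. The helper name `collision_probability` and the constant `ID_SPACE` are illustrative only and not part of Rails.

```ruby
# Approximate probability of at least one collision after drawing
# `rows` ids uniformly at random from `space` possible values
# (birthday approximation: 1 - exp(-n(n-1) / 2N)).
def collision_probability(rows, space)
  1.0 - Math.exp(-rows * (rows - 1) / (2.0 * space))
end

ID_SPACE = 999_999_999_999 # number of values in 1..999_999_999_999

[1_000, 100_000, 1_000_000, 10_000_000].each do |rows|
  puts format("%10d rows: %.6f", rows, collision_probability(rows, ID_SPACE))
end
```

Under these assumptions a table with around a million rows already has a substantial chance of having seen at least one collision, so the per-table row count matters more than the size of the range alone.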

Apart from losing the ordering given by sequential ids, are there any other problems or considerations I need to take into account when using large random ids?

CodePudding user response:

You can use UUIDs as primary keys in your database. I am using them and they work well as large randomized ids. You can search for more information or read this article: https://pawelurbanek.com/uuid-order-rails

You can check the id in the console:

 Role.first
  Role Load (0.4ms)  SELECT "roles".* FROM "roles" ORDER BY "roles"."id" ASC LIMIT $1  [["LIMIT", 1]]
 => #<Role:0x00007f5834fa8910 
id: "0d8acf6e-63d4-4845-a804-c84e8debb501", 
name: "business_analyst", 
created_at: Tue, 08 Nov 2022 10:46:47.321147000 +06 +06:00, 
updated_at: Tue, 08 Nov 2022 10:46:47.321147000 +06 +06:00> 

CodePudding user response:

I can't think of any major issues given that you don't care about collisions. But here are some small considerations:

Developer ergonomics:

If all your other tables have a normal incrementing id column, as they usually do in Rails projects, consistency may be nice. Otherwise this table will be the one exception where internal people looking at data have to order by created_at instead of by id. Debugging also becomes more difficult when you can't easily query the next and previous items.

The solution here would be to have a separate unique column with an index, and that one is exposed to the users. This could be some sort of random integer, or integers separated by hyphens like Amazon order ids.
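
A minimal sketch of such a public id, assuming digit groups separated by hyphens. The module name `PublicId` and the 3-7-7 grouping are arbitrary choices for illustration, not something the answer specifies.

```ruby
require "securerandom"

# Hypothetical helper: builds a public id out of random digit groups,
# for example "482-9301745-0027311".
module PublicId
  GROUP_SIZES = [3, 7, 7].freeze # arbitrary grouping chosen for this sketch

  def self.generate
    groups = GROUP_SIZES.map do |size|
      # random_number(10**size) returns an integer in 0...(10**size);
      # pad with zeros so every group has a fixed width.
      SecureRandom.random_number(10**size).to_s.rjust(size, "0")
    end
    groups.join("-")
  end
end

puts PublicId.generate
```

In a Rails app the value would presumably live in a separate string column with a unique index and be assigned in a `before_create` callback, while the primary key stays sequential.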

Style:

If one user's id is 5 and another user's id is 999,999,999,999 then your UI will have to make both sizes look nice. It will also be noticeably odd to the user who has two objects with wildly different ids.

At a company I worked for we had our receipt ids look more like BGTM-ZIJL-52 (always 4/4/2) for consistency. (We also had to make sure we never generated rude words.)
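
A rough sketch of a generator for ids in that 4/4/2 shape. It assumes the first two groups are uppercase letters and the last group is two digits, which is only inferred from the single example above. `ReceiptId` and the contents of `BLOCKED` are hypothetical placeholders.

```ruby
require "securerandom"

# Hypothetical generator for receipt-style ids such as "BGTM-ZIJL-52".
module ReceiptId
  LETTERS = ("A".."Z").to_a.freeze
  # Placeholder blocklist; a real one would be much longer.
  BLOCKED = %w[DARN HECK].freeze

  # Returns `count` random uppercase letters as a string.
  def self.random_letters(count)
    Array.new(count) { LETTERS[SecureRandom.random_number(LETTERS.size)] }.join
  end

  # True when none of the blocked words appears anywhere in the letters.
  def self.acceptable?(letters)
    BLOCKED.none? { |word| letters.include?(word) }
  end

  def self.generate
    loop do
      first = random_letters(4)
      second = random_letters(4)
      # Check the two groups joined, so a word spanning the hyphen is caught too.
      next unless acceptable?(first + second)

      digits = format("%02d", SecureRandom.random_number(100))
      return "#{first}-#{second}-#{digits}"
    end
  end
end

puts ReceiptId.generate
```

Rejecting and redrawing keeps the remaining ids uniformly distributed, at the cost of an occasional extra iteration.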

Imported data:

Let's say you are storing order numbers from people's purchases, and you merge with another company and want to import their orders into your system. It would be easier to do that if you had a separate but flexible column for public ids (e.g. a varchar), so that it will support the other company's ids no matter what style they chose (unless there are collisions with yours).


Asides: You're probably aware of this, but you can of course retry inserting a row if you get a collision. There's no need for your app to fail. You could also have PostgreSQL generate the random-seeming ids: https://stackoverflow.com/a/20890246
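
The retry idea can be sketched in plain Ruby as follows. `FakeTable` is a hypothetical in-memory stand-in for a table with a unique index, and `insert_with_retry` with its `max_attempts` parameter is an illustrative name, not a Rails API.

```ruby
require "set"

# Hypothetical in-memory stand-in for a table with a unique index on id.
class FakeTable
  class DuplicateId < StandardError; end

  def initialize
    @ids = Set.new
  end

  def insert(id)
    # Set#add? returns nil when the element is already present.
    raise DuplicateId, "id #{id} already taken" unless @ids.add?(id)

    id
  end
end

# Draws a fresh random id on each attempt and retries when the insert
# hits a duplicate, giving up after max_attempts tries.
def insert_with_retry(table, id_range: 1..999_999_999_999, max_attempts: 5)
  attempts = 0
  begin
    attempts += 1
    table.insert(rand(id_range))
  rescue FakeTable::DuplicateId
    retry if attempts < max_attempts
    raise
  end
end

table = FakeTable.new
puts insert_with_retry(table)
```

In an actual Rails app the error to rescue would be `ActiveRecord::RecordNotUnique`. With PostgreSQL the retry should happen outside the failed transaction, since a transaction that has hit an error rejects further statements until it is rolled back.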
