How do you recreate an existing HBASE table to add salted keys?

Time:01-01

I have an HBase table with hundreds of thousands of rows, and we're experiencing issues with hotspotting.

I'd like to recreate this table with salted row keys.

I've attempted to use "org.apache.hadoop.hbase.mapreduce.Import"/CopyTable to copy into a new salted table, but neither prefixes the row keys with a salt.
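For context, a salt prefix is normally derived from a hash of the original row key modulo the bucket count, which is why a plain Import/CopyTable can't produce it: the prefix has to be computed per row. Here's a minimal sketch of the idea (the bucket count and one-byte prefix format are assumptions for illustration, not Phoenix's exact internal scheme):

```python
import hashlib

SALT_BUCKETS = 16  # assumed bucket count; would need to match the target table

def salt_key(row_key: bytes) -> bytes:
    """Prefix a row key with a one-byte salt derived from a hash of the key."""
    bucket = int.from_bytes(hashlib.md5(row_key).digest()[:4], "big") % SALT_BUCKETS
    return bytes([bucket]) + row_key

# The same key always hashes to the same bucket, so readers can
# recompute the prefix instead of storing it anywhere.
assert salt_key(b"user123") == salt_key(b"user123")
```

Because the prefix is a pure function of the key, sequential writes get spread across up to SALT_BUCKETS regions while point reads stay cheap.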

The only solution I've found that migrates rows with the salt prefix is a Phoenix query into a table created with SALT_BUCKETS: UPSERT INTO TABLE_SALTED SELECT * FROM TABLE

However, this is VERY inefficient and takes way too long.

How do I salt an existing HBase/Phoenix table with minimal downtime?

CodePudding user response:

Generally, HBase uses region splitting to handle hotspots.

That said, you can manually split a table:

split '[table_to_split]', '[split point]'

This is more efficient because you are using the tools that come with HBase, and it doesn't require rewriting the entire table. It will only move the needle a little, but sometimes that's enough to limp along.

There are a lot of settings you can play with to help things along. Look into RegionSplitPolicy and see if you can find some help there.

If you want a really good article on splitting, read this Cloudera post.

I'm not sure how much thought you put into picking your splits, but you really can't get better optimization than choosing solid pre-split points that fit your data. (If you are salting, you've likely already discovered skew in your data, and even well-intentioned split points don't handle skew unless you knew about the skew when you picked them.)
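One way to pick pre-split points that fit skewed data is to sample your actual keys and take evenly spaced quantiles of the sorted sample. A rough sketch (the sample keys and region count below are invented for illustration):

```python
def pick_split_points(sampled_keys, num_regions):
    """Choose split points at evenly spaced quantiles of a sorted key sample."""
    keys = sorted(sampled_keys)
    step = len(keys) / num_regions
    # num_regions regions need num_regions - 1 split points.
    return [keys[int(i * step)] for i in range(1, num_regions)]

# Skewed sample: most keys cluster under the "user..." prefix.
sample = [f"user{n:05d}" for n in range(1000)] + [f"zz{n:03d}" for n in range(50)]
points = pick_split_points(sample, 4)
```

The resulting points could then be fed to a pre-split table creation in the shell, e.g. create 'my_table', 'cf', SPLITS => [...] (table and column-family names hypothetical). Quantiles of a real key sample put roughly equal row counts in each region, which is exactly what uniform split points fail to do under skew.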

CodePudding user response:

If this hotspotting issue is caused by repeated reads, why not try increasing hfile.block.cache.size and hbase_regionserver_heapsize?

hfile.block.cache.size - the portion of the region server heap used for the block cache. hbase_regionserver_heapsize - the size of the region server's heap.

You can increase just hfile.block.cache.size, but you may then end up putting more pressure on the heap.

The next obvious question is: by how much? The answer is the same as for most performance tuning. Get an expert to calculate it, or keep adding a little until you run out of headroom or stop seeing improvement.
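To make the relationship between the two knobs concrete: hfile.block.cache.size is a fraction of the region server heap, so raising either one grows the absolute cache. The numbers below are invented for illustration, not recommendations:

```python
heap_gb = 16            # hypothetical region server heap size
block_cache_frac = 0.4  # hypothetical hfile.block.cache.size

# The block cache is carved out of the heap, so its absolute size
# is the heap size times the configured fraction.
cache_gb = heap_gb * block_cache_frac
print(f"block cache ~ {cache_gb:.1f} GB of a {heap_gb} GB heap")
# prints "block cache ~ 6.4 GB of a 16 GB heap"
```

Whatever fraction you give to the block cache is no longer available to memstores and everything else on the heap, which is the "more pressure on heap" trade-off mentioned above.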
