regarding to HDFS
what is the meaning of dfs.replication.max ?
from doc - https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
its say only that - Maximal block replication
but still not understand this meaning
CodePudding user response:
Let's think through this. We have a min replication and this is typically 3.
Why have a max? Maybe you do a lot of maintenance and regularly take a node out of the cluster. You may end up by [taking nodes out] and [replacing nodes back in ] the cluster and it's reasonable to think 4 replicas of a block might happen with nodes leaving and returning. This might be a good situation due to your regular maintenance to have an extra copy hanging around so that maintenance doesn't always require lot of replication. You might accept 4 replicas as a max to replication. Taken to the extreme, this might get a little out of hand if you have 50 replicas of a file as this is just too much duplication and starts to eat into hdfs space. Think of the max as the time you might start to cull extra replicas.