Fri, 26 Jun 2009

Consistent hashing and memcached, all you should know

Consistent hashing is a hash discipline to avoid some errors in traditional hashing - like modulus. Some memcached clients uses these techniques, perhaps libketama.

With modulus hashing - key % n_servers - you may have problems when some server goes down. If you change the constant n_servers all of your keys will be mapped to other servers, this behavior can has disastrous consequences perhaps a lot of misses in a small fraction of time.

Consistent hashing is trying to "address" this situation. Libketama implementation create one vector of integers where each integer is assigned to unique host. To matching some key to this vector you need convert this key in one integer and look up the most close integer host.

vector = [2 -> host a , 10 -> host b, 32 -> host a, 54 -> host b]
hash_key = f_hash("foo") <-- 7 perhaps
host = lookup(vector, hash_key) <-- host b
Libketama use a md5 checksum to hash hosts and keys and 2^32 integer representation. Look up function is a traditional dichotomic search and has O(log n) cost where n is a length of vector. To reach a good uniform distribution libketama is using a 160 points per server or positions in vector per host.

whats happen when one host goes down ?

When one server goes down libketama create new vector without the host is down, and the keys mapped to old host will be distributed to all other host holding the keys of other hosts.

vector = [2 -> host a , 32 -> host a]
hash_key = f_hash("foo") <-- 7  perhaps
host = lookup(vector, hash_key) <-- host a
Weighted feature in libketama

Libketama also implement a new feature called weighted algorithm. This feature will help to assign K variable points regarded the percentage of memory of amount of memory of all servers. How many points more probability to reach one host.

/* 
   host a has 2 Gb = 50% of keys
   host b has 1 Gb = 25% of keys
   host c has 1 GB = 25% of keys
   */
vector = [2 -> host a , 10 -> host b, 16 -> host c, 32 -> host a]
hash_key = f_hash("bar") <-- 7 perhaps
host = lookup(vector, hash_key) <-- host b

hash_key = f_hash("foo") <-- 12 perhaps
host = lookup(vector, hash_key) <-- host c
One server goes down with weighted feature enabled

Weighted feature has some problems, when one server goes down the percentages assigned to each host will change and the number of points also, therefore the property described in non weighted algorithm here may fail.

/* 
   host a has 2 Gb = 66% of keys
   host c has 1 Gb = 33% of keys
   */
vector = [2 -> host a , 14 -> host a, 16 -> host c, 32 -> host a]
hash_key = f_hash("bar") <-- 7 perhaps
host = lookup(vector, hash_key) <-- host a

hash_key = f_hash("foo") <-- 12 perhaps
host = lookup(vector, hash_key) <-- host a
With weighted feature enabled new points are created per host and old keys mapped to up server current can be mapped to other servers.

Whats happen when you add a new servers to list in weighted or not weighted libketama implementation

Like to weighted problems when you add new host to list a new points will be created in a vector and old keys mapped into one host current can be mapped to this new host.

Posted at: 19:15 | category: /memcached | Comments (0)


Name:


E-mail:


URL:


Comment: