A recent problem at work produced an interesting example of lockless concurrency.
Given a list of M files on S3, how should one allow a set of hosts H1..HN to move (copy + delete) these files so that no host Hi receives a file already downloaded to another host Hj?
Lacking an atomic native operation on S3 to this end, I first considered a scheme involving lock files on S3 itself, but that quickly proved to be worthless. I was motivated to find a lockless technique, which was fairly straightforward:
- Sort the set of hosts H1..HN and store this sorted list L on each host. assume that each host is identified in the list by a string representation of its ip address or fully-qualified domain.
- Each host Hi will know its order in L as i as well as the length of L as n.
- Each host Hi downloads the manifest M of files currently on S3.
- For each file file f in M, each host calculates the sum S of the ordinal values of each of the characters of the SHA1 hash of f. The SHA1 hash provides for a good distribution.
- The modulus S % n is calculated as v. if v == i, then the file should be copied to Hi and deleted from S3, otherwise this criteria will be met by another host Hj.
last update 2013-04-03