GTech Booster • The Lost Art of Storage Mapping Function

continuation of part 2

Hashing – Just Another SMF

If you are still unimpressed by the idea of an SMF then perhaps my last example will please you. The whole idea of an SMF can be generalised to include a function that maps some elements into the same storage location.

This may seem like a crazy idea but you might have come across it before under the names “scatter storage” or “hash functions”. Whatever you call it, it has to be one of the nicest ideas in the whole of computing.

The principle of an SMF is that given one data value, the key, you can find the location of an associated data value using f(key).

All of the SMFs we have looked at so far have been very regular. They make use of the regularity of the data to map it to a one-after the other sequential storage – but sometimes this isn’t necessary.

Suppose you can find a function, any old function, f(key) that gives you a location for all possible values of the key and in most but not all cases gives you different locations for different keys – then why not use it?

For example, suppose you want to store words in an array. You could use the SMF given by adding together the ASCII codes of the first two letters minus 128.

For example, f(CAT) would be 67+65-128 (ASCII codes of C and A minus 128) or 4. This means that you could store CAT in location 4. In the same way f(DOG) is 68+79-128 = 19 and so DOG would be stored in location 19.

This works just as well as a regular SMF but with one problem sometimes two different keys will be mapped to the same location. For example as only the first two letters are used f(CAR) is the same as f(CAT) and we would attempt to store the two at the same place.

This is called a collision and different scatter storage or hashing schemes deal with the problem in different ways. The easiest thing to do is to check to see if the location given by f(key) has been used and if it has check f(key)+1, f(key)+2 and so on until you find a free location. There are lots of variations on collision management but this linear search is simple and fairly efficient.

If you don’t see the point of hashing functions then try the problem of storing and subsequently finding names in an array. Without hash functions you either have to perform a sequential search of the array or sort the array and perform a quadratic (binary) search. The former is inefficient and the latter complex and has the overhead of a complete sort. A hashing function gives you the location of any word in one evaluation and even if there is a collision you should find the word after a short linear search.

There is a lot more to be said about hash functions but the main thing is that you see them as nothing more than slightly odd SMFs.

You can think of hashing as using a chaotic or pseudo random SMF.

SMFs Hidden Rather Than Forgotten

So it looks as if Storage Mapping Functions are alive and well after all. Perhaps the sub-title of this article should have been the “hidden” rather than “lost” art.

Credit: I-Programmer, Harry Fairhead