Naming Things

An old computer joke says:

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

Caching and indexing should be avoided if possible, which renders cache invalidation and off-by-one errors irrelevant, but the problem of naming things is here to stay for at least as long as computers deal with data without really understanding it. The crux of the naming problem is that by naming something you make it an entity distinct from other entities. But different people have different ideas about what does or does not constitute an entity, so you immediately run into the problem of different entities going by the same name, or the same entity being known by different names.

Ideally, you derive the name for the entity entirely from its properties in a unique and deterministic way, so that the same properties always yield the same name. The Git commit hash is a perfect example of this approach, and any Git user can testify that referring to individual commits in Git is really well implemented. Unfortunately, we are rarely so lucky. Mostly we deal with concepts residing in our minds that each of us understands a bit differently, or objects in our environment that each of us perceives in their own special way. Once we try to model these entities and give them names, each of us will come up with a slightly different idea for a name. One alternative is to disregard the properties of the objects entirely and name them with (consecutive) numbers. This is the approach pioneered by librarians and the one usually used in computer systems. It works great while you have a central authority assigning numbers, but fails miserably in any distributed system, where two parties that do not coordinate will hand out the same number to different things.
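The contrast between the two naming schemes can be sketched in a few lines of Python. This is only an illustration under some assumptions: content_name mimics how Git names a blob object (SHA-1 over a small header plus the content), which is simpler than a commit hash, and the Registry class is a hypothetical stand-in for a numbering authority, not something from any real system.

```python
import hashlib
import itertools


def content_name(data: bytes) -> str:
    """Derive a name purely from the content, in the style of a Git blob ID."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class Registry:
    """Hypothetical authority that names things with consecutive numbers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_name(self) -> int:
        return next(self._counter)


# Two people naming the same content independently arrive at the same name.
alice_name = content_name(b"hello\n")
bob_name = content_name(b"hello\n")
print(alice_name == bob_name)

# Two registries that do not coordinate give the same number
# to two different things.
site_a, site_b = Registry(), Registry()
book_at_a = site_a.next_name()
book_at_b = site_b.next_name()
print(book_at_a == book_at_b)
```

Both comparisons come out true, but for opposite reasons: in the first case equal names mean equal things, while in the second case equal names hide two different things. That second collision is exactly what a single central authority prevents and what a distributed system has to work around.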
