According to IDC, by last year humans had generated over 1,200 exabytes (1.2 zettabytes) of data. That is roughly 1.2 × 10^21 bytes, or about 2^70 bytes: a mind-boggling number. Think of a stack of CDs reaching well past the Moon.

Search engines exist to help us rapidly zero in on the needle in that 1,200-exabyte haystack. However, search engines cannot help us derive meaning from multiple occurrences of the same or similar terms across all that data. That requirement is the realm of business intelligence, where the question being asked might be, “Rank the most widely purchased products in my stores in the Northeast last month.”

While the business intelligence question may seem straightforward enough, answering it successfully requires the underlying data to be normalized. Stated differently, the data must always refer to a can of baked beans as a can of baked beans. If some records call them just beans, or tinned baked beans, or something similar, the whole process breaks down. The larger the dataset, the more variations there are, and the bigger the problem becomes.
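The baked beans problem can be illustrated with a minimal sketch. The alias table, the function names, and the sales figures below are all invented for illustration; a real catalog would be far larger and would not be hand-maintained like this.

```python
from collections import Counter

# Hypothetical alias table: each known variant maps to one canonical name.
ALIASES = {
    "beans": "baked beans",
    "tinned baked beans": "baked beans",
    "can of baked beans": "baked beans",
}

def normalize(name: str) -> str:
    """Lower-case, collapse whitespace, then map known variants to the canonical name."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)

def rank_products(sales):
    """Return (product, units) pairs, most purchased first, counted by canonical name."""
    totals = Counter()
    for name, units in sales:
        totals[normalize(name)] += units
    return totals.most_common()

# Made-up sales records, for illustration only.
sales = [
    ("Baked Beans", 40),
    ("beans", 35),
    ("Tinned  Baked Beans", 30),
    ("Tomato Soup", 90),
]

# Ranking on the raw names splits baked beans across three separate rows.
raw = Counter()
for name, units in sales:
    raw[name] += units

print(raw.most_common(1))       # top product when names are left as-is
print(rank_products(sales)[0])  # top product after normalization
```

In this toy data, the raw ranking puts tomato soup first because the baked beans sales are scattered across three spellings; once the names are normalized, baked beans comes out on top.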

Normalization is how we cut through the volume of information we are being bombarded with. Good decisions are based on well-normalized data.

In IT, we continue to instrument and measure more. As each new item is measured, a new datum emerges upon which someone will want to base a decision. Unless that datum is normalized in the context of the decision being made, it will be useless. Here’s an example: consider a software package. How many questions might you want to ask of it? What is it commonly called? Who publishes it? What is its license metric? What is its SKU? What are its secondary use rights? When does it go End-Of-Life? How many have I purchased?

The normalization task for these questions exists across multiple levels. The answer to who publishes the software can be found within the software package itself. However, if you need an analysis by software publisher, then Adobe, Adobe Inc and Adobe Systems will all need to be turned into a single publisher name. Until ISO 19770-2 is widely adopted, the license metric is something that is maintained outside of the package, in a catalog of some sort. The end-of-life date is likely to change frequently as the publisher adjusts its release schedule; witness the changing end-of-life date for Windows XP as a result of the Vista fiasco. If you then try to answer the question of how many you have purchased, a different tool will provide the answer. What is the likelihood of this tool representing items such as publisher, name and license metric in the same way?
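To make the cross-tool problem concrete, here is a small sketch of reconciling install counts from one tool with purchase counts from another. The alias table, the record layout, and the numbers are assumptions made up for this example; they do not describe any particular inventory or purchasing product.

```python
# Hypothetical alias table for publisher names (illustrative, not a real catalog).
PUBLISHER_ALIASES = {
    "adobe": "Adobe",
    "adobe inc": "Adobe",
    "adobe systems": "Adobe",
}

def canonical_publisher(raw: str) -> str:
    """Strip punctuation and case differences, then look up the canonical name."""
    key = " ".join(raw.lower().replace(".", "").replace(",", "").split())
    # Unknown publishers pass through unchanged apart from surrounding whitespace.
    return PUBLISHER_ALIASES.get(key, raw.strip())

def reconcile(installed, purchased):
    """Combine per-publisher counts from two tools under one canonical name."""
    summary = {}
    for field, records in (("installed", installed), ("purchased", purchased)):
        for publisher, count in records:
            name = canonical_publisher(publisher)
            row = summary.setdefault(name, {"installed": 0, "purchased": 0})
            row[field] += count
    return summary

# Made-up records: a discovery tool and a purchasing tool name the publisher differently.
installed = [("Adobe Systems", 120), ("Adobe Inc.", 30)]
purchased = [("Adobe", 100)]

summary = reconcile(installed, purchased)
print(summary)
```

Without the canonical name, the two tools would report three unrelated publishers and the shortfall between installs and purchases would be invisible. With it, the installs and purchases land on a single row that can be compared directly.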

The challenges above relate only to software. If you extend the question to other asset classes, the normalization and enrichment challenge grows exponentially.

As you can see, continued investment in measurement tools without a universal normalization method is making the management of IT more difficult. Without a consistent, normalized way to refer to the IT asset being measured across all the tools involved, it becomes harder to extract meaning from the management data. Stated differently: too much data and not enough actionable intelligence.

The real challenge here is interconnection: while some solutions do a very good job of handling their own data, they neither know nor care how that data may be connected with other systems.

Scalable is solving some of these challenges today by developing a comprehensive end-to-end solution that can intelligently associate and normalize information on any IT-related subject and dynamically enrich the data with related information from external sources. The solution, which we call Chaklun™ after a mythical all-knowing wizard from ancient Ukraine, processes data uniformly from any data source. It is not constrained to a specific type of data object such as a software title name, vendor name, license metric, IP location or MAC address mapping.

Chaklun harnesses the power of the crowd, aggregating anonymous normalization requests to create an extremely high-velocity update process. This continuous feedback loop identifies new objects rapidly and enriches the objects it already knows about as new interconnections are required. For example, today we may just want to know whether a particular package is Windows 7 compatible; tomorrow we may want to know whether it has patch-based vulnerabilities. Chaklun supports the evolving normalization and enrichment needs of IT.

With more interconnections and management requirements emerging every day, getting intelligence from this interconnected data becomes more of a challenge. We think Chaklun can become an important platform to assist in this process.