I was reading a post by Dan Linstedt from last year on the temperature of data. It's a very interesting concept, and so intuitive that I wondered why I had never heard of it before.
You should read his whole post, "Temperature of Data for RDBMS, and DW 2.0", but here are Dan's definitions, summarized:
- Hot = data that is accessed all the time, or extremely important data requiring sub-second response times
- Medium = data that is accessed most of the time, but where response times can be anywhere from 1 second to 10 or 15 seconds
- Lukewarm = data that is accessed rarely; "rarely" may be (for example) once every 30 minutes or twice every 4 hours
- Cold = historical context that is hardly ever accessed, but that, when requested, must be returned within a couple of minutes
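To make the tiers concrete, here is a minimal Python sketch that assigns a temperature from two measurements: how often a data set is accessed per hour, and the response time its users need. The function name `classify` and the Hot access-rate cut-off are hypothetical. Only the sub-second Hot requirement and the Lukewarm band (once every 30 minutes down to twice every 4 hours, i.e. 2 to 0.5 accesses per hour) come from Dan's definitions, and as he notes, each company would set its own numbers.

```python
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    MEDIUM = "medium"
    LUKEWARM = "lukewarm"
    COLD = "cold"


# Hypothetical cut-off: "accessed all the time" read as at least once a minute.
HOT_MIN_ACCESSES_PER_HOUR = 60.0
# From the definitions: needing a sub-second response makes data Hot.
HOT_MAX_LATENCY_S = 1.0
# Lukewarm band from the definitions: once every 30 minutes (2/hour)
# down to twice every 4 hours (0.5/hour).
LUKEWARM_MAX_ACCESSES_PER_HOUR = 2.0
LUKEWARM_MIN_ACCESSES_PER_HOUR = 0.5


def classify(accesses_per_hour: float, required_latency_s: float) -> Temperature:
    """Assign a temperature from access rate and required response time (assumed rules)."""
    # Hot is an "or": very frequently accessed, or needs a sub-second answer.
    if accesses_per_hour >= HOT_MIN_ACCESSES_PER_HOUR or required_latency_s < HOT_MAX_LATENCY_S:
        return Temperature.HOT
    # Accessed more often than the Lukewarm band allows: Medium.
    if accesses_per_hour > LUKEWARM_MAX_ACCESSES_PER_HOUR:
        return Temperature.MEDIUM
    # Inside the Lukewarm band.
    if accesses_per_hour >= LUKEWARM_MIN_ACCESSES_PER_HOUR:
        return Temperature.LUKEWARM
    # Hardly ever accessed: Cold.
    return Temperature.COLD
```

For instance, under these assumed rules a report table read about ten times an hour that can tolerate a five-second wait comes out Medium (`classify(10, 5)`), while an archive queried once a week comes out Cold (`classify(1 / 168, 120)`). The Hot test runs first because Dan's Hot definition is satisfied by either condition on its own.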
What's really interesting about these ratings is that they can serve as a guide to proper data placement, and hence can inform how one should use data virtualization.
If you have data that's Hot, it should either be cached, or the application should hit the source directly. If you have data that is Medium, it need not be cached, since query federation can easily meet those response times. Likewise with Lukewarm data. Cold data might even be moved to tape or cheap storage, with a low query priority and a never-cache setting.
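One way to picture these rules is as a lookup table from temperature to placement policy, which a data virtualization layer could consult. The Python sketch below is a hypothetical illustration, not any product's configuration format: the `Access` options, the `Placement` fields, and the "high"/"normal"/"low" priority labels are all assumptions. It records only the choices described above: cache Hot data, federate Medium and Lukewarm data without caching, and push Cold data to cheap storage with low priority and no caching.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    CACHE = "cache"        # answer from a cache held by the virtualization layer
    FEDERATE = "federate"  # run a federated query against the live sources
    ARCHIVE = "archive"    # read from tape or other cheap storage


@dataclass(frozen=True)
class Placement:
    access: Access
    cacheable: bool
    query_priority: str  # hypothetical labels: "high", "normal", "low"


# Keyed by temperature name; the priority labels are illustrative only.
PLACEMENTS = {
    # Hot: cache it. (The other option, having the application hit the
    # source directly, bypasses the virtualization layer entirely.)
    "hot": Placement(Access.CACHE, cacheable=True, query_priority="high"),
    # Medium: federation already meets a 1 to 15 second budget, so skip the cache.
    "medium": Placement(Access.FEDERATE, cacheable=False, query_priority="normal"),
    # Lukewarm: "likewise", so the same treatment as Medium.
    "lukewarm": Placement(Access.FEDERATE, cacheable=False, query_priority="normal"),
    # Cold: cheap storage, low query priority, never cached.
    "cold": Placement(Access.ARCHIVE, cacheable=False, query_priority="low"),
}


def placement_for(temperature: str) -> Placement:
    """Look up the placement policy for a temperature name such as "Hot" or "cold"."""
    return PLACEMENTS[temperature.strip().lower()]
```

Calling `placement_for("Cold")` returns the archive policy with caching disabled. Keeping the policy in a table rather than in branching code means a company that defines its tiers differently only has to edit the table.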
As Dan points out, definitions will vary by company (and maybe even within companies), but the framework can be universally applied. We'll continue to dig into this concept and show how it could work in practice. Think about it, and I'm sure you'll warm up to the idea.