22 February 2013

Cache usage - Why do I have to care?

Last fall I held an internal course at my company titled Coding with performance and quality in mind. In this post I will cover one part of that course, cache usage, and try to answer the question: why do you have to care?

Some of you might say:
- Hey, you! This is nothing new. If I Google "memory cache performance" I get 55,800,000 hits. Why another post?

Well, this will not be another post full of details. There are already lots of great posts with those (Google is your friend). I am writing this post for you who are a software developer but don't know that much about hardware.



The big conclusion


You want the right data inside your cache at the right time!

- That sounds good!

But what is a cache?

In your computer, smartphone or some other device, you have main memory, normally called RAM, and you have cache, which is a superfast memory. Some parts of the cache are located on the same chip as the CPU, so it sits really close to the CPU, which makes access quick (see figure below).




Details: The cache can be divided into L1, L2 and L3, but also TLB, I-cache and D-cache (check the links at the end for geek info).

- Now I know what it is, but

How is it used?

Data that the CPU needs now or very soon is read from the main memory (or a cache further away) into the cache so it can be accessed really quickly. It takes longer if the CPU has to go all the way over the data bus and copy the data every time before it can use it. Since the CPU might need the data again very soon, it is temporarily stored in the cache.

But there are limitations! The cache is limited in size. For instance, my Samsung Galaxy SIII has one L1 cache for data per core. Another limitation is that the cache can only copy chunks of data, called cache lines, which typically have a size of 32, 64 or 128 bytes.

- So what's the problem? Just copy what you need!

Before the CPU starts copying data from the main memory to the cache, it first checks if the data already exists in the cache. If it doesn't, this is called a cache miss and the CPU needs to do more work, which you don't want.

- Ok, I understand what you mean. It's good to have some understanding of what limitations the hardware has, but

How do I minimize cache misses?

You organize your data after usage! Imagine you have a data struct, as the one below, and you want to search for some person's id. The size of one cache line is limited to 32 bytes.

struct person
{
 unsigned int id;
 char data[40];
 struct person *pNext;
};
...

while (ptr->id != magicId)
{
 ptr = ptr->pNext;
}


When the CPU copies one line, it will contain the id and a part of the data field of the first person, due to the size of the cache line. If the first person's id doesn't match, the search continues by following the pNext pointer. The problem is that this pointer is not inside the cache, so the CPU needs to copy it from the main memory into the cache. This is a typical cache miss. A better solution would be to move the most commonly used data close together, as below:

struct person
{
 unsigned int id;
 struct person *pNext;
 char data[40];
};


- So you are saying that I should reorganize all my data structs now?

No, first you analyse your code in a profiler and find your hot spots. Cache misses might not be your biggest problem, but it's good to understand them if one of your loops turns out to be a hot spot.
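On Linux, one way to do this is with the perf tool. A hypothetical session could look like the sketch below, where ./myapp stands in for your own binary:

```shell
# Sample where the time goes and inspect the hot spots
perf record ./myapp
perf report

# Count how often the hot code actually misses the cache
perf stat -e cache-references,cache-misses ./myapp
```

If the cache-miss ratio for a hot loop is high, that is the place where reorganizing your data is likely to pay off.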

