SWEL: Hardware Cache Coherence Protocols Essay

SWEL: Hardware Cache Coherence Protocols to Map Shared Data onto Shared Caches
12/15/2013

Shared-memory multiprocessors require cache coherence to keep cached values up to date while operations are performed. Snooping and directory-based protocols are the two well-known approaches to cache coherence. However, both have problems. The snooping protocol is not scalable and is only suitable for systems

The bus in an SMP is therefore replaced with a scalable network. The snooping protocol performs poorly in a network-based SMP. Whenever any processor performs a write, a snoop is sent to all caches to invalidate or update the shared block of data. This broadcast adds communication overhead that grows with the number of caches. The directory-based protocol addresses this problem. In this protocol, caches consult a directory of blocks with status bits, which keep track of all sharers and the state of the block in each of them. Coherence messages are then sent only to the caches recorded as sharers instead of being broadcast to every cache, which overcomes the main drawback of the snooping protocol. However, the directory-based protocol has disadvantages of its own:
i. Directory storage
Each block in the L2 cache maintains a directory of sharers. One bit is reserved in the block for each potential sharer, so directory storage grows linearly with the number of sharers.
ii. Indirection
Multiple messages are exchanged over the network before a coherence operation is deemed complete. For example, when a processor performs a write, the directory first sends invalidations to the sharers, and the write is performed only after acknowledgements have been received from all of them.
iii. Complexity
Directory-based coherence protocols are often error-prone, and entire research communities are working on their efficient design with formal verification.
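The storage and indirection costs in items i and ii can be made concrete with a small calculation. The sketch below is illustrative only: it assumes a full-map directory with one presence bit per core, a 64-byte cache block, and a simplified message model (one request, one invalidation and one acknowledgement per sharer, one final grant). The function names and these parameters are assumptions, not taken from the original text.

```python
def directory_overhead(num_cores: int, block_bytes: int = 64) -> float:
    """Extra storage per block as a fraction of the block's data size.

    Assumes a full-map bit vector: one presence bit per core per block.
    """
    data_bits = block_bytes * 8
    return num_cores / data_bits


def write_messages(num_sharers: int) -> int:
    """Messages exchanged before a write completes (simplified model).

    1 write request to the directory
    + one invalidation per sharer
    + one acknowledgement per sharer
    + 1 grant back to the writer
    """
    return 1 + num_sharers + num_sharers + 1


if __name__ == "__main__":
    for cores in (16, 64, 512):
        print(cores, "cores ->", directory_overhead(cores), "overhead per block")
    for sharers in (0, 3, 15):
        print(sharers, "sharers ->", write_messages(sharers), "messages")
```

Under these assumptions the overhead per block grows from 12.5% at 64 cores to 100% at 512 cores, and a write to a block with three sharers takes eight messages instead of a single local operation.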
Many of the applications running on today's multi-core machines are still single-threaded applications that do not explicitly rely on cache coherence. Further, future many-cores in servers and datacenters will likely execute multiple VMs (each possibly executing a multi-programmed workload), with no data sharing between VMs, again removing the need for cache coherence.
Moreover, several machines share data through message passing, so no cache coherence protocol is needed.
Figure 1 indicates that very little data is actually shared by two or more cores; on average 77.0% of all memory locations are touched by only a single processor.

Based on the above arguments, we claim that the percentage of processing in shared-memory multi-threaded execution that actually needs cache coherence will be much lower than what traditional hardware cache-coherent multiprocessors provide for.
Proposed Solution (SWEL)
The proposed solution is based on the premise that most blocks are either private to a core or read-only, and hence, do not require coherence. The basic claim is this: (i) many blocks do not need coherence and can be freely placed in L1 caches; (ii) blocks that would need coherence if placed in L1 are only placed in L2. Given this claim, it appears that the coherence protocol is all but eliminated.
This is only partially true as other book-keeping is now required to identify which of the above two categories a block falls into. If a cache block is either private or is read-only, then that block can be safely...
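The book-keeping described above can be sketched as a per-block record of which cores have touched a block and whether it has ever been written. This is a minimal illustration of the two-category idea, not the actual SWEL hardware mechanism: the class names, the `access` method, and the use of a Python set to track cores are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BlockState:
    cores: set = field(default_factory=set)  # cores that have touched the block
    written: bool = False                    # True once any core has written it


class PlacementTracker:
    """Hypothetical tracker that decides where a block may be cached."""

    def __init__(self):
        self.blocks = {}

    def access(self, addr: int, core: int, is_write: bool) -> str:
        """Record an access and return the allowed cache level for the block."""
        state = self.blocks.setdefault(addr, BlockState())
        state.cores.add(core)
        state.written = state.written or is_write
        return self.placement(addr)

    def placement(self, addr: int) -> str:
        state = self.blocks[addr]
        shared = len(state.cores) > 1
        # Private or read-only blocks need no coherence and may live in L1.
        # Blocks that are both shared and written are kept only in L2.
        return "L2" if shared and state.written else "L1"


if __name__ == "__main__":
    tracker = PlacementTracker()
    print(tracker.access(0x40, core=0, is_write=False))
    print(tracker.access(0x40, core=1, is_write=False))
    print(tracker.access(0x40, core=1, is_write=True))
    print(tracker.access(0x80, core=2, is_write=True))
```

In this sketch a block read by two cores stays eligible for L1, and it is restricted to L2 only once it is both shared and written. A block written by a single core remains private and L1-eligible.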

