(Paper #150)
In general, the hardware memory consistency model of a multiprocessor system is not identical to the memory model defined at the programming-language level. Consequently, the language memory model must be mapped onto the hardware memory model, and the compiler can accomplish this mapping by inserting memory fence instructions where needed. We have developed and implemented fence insertion and optimization algorithms in our Pensieve compiler project. We present several fence insertion optimization techniques that this system uses to guarantee sequential consistency at the language level, and compare them using preliminary performance data. Our techniques target two relaxed hardware memory consistency models, provided by SMPs based on the IBM POWER3 and the Intel Pentium 4. Our fence insertion optimizations yield average performance improvements of 17.2% on the IBM POWER3 (a PowerPC implementation) and 37.2% on the Intel Pentium 4.
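To make the fence insertion problem concrete, here is a minimal sketch for a single straight-line thread. It is not the Pensieve algorithm. Any program-ordered pair of accesses that the hardware may reorder needs a fence somewhere between the two accesses if sequential consistency is to be preserved. Placing each fence as late as possible then gives a minimal fence set, because the problem reduces to interval stabbing. Everything here is a hypothetical illustration: the access encoding, the function names, and the two relaxed-ordering sets. `TSO_LIKE` roughly approximates x86 (Pentium 4), where a later load may bypass an earlier store. `WEAK` roughly approximates PowerPC-style weak ordering. A real compiler would first prune the pairs with a program analysis (for example, delay set analysis) before placing fences.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: an access is (kind, address), with kind "R" or "W".
Access = Tuple[str, str]

# Simplified, assumed hardware models: (a, b) in the set means an earlier
# access of kind a and a later access of kind b to *different* addresses
# may be performed out of program order.
TSO_LIKE: Set[Tuple[str, str]] = {("W", "R")}
WEAK: Set[Tuple[str, str]] = {("W", "R"), ("W", "W"), ("R", "R"), ("R", "W")}


def required_orders(thread: List[Access],
                    relaxed: Set[Tuple[str, str]]) -> List[Tuple[int, int]]:
    """Return program-ordered pairs (i, j) the hardware may reorder."""
    pairs = []
    for i in range(len(thread)):
        for j in range(i + 1, len(thread)):
            (ki, ai), (kj, aj) = thread[i], thread[j]
            if ai != aj and (ki, kj) in relaxed:
                pairs.append((i, j))
    return pairs


# A fence position g means "fence immediately before access g";
# it enforces pair (i, j) exactly when i < g <= j.

def naive_fences(pairs: List[Tuple[int, int]]) -> List[int]:
    """Unoptimized: a fence right after every access that starts a pair."""
    return sorted({i + 1 for i, _ in pairs})


def greedy_fences(pairs: List[Tuple[int, int]]) -> List[int]:
    """Minimal fence set: visit pairs by end point, fence as late as possible."""
    fences: List[int] = []
    for i, j in sorted(pairs, key=lambda p: p[1]):
        # The last fence is <= j; it covers (i, j) only if it lies after i.
        if not fences or fences[-1] <= i:
            fences.append(j)
    return fences


if __name__ == "__main__":
    # Two stores followed by two loads, all to distinct addresses.
    t = [("W", "x"), ("W", "y"), ("R", "z"), ("R", "w")]
    p = required_orders(t, TSO_LIKE)
    print(naive_fences(p))   # [1, 2]
    print(greedy_fences(p))  # [2]
    print(greedy_fences(required_orders(t, WEAK)))  # [1, 2, 3]
```

Under the TSO-like model, one fence between the stores and the loads is enough, whereas the naive placement uses two. Under the weak model every adjacent pair of distinct-address accesses must be separated, so no fence can be removed. This matches the intuition that weaker hardware models need more fences to recover sequential consistency.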
Keywords:
Compilers
Architecture