(Paper #80)
Trace caching was initially proposed to increase instruction bandwidth at low latency. Recently it was suggested that a trace cache could be integrated into a hardware-based optimization framework. Traces can be optimized by transforming long traces into ones that execute faster or consume less power. The potential benefit of such trace optimization depends primarily on the selection of long traces with high coverage of dynamic execution. This paper performs a comprehensive investigation of dynamic trace selection. It introduces a classification of trace selection methods and discusses existing and novel dynamic selection approaches, including loop unrolling, procedure inlining, and incremental merging of traces based on dynamic bias. The paper empirically analyzes a number of selection schemes in an idealized framework. Observations based on the SPEC-CPU2000 benchmarks show that: (a) selection based on dynamic bias is necessary to achieve the best performance across all benchmarks; (b) the best selection scheme is specific to the benchmark and to the maximum trace length; (c) simple selection, based on program structure information only, is sufficient to achieve the best performance for several benchmarks. Consequently, two alternatives for the trace selection mechanism are established: (a) a "best performance" approach relying on complex dynamic criteria; (b) a "value" approach that provides the best performance (and potentially the best power consumption) based on simpler static criteria. Another emerging alternative advocates adaptive mechanisms that adjust the selection criteria.
Keywords:
Architecture