Dubbed Denver, the processor will be used in the firm’s first 64-bit version of its Tegra K1 mobile processor, designed for Android smartphones and tablets.
According to Nvidia, this new version of Tegra K1 pairs the firm’s 192-core Kepler architecture-based GPU with the 64-bit, dual-core CPU.
Each of the Denver cores implements a 7-way superscalar microarchitecture (up to 7 concurrent micro-ops can be executed per clock), and includes a 128kbyte 4-way L1 instruction cache, a 64kbyte 4-way L1 data cache, and a 2Mbyte 16-way L2 cache, which services both cores.
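To put the cache figures in perspective, the number of sets in a set-associative cache follows from its size, associativity and line size. The sketch below works this out for the three caches quoted above. The article does not state a line size, so the 64-byte value is an assumption made purely for illustration, and `cache_sets` is a hypothetical helper name.

```python
# Number of sets = cache size / (ways * line size).
# ASSUMPTION: 64-byte cache lines; the article gives no line size.
ASSUMED_LINE_BYTES = 64


def cache_sets(size_bytes, ways, line_bytes=ASSUMED_LINE_BYTES):
    """Return the number of sets in a set-associative cache."""
    bytes_per_set = ways * line_bytes
    if size_bytes % bytes_per_set != 0:
        raise ValueError("cache size must be a multiple of ways * line size")
    return size_bytes // bytes_per_set


# Sizes and associativity as quoted in the article.
l1_instruction_sets = cache_sets(128 * 1024, ways=4)
l1_data_sets = cache_sets(64 * 1024, ways=4)
l2_sets = cache_sets(2 * 1024 * 1024, ways=16)
```

With a different line size the set counts scale inversely, so these numbers should be read only as an example of the arithmetic.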
The chip uses dynamic code optimisation of frequently used software routines at runtime. These microcode-equivalent routines are stored in a dedicated, 128Mbyte main-memory-based optimisation cache.
After being read into the instruction cache, the optimised micro-ops are executed, then re-fetched and re-executed from the instruction cache for as long as they are needed and capacity allows.
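The general idea of optimising frequently used routines and reusing the result can be sketched in software. The toy model below is not Nvidia's mechanism: the class name `OptimisationCache`, the `HOT_THRESHOLD` value and the least-recently-used eviction policy are all assumptions chosen for illustration. It counts how often a routine is fetched, optimises it once it becomes "hot", and serves the optimised version for as long as it remains in a fixed-capacity cache.

```python
from collections import OrderedDict

# ASSUMPTION: a routine counts as "hot" after this many fetches.
HOT_THRESHOLD = 3


class OptimisationCache:
    """Toy model of a cache of optimised routines (hypothetical design)."""

    def __init__(self, capacity, optimise):
        self.capacity = capacity      # maximum number of optimised routines
        self.optimise = optimise      # function: routine -> optimised routine
        self.counts = {}              # fetch count per address
        self.cache = OrderedDict()    # address -> optimised routine, LRU order

    def fetch(self, address, routine):
        """Return (code_to_run, came_from_cache)."""
        if address in self.cache:
            self.cache.move_to_end(address)   # mark as most recently used
            return self.cache[address], True
        self.counts[address] = self.counts.get(address, 0) + 1
        if self.counts[address] >= HOT_THRESHOLD:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[address] = self.optimise(routine)
        return routine, False
```

In this sketch the routine runs unoptimised until it crosses the threshold. The fetch after that returns the optimised version, until another hot routine pushes it out of the cache.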
This means the CPU looks across a window of hundreds of instructions and unrolls loops, renames registers, removes unused instructions, and reorders the code in various ways for optimal speed.
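One of these transformations, removing unused instructions, can be illustrated with a short sketch. The code below is a generic backward liveness pass over straight-line code, not a description of Denver's optimiser. The instruction format `(dest, op, sources)`, the register names and the function name `remove_unused` are hypothetical, and every instruction is assumed to have no side effects beyond writing its destination register.

```python
def remove_unused(instructions, live_out):
    """Drop instructions whose results are never used.

    instructions: list of (dest, op, sources) tuples in program order,
                  where sources is a tuple of register names.
    live_out:     registers whose values are needed after the window.
    ASSUMPTION: instructions have no side effects other than writing dest.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so each instruction sees which registers are read later.
    for dest, op, sources in reversed(instructions):
        if dest in live:
            live.discard(dest)      # this write satisfies the later reads
            live.update(sources)    # its inputs are now needed in turn
            kept.append((dest, op, sources))
        # Otherwise the result is never read, so the instruction is dropped.
    kept.reverse()
    return kept


window = [
    ("r1", "add", ("r0", "r0")),
    ("r2", "mul", ("r1", "r1")),
    ("r3", "sub", ("r0", "r1")),   # r3 is never read afterwards
    ("r4", "add", ("r2", "r0")),
]
optimised = remove_unused(window, live_out={"r4"})
```

Here the `sub` that writes `r3` is removed because nothing later reads `r3`, while the three instructions that feed `r4` are kept in their original order.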
According to Nvidia, this can double the performance of the base-level hardware through the conversion of ARM code to optimised microcode routines.
For more detail: Nvidia goes own way with 64-bit ARM CPU