Sunday, 21 December 2014

High Resolution Performance Timer

I recently stumbled across this issue while trying to compile some code on an ARM-based computer: the program I wanted to compile contained inline x86 assembly! I am not going to get into the details of which method might be faster or have higher resolution. From what I have learned, the clock_gettime code below is the most compact and portable option if you want a high-resolution counter for something like performance profiling.


This was the original code that caused the issue. Its non-Windows branch reads the x86 rdtsc instruction through inline assembly, so it only builds on x86 processors.

#ifdef _WIN32
   unsigned long long tick;
   QueryPerformanceCounter((LARGE_INTEGER *)&tick); // works great on Windows ONLY
   return tick;
#else
   uint32_t hi, lo;
   __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi)); // x86/x86-64 only; will not build on ARM
   return ((uint64_t)hi << 32) | (uint64_t)lo;  // combine the two 32-bit halves
#endif



Thanks to the POSIX function (declared in <time.h>)

int clock_gettime(clockid_t clk_id, struct timespec *tp);

we can replace our non-portable assembly code with an easy-to-use clock_gettime call, like so:

#ifdef _WIN32
   unsigned long long tick;
   QueryPerformanceCounter((LARGE_INTEGER *)&tick); // Windows only; returns ticks, not nanoseconds
   return tick;
#else
   struct timespec timeInfo;
   clock_gettime(CLOCK_MONOTONIC_RAW, &timeInfo); // Linux-specific clock; value in nanosecond units
   unsigned long long nanosecs = (unsigned long long)timeInfo.tv_sec * 1000000000ULL
                               + (unsigned long long)timeInfo.tv_nsec;
   return nanosecs;
#endif
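A fuller sketch is below, under a few stated assumptions: the helper names now_ns and clock_resolution_ns are hypothetical, it uses CLOCK_MONOTONIC (the clock POSIX specifies) instead of the Linux-only CLOCK_MONOTONIC_RAW, and on Windows it converts QueryPerformanceCounter ticks with QueryPerformanceFrequency so that both branches return nanoseconds. It also calls clock_getres, because the nanosecond units of struct timespec say nothing about how finely the clock actually ticks.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime/clock_getres in strict C modes */
#endif
#include <stdint.h>
#include <time.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: current reading of a monotonic clock, in nanoseconds. */
uint64_t now_ns(void)
{
#ifdef _WIN32
    /* QueryPerformanceCounter counts ticks; QueryPerformanceFrequency gives
       ticks per second. Split into whole seconds plus a remainder so that
       the multiplication by 1e9 cannot overflow for large tick counts. */
    LARGE_INTEGER freq, tick;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&tick);
    uint64_t secs = (uint64_t)(tick.QuadPart / freq.QuadPart);
    uint64_t rem  = (uint64_t)(tick.QuadPart % freq.QuadPart);
    return secs * 1000000000ULL + rem * 1000000000ULL / (uint64_t)freq.QuadPart;
#else
    /* CLOCK_MONOTONIC is unaffected by wall-clock changes such as NTP steps. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

#ifndef _WIN32
/* Hypothetical helper (POSIX only): the resolution clock_getres advertises
   for CLOCK_MONOTONIC, in nanoseconds. */
uint64_t clock_resolution_ns(void)
{
    struct timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);
    return (uint64_t)res.tv_sec * 1000000000ULL + (uint64_t)res.tv_nsec;
}
#endif
```

To profile a piece of code, call now_ns() before and after it and subtract the two readings; the difference is elapsed time in nanoseconds on either platform. Reading the frequency on every call keeps the sketch short; since the performance-counter frequency is fixed at boot, caching it once would be the cheaper design.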

Best of luck.