Udoo Benchmarks

Introduction

Each test was run on a quad-core Udoo while X.org was running and a Chromium window was open. For the most part the additional load does not matter, because most of these benchmarks exercise only one core of the Udoo at a time. Only the 7zip test ran across all cores.

To compare these benchmarks with those of the Raspberry Pi (wiki), I used this package (zip). Also compare them with my Angstrom BeagleBone Black test.

Dhrystone (no compiler optimization)

At 1,048,252 Dhrystones per second, each core running the unoptimized build is about as fast as a Raspberry Pi running an optimized build.

$ gcc dhry_1.c dhry_2.c dhry.h cpuidc.c -lpthread -lrt -o dhry
$ ./dhry
##########################################

Dhrystone Benchmark, Version 2.1 (Language: C or C++)

Optimisation    Opt 3 32 Bit
Register option not selected

       10000 runs   0.03 seconds 
      100000 runs   0.15 seconds 
      200000 runs   0.20 seconds 
      400000 runs   0.38 seconds 
      800000 runs   0.76 seconds 
     1600000 runs   1.51 seconds 
     3200000 runs   3.05 seconds 

Final values (* implementation-dependent):

Int_Glob:      O.K.  5  Bool_Glob:     O.K.  1
Ch_1_Glob:     O.K.  A  Ch_2_Glob:     O.K.  B
Arr_1_Glob[8]: O.K.  7  Arr_2_Glob8/7: O.K.     3200010
Ptr_Glob->              Ptr_Comp:       *    98680
  Discr:       O.K.  0  Enum_Comp:     O.K.  2
  Int_Comp:    O.K.  17 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->         Ptr_Comp:       *    98680 same as above
  Discr:       O.K.  0  Enum_Comp:     O.K.  1
  Int_Comp:    O.K.  18 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:     O.K.  5  Int_2_Loc:     O.K.  13
Int_3_Loc:     O.K.  7  Enum_Loc:      O.K.  1  
Str_1_Loc:                             O.K.  DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:                             O.K.  DHRYSTONE PROGRAM, 2'ND STRING


From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Nanoseconds one Dhrystone run:       953.97
Dhrystones per Second:              1048252
VAX  MIPS rating =                   596.61
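
The last three figures in the output are related by simple arithmetic. By convention, the VAX MIPS rating is the Dhrystones-per-second score divided by 1757, the score of the VAX 11/780 reference machine. The sketch below reproduces that arithmetic for the numbers printed above. The function names are illustrative and are not part of the benchmark package.

```python
# Score of the VAX 11/780 reference machine, conventionally rated at 1 MIPS.
VAX_DHRYSTONES_PER_SECOND = 1757


def dhrystones_per_second(ns_per_run):
    """Convert nanoseconds per Dhrystone run into runs per second."""
    return 1e9 / ns_per_run


def vax_mips(dhrystones_per_sec):
    """Scale a Dhrystones-per-second score to a VAX MIPS rating."""
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SECOND


# Figures taken from the unoptimized run above.
print(round(vax_mips(1048252), 2))   # 596.61, matching the reported rating
print(dhrystones_per_second(953.97))  # close to the reported 1048252
```

The second value differs slightly from the reported score because the nanosecond figure is printed with only two decimal places.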

 

Dhrystone (O3 compiler optimization)

At 2,809,990 Dhrystones per second, each core of the Udoo Quad is essentially three Raspberry Pis. I thought it might be using the extra cores, so I watched top while the test ran and saw that it is, in fact, using only 100% CPU (not 300%+). The BeagleBone Black with Ubuntu ran at 3,319,960 Dhrystones per second, so a single Udoo core is roughly as fast as that.

gcc dhry_1.c dhry_2.c dhry.h cpuidc.c -lpthread -lrt -O3 -o dhry
ubuntu@imx6-qsdl:~/Downloads/Raspberry_Pi_Benchmarks/Source Code$ ./dhry
##########################################

Dhrystone Benchmark, Version 2.1 (Language: C or C++)

Optimisation    Opt 3 32 Bit
Register option not selected

       10000 runs   0.01 seconds 
      100000 runs   0.11 seconds 
      200000 runs   0.08 seconds 
      400000 runs   0.14 seconds 
      800000 runs   0.29 seconds 
     1600000 runs   0.57 seconds 
     3200000 runs   1.15 seconds 
     6400000 runs   2.28 seconds 

Final values (* implementation-dependent):

Int_Glob:      O.K.  5  Bool_Glob:     O.K.  1
Ch_1_Glob:     O.K.  A  Ch_2_Glob:     O.K.  B
Arr_1_Glob[8]: O.K.  7  Arr_2_Glob8/7: O.K.     6400010
Ptr_Glob->              Ptr_Comp:       *    94584
  Discr:       O.K.  0  Enum_Comp:     O.K.  2
  Int_Comp:    O.K.  17 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->         Ptr_Comp:       *    94584 same as above
  Discr:       O.K.  0  Enum_Comp:     O.K.  1
  Int_Comp:    O.K.  18 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:     O.K.  5  Int_2_Loc:     O.K.  13
Int_3_Loc:     O.K.  7  Enum_Loc:      O.K.  1  
Str_1_Loc:                             O.K.  DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:                             O.K.  DHRYSTONE PROGRAM, 2'ND STRING


From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Nanoseconds one Dhrystone run:       355.87
Dhrystones per Second:              2809990
VAX  MIPS rating =                  1599.31
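
To put the comparisons in this section into numbers, here is a small sketch that computes the ratios from the scores reported in this article. The variable names are made up for illustration, and the BeagleBone Black figure is the one quoted in the text rather than something measured here.

```python
# Dhrystones per second, as reported in this article.
udoo_unoptimized = 1048252   # one Udoo core, built without -O3
udoo_o3 = 2809990            # one Udoo core, built with -O3
beaglebone_black = 3319960   # BeagleBone Black with Ubuntu (quoted figure)


def ratio(a, b):
    """Return how many times larger score a is than score b."""
    return a / b


o3_speedup = ratio(udoo_o3, udoo_unoptimized)
udoo_vs_bbb = ratio(udoo_o3, beaglebone_black)

print(f"-O3 speedup on the Udoo: {o3_speedup:.2f}x")
print(f"One Udoo core relative to the BeagleBone Black: {udoo_vs_bbb:.2f}")
```

By this measure the optimized build is roughly 2.7 times faster than the unoptimized one, and a single Udoo core reaches about 85% of the BeagleBone Black score.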
 

Linpack (no compiler optimization)

$ gcc linpack.c cpuidc.c -lpthread -lrt -o linpack
$ ./linpack

##########################################
Unrolled Double Precision Linpack Benchmark - Linux Version in 'C/C++'

Optimisation Opt 3 32 Bit

norm resid      resid           machep         x[0]-1          x[n-1]-1
   1.7    7.41628980e-14   2.22044605e-16  -1.49880108e-14  -1.89848137e-14

Times are reported for matrices of order          100
1 pass times for array with leading dimension of  201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.02323    0.00169    0.02492      27.55     0.0726     0.4451

Calculating matgen overhead
        10 times   0.04 seconds
       100 times   0.17 seconds
       200 times   0.26 seconds
       400 times   0.51 seconds
       800 times   1.00 seconds
Overhead for 1 matgen      0.00125 seconds

Calculating matgen/dgefa passes for 1 seconds
        10 times   0.12 seconds
        20 times   0.22 seconds
        40 times   0.43 seconds
        80 times   0.86 seconds
       160 times   1.70 seconds
Passes used         93 

Times for array with leading dimension of 201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00961    0.00030    0.00991      69.32     0.0289     0.1769
    0.01040    0.00034    0.01074      63.91     0.0313     0.1919
    0.01037    0.00032    0.01069      64.24     0.0311     0.1909
    0.01002    0.00031    0.01033      66.50     0.0301     0.1844
    0.00975    0.00052    0.01026      66.92     0.0299     0.1832
Average                                66.18

Calculating matgen2 overhead
Overhead for 1 matgen      0.00135 seconds

Times for array with leading dimension of 200

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00944    0.00031    0.00975      70.39     0.0284     0.1742
    0.00997    0.00034    0.01031      66.61     0.0300     0.1841
    0.00989    0.00030    0.01019      67.41     0.0297     0.1819
    0.00970    0.00030    0.01000      68.67     0.0291     0.1786
    0.00980    0.00033    0.01013      67.80     0.0295     0.1809
Average                                68.18

##########################################

From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Unrolled Double  Precision       66.18 Mflops 

 

Linpack (O3 compiler optimization)

gcc linpack.c cpuidc.c -lpthread -lrt -O3 -o linpack
./linpack 

##########################################
Unrolled Double Precision Linpack Benchmark - Linux Version in 'C/C++'

Optimisation Opt 3 32 Bit

norm resid      resid           machep         x[0]-1          x[n-1]-1
   1.7    7.41628980e-14   2.22044605e-16  -1.49880108e-14  -1.89848137e-14

Times are reported for matrices of order          100
1 pass times for array with leading dimension of  201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.01608    0.00050    0.01658      41.41     0.0483     0.2961

Calculating matgen overhead
        10 times   0.02 seconds
       100 times   0.10 seconds
       200 times   0.09 seconds
      2000 times   0.78 seconds
      4000 times   1.68 seconds
Overhead for 1 matgen      0.00042 seconds

Calculating matgen/dgefa passes for 1 seconds
        10 times   0.06 seconds
       100 times   0.48 seconds
       200 times   0.94 seconds
       400 times   1.87 seconds
Passes used        213 

Times for array with leading dimension of 201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00427    0.00017    0.00444     154.65     0.0129     0.0793
    0.00430    0.00017    0.00448     153.36     0.0130     0.0800
    0.00420    0.00017    0.00438     156.94     0.0127     0.0781
    0.00417    0.00018    0.00435     158.01     0.0127     0.0776
    0.00417    0.00018    0.00435     157.88     0.0127     0.0777
Average                               156.17

Calculating matgen2 overhead
Overhead for 1 matgen      0.00038 seconds

Times for array with leading dimension of 200

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00392    0.00017    0.00409     168.04     0.0119     0.0730
    0.00395    0.00017    0.00412     166.69     0.0120     0.0736
    0.00392    0.00017    0.00408     168.12     0.0119     0.0729
    0.00394    0.00017    0.00411     167.12     0.0120     0.0734
    0.00395    0.00017    0.00412     166.63     0.0120     0.0736
Average                               167.32

##########################################

From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Unrolled Double  Precision      156.17 Mflops 
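
The Mflops column can be checked against the total column. Linpack conventionally counts 2n^3/3 + 2n^2 floating-point operations for solving a system of order n, and the output states n = 100. The sketch below assumes that this benchmark uses the conventional count, which the printed figures appear to bear out. The function names are illustrative only.

```python
def linpack_flops(n):
    """Conventional Linpack operation count for a system of order n."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def mflops(total_seconds, n=100):
    """Millions of floating-point operations per second for one solve."""
    return linpack_flops(n) / total_seconds / 1e6


# 'total' times (dgefa + dgesl) copied from the outputs in this article.
print(round(mflops(0.00444), 2))  # first -O3 row, reported as 154.65
print(round(mflops(0.02492), 2))  # unoptimized single pass, reported as 27.55
```

Other rows can differ from the printed Mflops in the last digit or two, because the times are shown to only five decimal places.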

 

7zip with Chromium running

This is the only test that uses all the cores of the Udoo. It also never seems to complete. I suspect it is running out of RAM, mostly because X is running. At the time of the test only 219 MB was free, and no swap was available, which is the default configuration. As one would expect, since each core is roughly as fast as a BeagleBone Black, the total speed of the Udoo for both compressing and decompressing is about four times that of the BeagleBone Black.

$ 7z b

7-Zip 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size:     621 MB,  # CPU hardware threads:   4
RAM usage:    434 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1216   254    465   1183  |    30151   362    752   2720
23:    1164   248    478   1186  |    30379   368    754   2780
Killed
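
The "Killed" message is consistent with the out-of-memory suspicion: the benchmark reports a RAM usage of 434 MB, while only 219 MB was free and there was no swap. A rough way to check this before starting a run is sketched below. The helper names are hypothetical, and the sample text is built from the figures quoted in this article rather than captured from the board. On a real system the text would come from reading /proc/meminfo.

```python
def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style text into a {field: kilobytes} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def fits_in_memory(required_mb, meminfo):
    """Conservative check: count only free RAM plus free swap."""
    available_kb = meminfo.get("MemFree", 0) + meminfo.get("SwapFree", 0)
    return required_mb * 1024 <= available_kb


# Illustrative snapshot: 219 MB free (219 * 1024 kB) and no swap.
sample = "MemFree:          224256 kB\nSwapFree:              0 kB\n"
print(fits_in_memory(434, parse_meminfo_kb(sample)))  # False
```

The check is deliberately pessimistic, since the kernel can also reclaim cached memory that MemFree does not count.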
 

7zip without Chromium running

This time it did not crash at the same place. It is worth noting that it is a little faster now that Chromium is not using some of the CPU. It also slows down when it reaches the test where it crashed before, probably because it needs to free some RAM.

$ 7z b

7-Zip 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size:     621 MB,  # CPU hardware threads:   4
RAM usage:    434 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1207   260    451   1174  |    31465   379    749   2839
23:    1215   267    464   1237  |    30984   378    750   2835
24:    1148   264    468   1234  |    30059   374    745   2788
----------------------------------------------------------------
Avr:          263    461   1215               377    748   2821
Tot:          320    604   2018
 
 

OpenSSL

$ openssl speed
OpenSSL 1.0.0e 6 Sep 2011
built on: Wed Oct 5 01:45:02 UTC 2011
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -Wa,--noexecstack -g -Wall
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                      0.00         0.00         0.00         0.00         0.00
mdc2                     0.00         0.00         0.00         0.00         0.00
md4                  7761.79k    26911.62k    75214.93k   135954.32k   178506.41k
md5                  5767.57k    19209.05k    50070.10k    81085.73k   101173.93k
hmac(md5)            6381.72k    21288.94k    53009.49k    84140.37k   101087.64k
sha1                 5666.52k    16888.08k    37025.76k    52359.00k    58607.51k
rmd160               5257.78k    14787.15k    30770.00k    42600.47k    48376.64k
rc4                 62741.19k    68218.54k    70962.26k    70447.09k    70686.04k
des cbc             15914.89k    16795.56k    17455.62k    17476.95k    17304.57k
des ede3             6203.65k     6306.68k     6338.39k     6370.38k     6292.45k
idea cbc                 0.00         0.00         0.00         0.00         0.00
seed cbc            17832.68k    19928.66k    19854.59k    19216.84k    18967.42k
rc2 cbc             10840.32k    11225.97k    11317.79k    11660.63k    11661.06k
rc5-32/12 cbc            0.00         0.00         0.00         0.00         0.00
blowfish cbc        24167.36k    26446.88k    27283.46k    27385.97k    27343.57k
cast cbc            21879.96k    23476.25k    23563.43k    23602.86k    22880.26k
aes-128 cbc         16743.73k    17882.61k    18444.54k    18474.12k    18795.02k
aes-192 cbc         14343.86k    15459.49k    15857.10k    15878.14k    15483.75k
aes-256 cbc         12639.34k    13298.28k    13605.58k    13706.53k    13602.02k
camellia-128 cbc    22846.73k    24616.23k    25983.91k    26494.98k    25978.78k
camellia-192 cbc    18236.58k    20065.58k    20567.55k    20692.31k    20728.49k
camellia-256 cbc    18566.37k    20146.07k    20573.10k    20700.84k    20499.11k
sha256               3800.08k     8691.37k    15218.29k    18623.56k    20044.37k
sha512                914.51k     3706.11k     5185.64k     7120.61k     7908.75k
whirlpool            1377.65k     2809.03k     4500.46k     5302.20k     5497.99k
aes-128 ige         16060.83k    17553.19k    17875.99k    17982.54k    17846.46k
aes-192 ige         13971.60k    15051.37k    15428.15k    15460.35k    15398.23k
aes-256 ige         12274.63k    13041.07k    13216.33k    13419.59k    13413.03k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.002148s 0.000174s    465.5   5738.0
rsa 1024 bits 0.010869s 0.000507s     92.0   1971.4
rsa 2048 bits 0.062687s 0.001641s     16.0    609.3
rsa 4096 bits 0.402000s 0.005541s      2.5    180.5
                  sign    verify    sign/s verify/s
dsa  512 bits 0.001732s 0.001985s    577.5    503.8
dsa 1024 bits 0.004978s 0.005795s    200.9    172.6
dsa 2048 bits 0.015997s 0.019006s     62.5     52.6
                              sign    verify    sign/s verify/s
 160 bit ecdsa (secp160r1)   0.0009s   0.0040s   1111.7    250.9
 192 bit ecdsa (nistp192)    0.0010s   0.0046s    971.7    215.2
 224 bit ecdsa (nistp224)    0.0014s   0.0064s    739.7    156.3
 256 bit ecdsa (nistp256)    0.0018s   0.0091s    566.4    109.9
 384 bit ecdsa (nistp384)    0.0039s   0.0207s    257.0     48.4
 521 bit ecdsa (nistp521)    0.0088s   0.0454s    114.0     22.0
 163 bit ecdsa (nistk163)    0.0036s   0.0082s    277.7    122.6
 233 bit ecdsa (nistk233)    0.0069s   0.0158s    144.4     63.3
 283 bit ecdsa (nistk283)    0.0105s   0.0296s     95.0     33.8
 409 bit ecdsa (nistk409)    0.0242s   0.0691s     41.3     14.5
 571 bit ecdsa (nistk571)    0.0599s   0.1616s     16.7      6.2
 163 bit ecdsa (nistb163)    0.0037s   0.0090s    271.5    111.3
 233 bit ecdsa (nistb233)    0.0073s   0.0182s    136.5     54.9
 283 bit ecdsa (nistb283)    0.0110s   0.0343s     91.0     29.2
 409 bit ecdsa (nistb409)    0.0255s   0.0815s     39.3     12.3
 571 bit ecdsa (nistb571)    0.0622s   0.1911s     16.1      5.2
                              op      op/s
 160 bit ecdh (secp160r1)   0.0034s    290.2
 192 bit ecdh (nistp192)    0.0040s    252.3
 224 bit ecdh (nistp224)    0.0055s    183.3
 256 bit ecdh (nistp256)    0.0078s    127.6
 384 bit ecdh (nistp384)    0.0171s     58.6
 521 bit ecdh (nistp521)    0.0377s     26.5
 163 bit ecdh (nistk163)    0.0040s    251.8
 233 bit ecdh (nistk233)    0.0078s    128.5
 283 bit ecdh (nistk283)    0.0146s     68.6
 409 bit ecdh (nistk409)    0.0364s     27.4
 571 bit ecdh (nistk571)    0.0851s     11.8
 163 bit ecdh (nistb163)    0.0046s    217.4
 233 bit ecdh (nistb233)    0.0091s    109.6
 283 bit ecdh (nistb283)    0.0166s     60.4
 409 bit ecdh (nistb409)    0.0415s     24.1
 571 bit ecdh (nistb571)    0.0970s     10.3
 
Evan Boldt Thu, 01/16/2014 - 12:40