Udoo Benchmarks

Introduction

Each test was run on a quad-core Udoo while X.org was running and a Chromium window was open. The additional load is, for the most part, unimportant because most of these benchmarks only exercise one of the Udoo's cores at a time. Only the 7zip test ran across all cores.

To make the results comparable to those of the Raspberry Pi (wiki), I used this package (zip). They can also be compared to my Angstrom BeagleBone Black test.

Dhrystone (no compiler optimization)

At 1,048,252 Dhrystones per second, each core running the unoptimized build is about as fast as a Raspberry Pi running an optimized build.

$ gcc dhry_1.c dhry_2.c dhry.h cpuidc.c -lpthread -lrt -o dhry
$ ./dhry 
##########################################

Dhrystone Benchmark, Version 2.1 (Language: C or C++)

Optimisation    Opt 3 32 Bit
Register option not selected

       10000 runs   0.03 seconds 
      100000 runs   0.15 seconds 
      200000 runs   0.20 seconds 
      400000 runs   0.38 seconds 
      800000 runs   0.76 seconds 
     1600000 runs   1.51 seconds 
     3200000 runs   3.05 seconds 

Final values (* implementation-dependent):

Int_Glob:      O.K.  5  Bool_Glob:     O.K.  1
Ch_1_Glob:     O.K.  A  Ch_2_Glob:     O.K.  B
Arr_1_Glob[8]: O.K.  7  Arr_2_Glob8/7: O.K.     3200010
Ptr_Glob->              Ptr_Comp:       *    98680
  Discr:       O.K.  0  Enum_Comp:     O.K.  2
  Int_Comp:    O.K.  17 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->         Ptr_Comp:       *    98680 same as above
  Discr:       O.K.  0  Enum_Comp:     O.K.  1
  Int_Comp:    O.K.  18 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:     O.K.  5  Int_2_Loc:     O.K.  13
Int_3_Loc:     O.K.  7  Enum_Loc:      O.K.  1  
Str_1_Loc:                             O.K.  DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:                             O.K.  DHRYSTONE PROGRAM, 2'ND STRING


From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Nanoseconds one Dhrystone run:       953.97
Dhrystones per Second:              1048252
VAX  MIPS rating =                   596.61
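The last three lines of the output are related by simple arithmetic, which is handy when comparing against results that only report one of them. A minimal sketch, assuming the usual convention that the VAX MIPS rating is the Dhrystones-per-second figure divided by 1757 (the VAX 11/780 baseline); the function names are my own and are not part of the benchmark source:

```python
# Baseline score of the VAX 11/780, the conventional "1 MIPS" reference machine.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def vax_mips(dhrystones_per_second):
    """Convert a Dhrystones-per-second score into a VAX MIPS rating."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


def nanoseconds_per_run(dhrystones_per_second):
    """Time taken by a single Dhrystone run, in nanoseconds."""
    return 1e9 / dhrystones_per_second


if __name__ == "__main__":
    score = 1048252  # "Dhrystones per Second" from the unoptimized run
    print(f"VAX MIPS rating: {vax_mips(score):.2f}")
    print(f"Nanoseconds per run: {nanoseconds_per_run(score):.2f}")
```

Both values agree with the log to within rounding, so any one of the three figures is enough to recover the other two.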

Dhrystone (O3 compiler optimization)

With 2,809,990 Dhrystones per second, each core of the Udoo Quad is essentially three Raspberry Pis. I thought it might be using the extra cores, so I ran the test while watching top and confirmed that it is, in fact, only using 100% CPU (not 300%+). The BeagleBone Black with Ubuntu ran at 3,319,960 Dhrystones per second, so a single Udoo core is roughly as fast as that (about 85% of its score).

$ gcc dhry_1.c dhry_2.c dhry.h cpuidc.c -lpthread -lrt -O3 -o dhry
$ ./dhry
##########################################

Dhrystone Benchmark, Version 2.1 (Language: C or C++)

Optimisation    Opt 3 32 Bit
Register option not selected

       10000 runs   0.01 seconds 
      100000 runs   0.11 seconds 
      200000 runs   0.08 seconds 
      400000 runs   0.14 seconds 
      800000 runs   0.29 seconds 
     1600000 runs   0.57 seconds 
     3200000 runs   1.15 seconds 
     6400000 runs   2.28 seconds 

Final values (* implementation-dependent):

Int_Glob:      O.K.  5  Bool_Glob:     O.K.  1
Ch_1_Glob:     O.K.  A  Ch_2_Glob:     O.K.  B
Arr_1_Glob[8]: O.K.  7  Arr_2_Glob8/7: O.K.     6400010
Ptr_Glob->              Ptr_Comp:       *    94584
  Discr:       O.K.  0  Enum_Comp:     O.K.  2
  Int_Comp:    O.K.  17 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->         Ptr_Comp:       *    94584 same as above
  Discr:       O.K.  0  Enum_Comp:     O.K.  1
  Int_Comp:    O.K.  18 Str_Comp:      O.K.  DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc:     O.K.  5  Int_2_Loc:     O.K.  13
Int_3_Loc:     O.K.  7  Enum_Loc:      O.K.  1  
Str_1_Loc:                             O.K.  DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc:                             O.K.  DHRYSTONE PROGRAM, 2'ND STRING


From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Nanoseconds one Dhrystone run:       355.87
Dhrystones per Second:              2809990
VAX  MIPS rating =                  1599.31
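To put the two Dhrystone runs and the BeagleBone Black figure quoted above side by side, the ratios can be worked out directly. This is only a sketch using the three scores that appear in this post; the constant and function names are illustrative:

```python
# Dhrystones per second, copied from the text and logs above.
UDOO_UNOPTIMIZED = 1048252
UDOO_O3 = 2809990
BEAGLEBONE_BLACK_UBUNTU = 3319960


def ratio(score, reference):
    """How many times faster `score` is than `reference`."""
    return score / reference


if __name__ == "__main__":
    print(f"-O3 speedup on the Udoo: {ratio(UDOO_O3, UDOO_UNOPTIMIZED):.2f}x")
    print(f"Udoo core vs BeagleBone Black: {ratio(UDOO_O3, BEAGLEBONE_BLACK_UBUNTU):.2f}x")
```

By this measure -O3 makes the benchmark roughly 2.7 times faster, and one Udoo core reaches about 0.85 of the BeagleBone Black score.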

Linpack (no compiler optimization)

$ gcc linpack.c cpuidc.c -lpthread -lrt -o linpack
$ ./linpack 

##########################################
Unrolled Double Precision Linpack Benchmark - Linux Version in 'C/C++'

Optimisation Opt 3 32 Bit

norm resid      resid           machep         x[0]-1          x[n-1]-1
   1.7    7.41628980e-14   2.22044605e-16  -1.49880108e-14  -1.89848137e-14

Times are reported for matrices of order          100
1 pass times for array with leading dimension of  201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.02323    0.00169    0.02492      27.55     0.0726     0.4451

Calculating matgen overhead
        10 times   0.04 seconds
       100 times   0.17 seconds
       200 times   0.26 seconds
       400 times   0.51 seconds
       800 times   1.00 seconds
Overhead for 1 matgen      0.00125 seconds

Calculating matgen/dgefa passes for 1 seconds
        10 times   0.12 seconds
        20 times   0.22 seconds
        40 times   0.43 seconds
        80 times   0.86 seconds
       160 times   1.70 seconds
Passes used         93 

Times for array with leading dimension of 201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00961    0.00030    0.00991      69.32     0.0289     0.1769
    0.01040    0.00034    0.01074      63.91     0.0313     0.1919
    0.01037    0.00032    0.01069      64.24     0.0311     0.1909
    0.01002    0.00031    0.01033      66.50     0.0301     0.1844
    0.00975    0.00052    0.01026      66.92     0.0299     0.1832
Average                                66.18

Calculating matgen2 overhead
Overhead for 1 matgen      0.00135 seconds

Times for array with leading dimension of 200

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00944    0.00031    0.00975      70.39     0.0284     0.1742
    0.00997    0.00034    0.01031      66.61     0.0300     0.1841
    0.00989    0.00030    0.01019      67.41     0.0297     0.1819
    0.00970    0.00030    0.01000      68.67     0.0291     0.1786
    0.00980    0.00033    0.01013      67.80     0.0295     0.1809
Average                                68.18

##########################################

From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Unrolled Double  Precision       66.18 Mflops
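The Mflops column can be reproduced from the "total" column. A rough sketch, assuming the standard Linpack operation count of 2n^3/3 + 2n^2 floating point operations for a matrix of order n (here n = 100); the helper names are mine:

```python
def linpack_flops(n):
    """Nominal floating point operation count for solving an n x n system."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2


def mflops(n, total_seconds):
    """Millions of floating point operations per second for one pass."""
    return linpack_flops(n) / (1.0e6 * total_seconds)


if __name__ == "__main__":
    # "total" time of the single-pass row in the unoptimized log above.
    print(f"{mflops(100, 0.02492):.2f} Mflops")
```

Because the log prints times rounded to five decimal places, the faster rows only match to within a few hundredths of an Mflop.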

Linpack (O3 compiler optimization)

$ gcc linpack.c cpuidc.c -lpthread -lrt -O3 -o linpack
$ ./linpack

##########################################
Unrolled Double Precision Linpack Benchmark - Linux Version in 'C/C++'

Optimisation Opt 3 32 Bit

norm resid      resid           machep         x[0]-1          x[n-1]-1
   1.7    7.41628980e-14   2.22044605e-16  -1.49880108e-14  -1.89848137e-14

Times are reported for matrices of order          100
1 pass times for array with leading dimension of  201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.01608    0.00050    0.01658      41.41     0.0483     0.2961

Calculating matgen overhead
        10 times   0.02 seconds
       100 times   0.10 seconds
       200 times   0.09 seconds
      2000 times   0.78 seconds
      4000 times   1.68 seconds
Overhead for 1 matgen      0.00042 seconds

Calculating matgen/dgefa passes for 1 seconds
        10 times   0.06 seconds
       100 times   0.48 seconds
       200 times   0.94 seconds
       400 times   1.87 seconds
Passes used        213 

Times for array with leading dimension of 201

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00427    0.00017    0.00444     154.65     0.0129     0.0793
    0.00430    0.00017    0.00448     153.36     0.0130     0.0800
    0.00420    0.00017    0.00438     156.94     0.0127     0.0781
    0.00417    0.00018    0.00435     158.01     0.0127     0.0776
    0.00417    0.00018    0.00435     157.88     0.0127     0.0777
Average                               156.17

Calculating matgen2 overhead
Overhead for 1 matgen      0.00038 seconds

Times for array with leading dimension of 200

      dgefa      dgesl      total     Mflops       unit      ratio
    0.00392    0.00017    0.00409     168.04     0.0119     0.0730
    0.00395    0.00017    0.00412     166.69     0.0120     0.0736
    0.00392    0.00017    0.00408     168.12     0.0119     0.0729
    0.00394    0.00017    0.00411     167.12     0.0120     0.0734
    0.00395    0.00017    0.00412     166.63     0.0120     0.0736
Average                               167.32

##########################################

From File /proc/cpuinfo
Processor	: ARMv7 Processor rev 10 (v7l)
processor	: 0
BogoMIPS	: 790.52

processor	: 1
BogoMIPS	: 790.52

processor	: 2
BogoMIPS	: 790.52

processor	: 3
BogoMIPS	: 790.52

Features	: swp half thumb fastmult vfp edsp neon vfpv3 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0xc09
CPU revision	: 10

Hardware	: SECO i.Mx6 UDOO Board
Revision	: 63012
Serial		: 021111d4dbc7884d


From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013


Unrolled Double  Precision      156.17 Mflops
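The headline figure is the average of the five runs with a leading dimension of 201, and it can be compared against the 66.18 Mflops of the unoptimized build. A small sketch with the values copied from the two logs; the variable names are illustrative:

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Mflops column of the five leading-dimension-201 rows in the -O3 log.
optimized_runs = [154.65, 153.36, 156.94, 158.01, 157.88]

# Reported average of the unoptimized build.
unoptimized_average = 66.18

if __name__ == "__main__":
    optimized_average = mean(optimized_runs)
    print(f"Average: {optimized_average:.2f} Mflops")
    print(f"Speedup from -O3: {optimized_average / unoptimized_average:.2f}x")
```

The speedup of about 2.4 times is in the same range as the gain -O3 gave the Dhrystone test.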

7zip with Chromium running

This is the only test that uses all of the Udoo's cores. It also never completed: the process was killed partway through. I suspect it ran out of RAM, mostly because X was running. At the time of the test only 219 MB was free, and no swap was available, which is the default configuration. As one would expect, since each core is roughly as fast as a BeagleBone Black, the Udoo's total speed for both compressing and decompressing is about four times that of the BeagleBone Black.

$ 7z b

7-Zip 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size:     621 MB,  # CPU hardware threads:   4
RAM usage:    434 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1216   254    465   1183  |    30151   362    752   2720
23:    1164   248    478   1186  |    30379   368    754   2780
Killed
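The benchmark reports that it wants 434 MB while only 219 MB was free, which fits the out-of-memory suspicion. A quick way to check before starting a run is to read /proc/meminfo. The sketch below is my own helper, not part of 7-Zip; it assumes that free memory plus buffers, page cache and free swap is a fair estimate of what the kernel can hand out, which is only an approximation:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def usable_mb(fields):
    """Rough estimate of memory that could be made available, in MB."""
    keys = ("MemFree", "Buffers", "Cached", "SwapFree")
    return sum(fields.get(key, 0) for key in keys) / 1024


def benchmark_fits(fields, needed_mb):
    """True if the estimated usable memory covers the requested amount."""
    return usable_mb(fields) >= needed_mb


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    # 434 MB is the "RAM usage" figure printed by 7z b in the log above.
    print(f"Usable: {usable_mb(info):.0f} MB, enough for 434 MB: {benchmark_fits(info, 434)}")
```

If the check fails, closing Chromium or adding swap before running the benchmark is the obvious thing to try.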

7zip without Chromium running

This time it did not crash at the same place and ran to completion. It is a little faster now that Chromium is not using CPU time. It also slows down when it reaches the test where it crashed before, probably because it needs to free some RAM.

$ 7z b

7-Zip 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size:     621 MB,  # CPU hardware threads:   4
RAM usage:    434 MB,  # Benchmark threads:      4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1207   260    451   1174  |    31465   379    749   2839
23:    1215   267    464   1237  |    30984   378    750   2835
24:    1148   264    468   1234  |    30059   374    745   2788
----------------------------------------------------------------
Avr:          263    461   1215               377    748   2821
Tot:          320    604   2018
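The "Tot:" row appears to be the average of the compressing and decompressing "Avr:" columns. A small sketch that reproduces it from the numbers above; truncating to an integer is my assumption and simply happens to match the printed row:

```python
def total_row(compress_avg, decompress_avg):
    """Average matching columns of the two 'Avr' halves, truncating to an integer."""
    return tuple((c + d) // 2 for c, d in zip(compress_avg, decompress_avg))


if __name__ == "__main__":
    # (Usage %, R/U MIPS, Rating MIPS) from the "Avr:" row above.
    compress = (263, 461, 1215)
    decompress = (377, 748, 2821)
    print(total_row(compress, decompress))  # (320, 604, 2018)
```

The overall rating of 2018 MIPS is therefore weighted equally between compression and decompression.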

OpenSSL

$ openssl speed
OpenSSL 1.0.0e 6 Sep 2011
built on: Wed Oct  5 01:45:02 UTC 2011
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -Wa,--noexecstack -g -Wall
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                  0.00         0.00         0.00         0.00         0.00 
mdc2                 0.00         0.00         0.00         0.00         0.00 
md4               7761.79k    26911.62k    75214.93k   135954.32k   178506.41k
md5               5767.57k    19209.05k    50070.10k    81085.73k   101173.93k
hmac(md5)         6381.72k    21288.94k    53009.49k    84140.37k   101087.64k
sha1              5666.52k    16888.08k    37025.76k    52359.00k    58607.51k
rmd160            5257.78k    14787.15k    30770.00k    42600.47k    48376.64k
rc4              62741.19k    68218.54k    70962.26k    70447.09k    70686.04k
des cbc          15914.89k    16795.56k    17455.62k    17476.95k    17304.57k
des ede3          6203.65k     6306.68k     6338.39k     6370.38k     6292.45k
idea cbc             0.00         0.00         0.00         0.00         0.00 
seed cbc         17832.68k    19928.66k    19854.59k    19216.84k    18967.42k
rc2 cbc          10840.32k    11225.97k    11317.79k    11660.63k    11661.06k
rc5-32/12 cbc        0.00         0.00         0.00         0.00         0.00 
blowfish cbc     24167.36k    26446.88k    27283.46k    27385.97k    27343.57k
cast cbc         21879.96k    23476.25k    23563.43k    23602.86k    22880.26k
aes-128 cbc      16743.73k    17882.61k    18444.54k    18474.12k    18795.02k
aes-192 cbc      14343.86k    15459.49k    15857.10k    15878.14k    15483.75k
aes-256 cbc      12639.34k    13298.28k    13605.58k    13706.53k    13602.02k
camellia-128 cbc    22846.73k    24616.23k    25983.91k    26494.98k    25978.78k
camellia-192 cbc    18236.58k    20065.58k    20567.55k    20692.31k    20728.49k
camellia-256 cbc    18566.37k    20146.07k    20573.10k    20700.84k    20499.11k
sha256            3800.08k     8691.37k    15218.29k    18623.56k    20044.37k
sha512             914.51k     3706.11k     5185.64k     7120.61k     7908.75k
whirlpool         1377.65k     2809.03k     4500.46k     5302.20k     5497.99k
aes-128 ige      16060.83k    17553.19k    17875.99k    17982.54k    17846.46k
aes-192 ige      13971.60k    15051.37k    15428.15k    15460.35k    15398.23k
aes-256 ige      12274.63k    13041.07k    13216.33k    13419.59k    13413.03k
                  sign    verify    sign/s verify/s
rsa  512 bits 0.002148s 0.000174s    465.5   5738.0
rsa 1024 bits 0.010869s 0.000507s     92.0   1971.4
rsa 2048 bits 0.062687s 0.001641s     16.0    609.3
rsa 4096 bits 0.402000s 0.005541s      2.5    180.5
                  sign    verify    sign/s verify/s
dsa  512 bits 0.001732s 0.001985s    577.5    503.8
dsa 1024 bits 0.004978s 0.005795s    200.9    172.6
dsa 2048 bits 0.015997s 0.019006s     62.5     52.6
                              sign    verify    sign/s verify/s
 160 bit ecdsa (secp160r1)   0.0009s   0.0040s   1111.7    250.9
 192 bit ecdsa (nistp192)   0.0010s   0.0046s    971.7    215.2
 224 bit ecdsa (nistp224)   0.0014s   0.0064s    739.7    156.3
 256 bit ecdsa (nistp256)   0.0018s   0.0091s    566.4    109.9
 384 bit ecdsa (nistp384)   0.0039s   0.0207s    257.0     48.4
 521 bit ecdsa (nistp521)   0.0088s   0.0454s    114.0     22.0
 163 bit ecdsa (nistk163)   0.0036s   0.0082s    277.7    122.6
 233 bit ecdsa (nistk233)   0.0069s   0.0158s    144.4     63.3
 283 bit ecdsa (nistk283)   0.0105s   0.0296s     95.0     33.8
 409 bit ecdsa (nistk409)   0.0242s   0.0691s     41.3     14.5
 571 bit ecdsa (nistk571)   0.0599s   0.1616s     16.7      6.2
 163 bit ecdsa (nistb163)   0.0037s   0.0090s    271.5    111.3
 233 bit ecdsa (nistb233)   0.0073s   0.0182s    136.5     54.9
 283 bit ecdsa (nistb283)   0.0110s   0.0343s     91.0     29.2
 409 bit ecdsa (nistb409)   0.0255s   0.0815s     39.3     12.3
 571 bit ecdsa (nistb571)   0.0622s   0.1911s     16.1      5.2
                              op      op/s
 160 bit ecdh (secp160r1)   0.0034s    290.2
 192 bit ecdh (nistp192)   0.0040s    252.3
 224 bit ecdh (nistp224)   0.0055s    183.3
 256 bit ecdh (nistp256)   0.0078s    127.6
 384 bit ecdh (nistp384)   0.0171s     58.6
 521 bit ecdh (nistp521)   0.0377s     26.5
 163 bit ecdh (nistk163)   0.0040s    251.8
 233 bit ecdh (nistk233)   0.0078s    128.5
 283 bit ecdh (nistk283)   0.0146s     68.6
 409 bit ecdh (nistk409)   0.0364s     27.4
 571 bit ecdh (nistk571)   0.0851s     11.8
 163 bit ecdh (nistb163)   0.0046s    217.4
 233 bit ecdh (nistb233)   0.0091s    109.6
 283 bit ecdh (nistb283)   0.0166s     60.4
 409 bit ecdh (nistb409)   0.0415s     24.1
 571 bit ecdh (nistb571)   0.0970s     10.3
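The cipher and digest rows are in thousands of bytes per second, and the sign/s and verify/s columns are the reciprocals of the per-operation times. A minimal sketch for pulling one row out of the table and converting it; the parsing helper is my own and only assumes the five-column layout shown above:

```python
def parse_speed_line(line):
    """Split an 'openssl speed' row into (name, [five values in 1000s of bytes/s])."""
    tokens = line.split()
    name = " ".join(tokens[:-5])
    values = [float(token.rstrip("k")) for token in tokens[-5:]]
    return name, values


def megabytes_per_second(thousands_of_bytes_per_second):
    """Convert a table value to MB/s (1 MB taken as 1,000,000 bytes)."""
    return thousands_of_bytes_per_second / 1000.0


def operations_per_second(seconds_per_operation):
    """Reciprocal of the per-operation time, as in the sign/s column."""
    return 1.0 / seconds_per_operation


if __name__ == "__main__":
    row = "aes-128 cbc      16743.73k    17882.61k    18444.54k    18474.12k    18795.02k"
    name, values = parse_speed_line(row)
    print(f"{name}: {megabytes_per_second(values[-1]):.1f} MB/s at 8192-byte blocks")
    print(f"rsa 512 sign: {operations_per_second(0.002148):.1f} per second")
```

Read this way, AES-128 in CBC mode runs at roughly 18.8 MB/s on a single core.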

Evan Boldt Thu, 01/16/2014 - 12:40