Udoo Benchmarks
Introduction
Each test was run on a quad-core Udoo while X.org was running and a Chromium window was open. For the most part the additional load does not matter, because most of these benchmarks exercise only one of the Udoo's cores at a time. Only the 7zip test ran across all cores.
To compare the benchmarks to those of the Raspberry Pi (wiki), I used this package (zip). Also compare with my Angstrom BeagleBone Black test.
Dhrystone (no compiler optimization)
At 1,048,252 Dhrystones per second on a single core, the unoptimized build is about as fast as a Raspberry Pi running an optimized build.
$ gcc dhry_1.c dhry_2.c dhry.h cpuidc.c -lpthread -lrt -o dhry
$ ./dhry
##########################################
Dhrystone Benchmark, Version 2.1 (Language: C or C++)
Optimisation Opt 3 32 Bit
Register option not selected

  10000 runs 0.03 seconds
 100000 runs 0.15 seconds
 200000 runs 0.20 seconds
 400000 runs 0.38 seconds
 800000 runs 0.76 seconds
1600000 runs 1.51 seconds
3200000 runs 3.05 seconds

Final values (* implementation-dependent):
Int_Glob: O.K. 5
Bool_Glob: O.K. 1
Ch_1_Glob: O.K. A
Ch_2_Glob: O.K. B
Arr_1_Glob[8]: O.K. 7
Arr_2_Glob8/7: O.K. 3200010
Ptr_Glob->
  Ptr_Comp: * 98680
  Discr: O.K. 0
  Enum_Comp: O.K. 2
  Int_Comp: O.K. 17
  Str_Comp: O.K. DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp: * 98680 same as above
  Discr: O.K. 0
  Enum_Comp: O.K. 1
  Int_Comp: O.K. 18
  Str_Comp: O.K. DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: O.K. 5
Int_2_Loc: O.K. 13
Int_3_Loc: O.K. 7
Enum_Loc: O.K. 1
Str_1_Loc: O.K. DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: O.K. DHRYSTONE PROGRAM, 2'ND STRING

From File /proc/cpuinfo
Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 790.52
processor : 1
BogoMIPS : 790.52
processor : 2
BogoMIPS : 790.52
processor : 3
BogoMIPS : 790.52
Features : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
Hardware : SECO i.Mx6 UDOO Board
Revision : 63012
Serial : 021111d4dbc7884d

From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013

Nanoseconds one Dhrystone run: 953.97
Dhrystones per Second: 1048252
VAX MIPS rating = 596.61
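The three summary figures at the end of the output are tied together by simple arithmetic. The sketch below is my own illustration, not code from the benchmark package; it assumes the conventional reference of 1,757 Dhrystones per second for one VAX MIPS (the VAX 11/780 score), and the function names are hypothetical.

```python
# Assumption: 1 VAX MIPS corresponds to 1757 Dhrystones/s (VAX 11/780 reference).
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def vax_mips(dhrystones_per_sec):
    """Convert a Dhrystones-per-second score to a VAX MIPS rating."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def ns_per_run(dhrystones_per_sec):
    """Average time of one Dhrystone run in nanoseconds."""
    return 1e9 / dhrystones_per_sec


# Score reported by the unoptimized run above.
unoptimized = 1048252
print("VAX MIPS:", round(vax_mips(unoptimized), 2))
print("ns per run:", round(ns_per_run(unoptimized), 2))
```

Rounded to two decimals, both results agree with the figures the benchmark printed for this run.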
Dhrystone (O3 compiler optimization)
At 2,809,990 Dhrystones per second, each core of the Udoo Quad is essentially three Raspberry Pis. I suspected it might be using the extra cores, so I watched top while the test ran; it was, in fact, using only 100% CPU (not 300%+). The BeagleBone Black with Ubuntu ran at 3,319,960 Dhrystones per second, so a single Udoo core is roughly as fast as that.
$ gcc dhry_1.c dhry_2.c dhry.h cpuidc.c -lpthread -lrt -O3 -o dhry
ubuntu@imx6-qsdl:~/Downloads/Raspberry_Pi_Benchmarks/Source Code$ ./dhry
##########################################
Dhrystone Benchmark, Version 2.1 (Language: C or C++)
Optimisation Opt 3 32 Bit
Register option not selected

  10000 runs 0.01 seconds
 100000 runs 0.11 seconds
 200000 runs 0.08 seconds
 400000 runs 0.14 seconds
 800000 runs 0.29 seconds
1600000 runs 0.57 seconds
3200000 runs 1.15 seconds
6400000 runs 2.28 seconds

Final values (* implementation-dependent):
Int_Glob: O.K. 5
Bool_Glob: O.K. 1
Ch_1_Glob: O.K. A
Ch_2_Glob: O.K. B
Arr_1_Glob[8]: O.K. 7
Arr_2_Glob8/7: O.K. 6400010
Ptr_Glob->
  Ptr_Comp: * 94584
  Discr: O.K. 0
  Enum_Comp: O.K. 2
  Int_Comp: O.K. 17
  Str_Comp: O.K. DHRYSTONE PROGRAM, SOME STRING
Next_Ptr_Glob->
  Ptr_Comp: * 94584 same as above
  Discr: O.K. 0
  Enum_Comp: O.K. 1
  Int_Comp: O.K. 18
  Str_Comp: O.K. DHRYSTONE PROGRAM, SOME STRING
Int_1_Loc: O.K. 5
Int_2_Loc: O.K. 13
Int_3_Loc: O.K. 7
Enum_Loc: O.K. 1
Str_1_Loc: O.K. DHRYSTONE PROGRAM, 1'ST STRING
Str_2_Loc: O.K. DHRYSTONE PROGRAM, 2'ND STRING

From File /proc/cpuinfo
Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 790.52
processor : 1
BogoMIPS : 790.52
processor : 2
BogoMIPS : 790.52
processor : 3
BogoMIPS : 790.52
Features : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
Hardware : SECO i.Mx6 UDOO Board
Revision : 63012
Serial : 021111d4dbc7884d

From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013

Nanoseconds one Dhrystone run: 355.87
Dhrystones per Second: 2809990
VAX MIPS rating = 1599.31
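To put "roughly as fast" into numbers, the scores quoted in this article can be compared directly. This is only a quick sketch using the three Dhrystone figures reported here; the variable and function names are mine.

```python
# Dhrystones per second, as reported in this article.
UDOO_UNOPTIMIZED = 1048252   # Udoo, one core, no compiler optimization
UDOO_O3 = 2809990            # Udoo, one core, -O3
BEAGLEBONE_BLACK = 3319960   # BeagleBone Black with Ubuntu


def ratio(score, baseline):
    """How many times faster `score` is than `baseline`."""
    return score / baseline


# Gain from -O3 on the Udoo itself.
print("O3 vs unoptimized: %.2fx" % ratio(UDOO_O3, UDOO_UNOPTIMIZED))
# One Udoo core relative to the BeagleBone Black.
print("Udoo core vs BeagleBone Black: %.2fx" % ratio(UDOO_O3, BEAGLEBONE_BLACK))
```

By this measure -O3 makes the Udoo about 2.7 times faster than the unoptimized build, and a single optimized Udoo core reaches about 85% of the BeagleBone Black score.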
Linpack (no compiler optimization)
$ gcc linpack.c cpuidc.c -lpthread -lrt -o linpack
$ ./linpack
##########################################
Unrolled Double Precision Linpack Benchmark - Linux Version in 'C/C++'
Optimisation Opt 3 32 Bit

norm resid           resid          machep          x[0]-1        x[n-1]-1
       1.7  7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14

Times are reported for matrices of order 100
1 pass times for array with leading dimension of 201

  dgefa    dgesl    total   Mflops    unit    ratio
0.02323  0.00169  0.02492    27.55  0.0726   0.4451

Calculating matgen overhead
 10 times 0.04 seconds
100 times 0.17 seconds
200 times 0.26 seconds
400 times 0.51 seconds
800 times 1.00 seconds
Overhead for 1 matgen 0.00125 seconds

Calculating matgen/dgefa passes for 1 seconds
 10 times 0.12 seconds
 20 times 0.22 seconds
 40 times 0.43 seconds
 80 times 0.86 seconds
160 times 1.70 seconds
Passes used 93

Times for array with leading dimension of 201

  dgefa    dgesl    total   Mflops    unit    ratio
0.00961  0.00030  0.00991    69.32  0.0289   0.1769
0.01040  0.00034  0.01074    63.91  0.0313   0.1919
0.01037  0.00032  0.01069    64.24  0.0311   0.1909
0.01002  0.00031  0.01033    66.50  0.0301   0.1844
0.00975  0.00052  0.01026    66.92  0.0299   0.1832
Average                      66.18

Calculating matgen2 overhead
Overhead for 1 matgen 0.00135 seconds

Times for array with leading dimension of 200

  dgefa    dgesl    total   Mflops    unit    ratio
0.00944  0.00031  0.00975    70.39  0.0284   0.1742
0.00997  0.00034  0.01031    66.61  0.0300   0.1841
0.00989  0.00030  0.01019    67.41  0.0297   0.1819
0.00970  0.00030  0.01000    68.67  0.0291   0.1786
0.00980  0.00033  0.01013    67.80  0.0295   0.1809
Average                      68.18

##########################################
From File /proc/cpuinfo
Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 790.52
processor : 1
BogoMIPS : 790.52
processor : 2
BogoMIPS : 790.52
processor : 3
BogoMIPS : 790.52
Features : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
Hardware : SECO i.Mx6 UDOO Board
Revision : 63012
Serial : 021111d4dbc7884d

From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013

Unrolled Double Precision 66.18 Mflops
Linpack (O3 compiler optimization)
$ gcc linpack.c cpuidc.c -lpthread -lrt -O3 -o linpack
$ ./linpack
##########################################
Unrolled Double Precision Linpack Benchmark - Linux Version in 'C/C++'
Optimisation Opt 3 32 Bit

norm resid           resid          machep          x[0]-1        x[n-1]-1
       1.7  7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14

Times are reported for matrices of order 100
1 pass times for array with leading dimension of 201

  dgefa    dgesl    total   Mflops    unit    ratio
0.01608  0.00050  0.01658    41.41  0.0483   0.2961

Calculating matgen overhead
  10 times 0.02 seconds
 100 times 0.10 seconds
 200 times 0.09 seconds
2000 times 0.78 seconds
4000 times 1.68 seconds
Overhead for 1 matgen 0.00042 seconds

Calculating matgen/dgefa passes for 1 seconds
 10 times 0.06 seconds
100 times 0.48 seconds
200 times 0.94 seconds
400 times 1.87 seconds
Passes used 213

Times for array with leading dimension of 201

  dgefa    dgesl    total   Mflops    unit    ratio
0.00427  0.00017  0.00444   154.65  0.0129   0.0793
0.00430  0.00017  0.00448   153.36  0.0130   0.0800
0.00420  0.00017  0.00438   156.94  0.0127   0.0781
0.00417  0.00018  0.00435   158.01  0.0127   0.0776
0.00417  0.00018  0.00435   157.88  0.0127   0.0777
Average                     156.17

Calculating matgen2 overhead
Overhead for 1 matgen 0.00038 seconds

Times for array with leading dimension of 200

  dgefa    dgesl    total   Mflops    unit    ratio
0.00392  0.00017  0.00409   168.04  0.0119   0.0730
0.00395  0.00017  0.00412   166.69  0.0120   0.0736
0.00392  0.00017  0.00408   168.12  0.0119   0.0729
0.00394  0.00017  0.00411   167.12  0.0120   0.0734
0.00395  0.00017  0.00412   166.63  0.0120   0.0736
Average                     167.32

##########################################
From File /proc/cpuinfo
Processor : ARMv7 Processor rev 10 (v7l)
processor : 0
BogoMIPS : 790.52
processor : 1
BogoMIPS : 790.52
processor : 2
BogoMIPS : 790.52
processor : 3
BogoMIPS : 790.52
Features : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
Hardware : SECO i.Mx6 UDOO Board
Revision : 63012
Serial : 021111d4dbc7884d

From File /proc/version
Linux version 3.0.35 (udoo@ubuntu) (gcc version 4.4.4 (4.4.4_09.06.2010) ) #1 SMP PREEMPT Sat Oct 12 14:05:30 CEST 2013

Unrolled Double Precision 156.17 Mflops
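The headline Mflops figure for each Linpack run appears to be the mean of the five timed passes with leading dimension 201. The sketch below checks that against the numbers in the two outputs and derives the gain from -O3; it is my own illustration and not taken from the benchmark source.

```python
from statistics import mean

# Mflops of the five passes with leading dimension 201, copied from the outputs.
UNOPTIMIZED_PASSES = [69.32, 63.91, 64.24, 66.50, 66.92]
O3_PASSES = [154.65, 153.36, 156.94, 158.01, 157.88]


def average_mflops(passes):
    """Mean Mflops over the timed passes, rounded as the benchmark prints it."""
    return round(mean(passes), 2)


unoptimized = average_mflops(UNOPTIMIZED_PASSES)
optimized = average_mflops(O3_PASSES)
print("unoptimized:", unoptimized, "Mflops")
print("-O3:", optimized, "Mflops")
print("speedup from -O3: %.2fx" % (optimized / unoptimized))
```

The two means match the reported 66.18 and 156.17 Mflops, which puts the -O3 build at roughly 2.4 times the unoptimized one.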
7zip with Chromium Running
This is the only test that uses all the cores of the Udoo. With Chromium running it never completed: the process was killed partway through. I suspect it ran out of RAM, mostly because X was running. At the time of the test only 219 MB was free, and no swap was available, which is the default configuration. As one would expect, since each core is roughly as fast as a BeagleBone Black, the Udoo's total speed for both compressing and decompressing is about four times that of the BeagleBone Black.
$ 7z b
7-Zip 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size: 621 MB, # CPU hardware threads: 4
RAM usage: 434 MB, # Benchmark threads: 4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1216   254    465   1183  |    30151   362    752   2720
23:    1164   248    478   1186  |    30379   368    754   2780
Killed
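A run that ends in "Killed" with little free RAM and no swap is consistent with the benchmark needing more memory than was available. A check along the following lines could flag that before starting. It is a minimal sketch: the helper names are hypothetical, and the sample text is an illustrative stand-in built from the figures quoted in this article (621 MB total, 219 MB free, no swap), not a capture from the board. On a real system the text would come from /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def enough_memory(info, needed_mb):
    """True if free RAM plus free swap covers the requested amount.

    MemFree ignores reclaimable buffers and cache, so this is conservative.
    """
    available_kb = info.get("MemFree", 0) + info.get("SwapFree", 0)
    return available_kb >= needed_mb * 1024


# Illustrative sample only (assumed values, see the note above).
SAMPLE = """MemTotal:         635904 kB
MemFree:          224256 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(SAMPLE)
# 434 MB is the "RAM usage" that 7z reports for the benchmark.
print("enough memory for 7z b:", enough_memory(info, 434))
```

With these sample values the check reports that there is not enough memory, which fits the failed run.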
7zip without Chromium running
This time it got past the point where it crashed before and ran to completion. It is worth noting that it is a little faster now that Chromium is not using a little CPU. Also, it slowed down as it reached the test where it had crashed, probably because it needed to free some RAM.
$ 7z b
7-Zip 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,4 CPUs)

RAM size: 621 MB, # CPU hardware threads: 4
RAM usage: 434 MB, # Benchmark threads: 4

Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:    1207   260    451   1174  |    31465   379    749   2839
23:    1215   267    464   1237  |    30984   378    750   2835
24:    1148   264    468   1234  |    30059   374    745   2788
----------------------------------------------------------------
Avr:          263    461   1215               377    748   2821
Tot:          320    604   2018
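The Tot line can be read as the mean of the compressing and decompressing averages on the Avr line. The sketch below checks that against this run; the integer division is an assumption on my part that happens to match the printed values, and the names are hypothetical.

```python
# Values from the Avr line of the run above.
AVR_COMPRESS = {"usage": 263, "r_u": 461, "rating": 1215}
AVR_DECOMPRESS = {"usage": 377, "r_u": 748, "rating": 2821}


def tot(compress_avg, decompress_avg):
    """Integer mean of the compressing and decompressing averages."""
    return (compress_avg + decompress_avg) // 2


total = {key: tot(AVR_COMPRESS[key], AVR_DECOMPRESS[key]) for key in AVR_COMPRESS}
print(total)  # {'usage': 320, 'r_u': 604, 'rating': 2018}
```

The overall rating of 2018 MIPS is therefore the single number to use when comparing this run with other boards.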
OpenSSL
$ openssl speed
OpenSSL 1.0.0e 6 Sep 2011
built on: Wed Oct 5 01:45:02 UTC 2011
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: cc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -DTERMIO -O2 -Wa,--noexecstack -g -Wall
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2                      0.00        0.00        0.00        0.00        0.00
mdc2                     0.00        0.00        0.00        0.00        0.00
md4                  7761.79k   26911.62k   75214.93k  135954.32k  178506.41k
md5                  5767.57k   19209.05k   50070.10k   81085.73k  101173.93k
hmac(md5)            6381.72k   21288.94k   53009.49k   84140.37k  101087.64k
sha1                 5666.52k   16888.08k   37025.76k   52359.00k   58607.51k
rmd160               5257.78k   14787.15k   30770.00k   42600.47k   48376.64k
rc4                 62741.19k   68218.54k   70962.26k   70447.09k   70686.04k
des cbc             15914.89k   16795.56k   17455.62k   17476.95k   17304.57k
des ede3             6203.65k    6306.68k    6338.39k    6370.38k    6292.45k
idea cbc                 0.00        0.00        0.00        0.00        0.00
seed cbc            17832.68k   19928.66k   19854.59k   19216.84k   18967.42k
rc2 cbc             10840.32k   11225.97k   11317.79k   11660.63k   11661.06k
rc5-32/12 cbc            0.00        0.00        0.00        0.00        0.00
blowfish cbc        24167.36k   26446.88k   27283.46k   27385.97k   27343.57k
cast cbc            21879.96k   23476.25k   23563.43k   23602.86k   22880.26k
aes-128 cbc         16743.73k   17882.61k   18444.54k   18474.12k   18795.02k
aes-192 cbc         14343.86k   15459.49k   15857.10k   15878.14k   15483.75k
aes-256 cbc         12639.34k   13298.28k   13605.58k   13706.53k   13602.02k
camellia-128 cbc    22846.73k   24616.23k   25983.91k   26494.98k   25978.78k
camellia-192 cbc    18236.58k   20065.58k   20567.55k   20692.31k   20728.49k
camellia-256 cbc    18566.37k   20146.07k   20573.10k   20700.84k   20499.11k
sha256               3800.08k    8691.37k   15218.29k   18623.56k   20044.37k
sha512                914.51k    3706.11k    5185.64k    7120.61k    7908.75k
whirlpool            1377.65k    2809.03k    4500.46k    5302.20k    5497.99k
aes-128 ige         16060.83k   17553.19k   17875.99k   17982.54k   17846.46k
aes-192 ige         13971.60k   15051.37k   15428.15k   15460.35k   15398.23k
aes-256 ige         12274.63k   13041.07k   13216.33k   13419.59k   13413.03k

                    sign     verify    sign/s  verify/s
rsa 512 bits   0.002148s  0.000174s     465.5    5738.0
rsa 1024 bits  0.010869s  0.000507s      92.0    1971.4
rsa 2048 bits  0.062687s  0.001641s      16.0     609.3
rsa 4096 bits  0.402000s  0.005541s       2.5     180.5

                    sign     verify    sign/s  verify/s
dsa 512 bits   0.001732s  0.001985s     577.5     503.8
dsa 1024 bits  0.004978s  0.005795s     200.9     172.6
dsa 2048 bits  0.015997s  0.019006s      62.5      52.6

                              sign   verify    sign/s  verify/s
160 bit ecdsa (secp160r1)  0.0009s  0.0040s    1111.7     250.9
192 bit ecdsa (nistp192)   0.0010s  0.0046s     971.7     215.2
224 bit ecdsa (nistp224)   0.0014s  0.0064s     739.7     156.3
256 bit ecdsa (nistp256)   0.0018s  0.0091s     566.4     109.9
384 bit ecdsa (nistp384)   0.0039s  0.0207s     257.0      48.4
521 bit ecdsa (nistp521)   0.0088s  0.0454s     114.0      22.0
163 bit ecdsa (nistk163)   0.0036s  0.0082s     277.7     122.6
233 bit ecdsa (nistk233)   0.0069s  0.0158s     144.4      63.3
283 bit ecdsa (nistk283)   0.0105s  0.0296s      95.0      33.8
409 bit ecdsa (nistk409)   0.0242s  0.0691s      41.3      14.5
571 bit ecdsa (nistk571)   0.0599s  0.1616s      16.7       6.2
163 bit ecdsa (nistb163)   0.0037s  0.0090s     271.5     111.3
233 bit ecdsa (nistb233)   0.0073s  0.0182s     136.5      54.9
283 bit ecdsa (nistb283)   0.0110s  0.0343s      91.0      29.2
409 bit ecdsa (nistb409)   0.0255s  0.0815s      39.3      12.3
571 bit ecdsa (nistb571)   0.0622s  0.1911s      16.1       5.2

                               op      op/s
160 bit ecdh (secp160r1)  0.0034s     290.2
192 bit ecdh (nistp192)   0.0040s     252.3
224 bit ecdh (nistp224)   0.0055s     183.3
256 bit ecdh (nistp256)   0.0078s     127.6
384 bit ecdh (nistp384)   0.0171s      58.6
521 bit ecdh (nistp521)   0.0377s      26.5
163 bit ecdh (nistk163)   0.0040s     251.8
233 bit ecdh (nistk233)   0.0078s     128.5
283 bit ecdh (nistk283)   0.0146s      68.6
409 bit ecdh (nistk409)   0.0364s      27.4
571 bit ecdh (nistk571)   0.0851s      11.8
163 bit ecdh (nistb163)   0.0046s     217.4
233 bit ecdh (nistb233)   0.0091s     109.6
283 bit ecdh (nistb283)   0.0166s      60.4
409 bit ecdh (nistb409)   0.0415s      24.1
571 bit ecdh (nistb571)   0.0970s      10.3
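The cipher and digest figures are printed in thousands of bytes per second, which makes them awkward to compare at a glance. As a rough sketch, one row of the table can be turned into MB/s per block size like this. The parser is my own and assumes the five-column layout shown in this output; it is not part of OpenSSL.

```python
# Block sizes of the five columns in this `openssl speed` output.
BLOCK_SIZES = (16, 64, 256, 1024, 8192)


def parse_speed_row(line):
    """Split one throughput row into (name, {block_size: bytes_per_second}).

    The algorithm name may contain spaces, so the last five tokens are
    taken as the values; a trailing 'k' means thousands of bytes per second.
    """
    tokens = line.split()
    name = " ".join(tokens[:-5])
    rates = [float(value.rstrip("k")) * 1000 for value in tokens[-5:]]
    return name, dict(zip(BLOCK_SIZES, rates))


# Row copied from the output above.
name, rates = parse_speed_row(
    "aes-128 cbc 16743.73k 17882.61k 18444.54k 18474.12k 18795.02k"
)
for size in BLOCK_SIZES:
    print("%s, %d-byte blocks: %.1f MB/s" % (name, size, rates[size] / 1e6))
```

For aes-128 cbc this gives about 18.8 MB/s with 8192-byte blocks on one core.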