Remark (placed here so it doesn't get lost): there is a limited number (about 60000) of possible open connections from one client to one server port, since each connection needs its own ephemeral source port. Could that limit the throughput we are trying to achieve, or is such a high number of connections infeasible?
For simple performance comparisons several SMTP sinks have been implemented or tested.
Test programs are:
Test machines are:
Entries in the tables below denote execution time in seconds unless otherwise noted; hence smaller values are better.
Tests have been performed with myslam (a multi-threaded SMTP client), using 7 to 8 client machines, 50 threads per client, and 5000 messages per client.
parameters | smtp-sink | smtps | thrperconn | thrpool |
1KB/msg (40MB) | 45s | 70s | 92s | 43s |
4KB/msg (160MB) | 49s | 56s | 259s | 78s |
32KB/msg (1280MB) | 203s | 208s | 999s | 110s |
-w 1 | 141s | 109s | 156s | 230s |
Note: v-sun is a four processor machine, hence the multi-threaded programs (thrpool, thrperconn) can use multiple processors. I didn't select (via an option) multiple processors for smtps though.
Just as one example, the achieved throughput in MB/s is listed in the next table (e.g., 40MB in 45s is about 0.9MB/s). As can be seen, it is an order of magnitude lower than the sustainable throughput that can be achieved over a single connection (about 8.5-9.0MB/s measured with ttcp; this is a 100Mbit/s ethernet).
parameters | smtp-sink | smtps | thrperconn | thrpool |
1KB/msg (40MB) | 0.9 | 0.6 | 0.4 | 0.9 |
4KB/msg (160MB) | 3.3 | 2.9 | 0.6 | 2.1 |
32KB/msg (1280MB) | 6.5 | 6.3 | - | 11.9 |
parameters | smtp-sink | smtps | thrperconn | thrpool |
1KB msg size | 97 | 87 | 380 | 140 |
4KB msg size | 108 | 130 | 1150 | 156 |
32KB msg size | 208 | 197 | fails | 330 |
-w 1 | 165 | 138 | 484 | 223 |
parameters | smtp-sink | smtps | thrperconn | thrpool |
1KB msg size | 38 | 28 | - | 31 |
4KB msg size | 34 | 33 | - | 31 |
32KB msg size | 125 | 125 | - | 125 |
-w 1 | 125 | 125 | - | 155 |
Note: 125 for 250/3.
parameters | smtp-sink | smtps | thrperconn | thrpool |
1KB msg size | 45 | 44 | 165 | 74 |
4KB msg size | 54 | 45 | 418 | 75 |
32KB msg size | 217 | 167 | fails | 256 |
-w 1 | 370 | 360 | - | 337 |
2004-03-02
statethreads/examples/smtps3
See Section 5.2.1.1, machine 1
wiz$ time ./smtpc2 -fa@b.c -Rx@y.z -t 100 -s 1000 -r localhost
sink program | FS | times (s) |
smtps3 | - | 5 |
smtpss | UFS | 17, 18 |
smtps3 -C | UFS | 16, 17, 19 |
source: s-6.perf-lab
sink: v-bsd.perf-lab
with -C
s-6.perf-lab$ time ./smtpc2 -t 100 -s 1000 -r v-bsd.perf-lab
19.17s real 1.08s user 0.64s system
without -C
s-6.perf-lab$ time ./smtpc2 -t 100 -s 1000 -r v-bsd.perf-lab
3.04s real 0.81s user 0.59s system
source: s-6.perf-lab
sink: mon.perf-lab (FreeBSD 4.9)
with -C
12.05s real 1.04s user 0.67s system
without -C
3.03s real 0.92s user 0.54s system
2004-03-04 source: s-6.perf-lab; sink: v-sun.perf-lab
with -C: 20s - 24s (UFS). Note: it takes 20s(!) to remove all CDB files:
time rm ?/S*: 0m20.11s
with -C: 1s (TMPFS); 16s (UFS, /), rm: 14s; logging turned on: 16s, rm: 0.8s.
without -C: 1s
2004-03-08 source: s-6.perf-lab; sink: v-bsd;
./smtpc -t 100 -s 1000
sink program | time (s) |
smtpss | 30 |
smtps3 -C | 30 |
smtps3 | 3 |
2004-03-08 source: s-6.perf-lab; sink: v-sun;
./smtpc -t 100 -s 1000
sink program | FS | times (s) |
smtps3 | - | 1 |
smtpss | UFS | 25, 30 |
smtps3 -C | UFS | 23 |
smtpss | swap | 2, 3 |
smtps3 -C | swap | 1, 2 |
Note: the variance for smtpss on UFS is fairly large. The lower numbers are achieved by running smtps3 -C first and then smtpss; the larger numbers are measured when the CDB files have just been removed. However, this effect was not reproducible. Note: removing those files takes about as long as a test run.
Test setup with a sendmail X prototype of 2002-09-04: v-aix.perf-lab running QMGR, SMTPS, and SMTPC. Relaying from localhost to v-bsd.perf-lab. Source program running on v-aix:
time ./smtp-source -s 50 -m 100 -c localhost:8000
Using the full version: 2.45s; turning fsync() off: 1.44s.
This clearly shows the need for a better CDB implementation, at least on AIX.
Same test with reversed roles (smX on v-bsd, sink on v-aix): using the full version: 7.44s; turning fsync() off: 6.20s. For comparison: using sendmail 8.12: 14.71s.
The SCSI disks on v-bsd seem to be fairly slow. Moreover, there seems to be something wrong with the OS version (it's very old: FreeBSD 3.4).
On FreeBSD 4.6 (machine 14, see Section 5.2.1.1) (source, sink, sm-9 of 2002-10-01 on the same machine):
time ./smtp-source -s 100 -m 200 -c localhost:8000
softupdates: 4.35s; without softupdates: 5.66s
time ./smtp-source -s 50 -m 100 -c localhost:8000
softupdates: 2.01s/1.93s, -U: 1.79s; without softupdates: 2.60s/2.46s, -U: 2.17s
(-U turns off fsync()).
Using sendmail 8.12.6:
time ./smtp-source -s 50 -m 100 localhost:1234
softupdates: 5.01s. This looks quite good for sendmail 8, but the result for:
time ./smtp-source -c -s 100 -m 200 localhost:1234
is: 143.12s, which is nowhere near good. This is caused by the high load generated by the test: up to 200 concurrent sendmail processes simply overwhelm the machine. sendmail X runs only up to 4 processes.
Test date: 2003-05-25, version: smX.0.0.6, machine: PC, AMD Duron 700MHz, 512MB RAM, SuSE 8.1
Test program:
time ./smtp-source -s 50 -m 500 -fa@b.c -tx@y.z localhost:1234
FS | Times | msg/s (best) |
JFS | 4.02s, 4.23s | 124 |
ReiserFS | 4.8s | 104 |
XFS | 6.7s, 7.2s, 7.48s, 7.64s | 74 |
EXT3 | 14.39s, 13.44s | 34 |
2004-03-17 checks/t-readwrite on destiny (Linux, IDE, ext2):
parameters | writes | time |
-s -f 1000 -p 1 | - | 9 |
-s -f 100 -p 10 | - | 6 |
The FS is mounted async (default!).
2004-03-17 checks/t-readwrite on ia64-2 (Linux, SCSI, reiserfs):
parameters | writes | time |
-s -f 1000 -p 1 | - | 5.2 |
-s -f 100 -p 10 | - | 2.6 |
2004-03-23 source: basil.ps-lab MTA: cilantro.ps-lab (Linux 2.4.18-64GB-SMP) sink: v-sun.perf-lab
FS: ReiserFS version 3.6.25
smtpc -t 100 -s 1000
program | source time | sink time |
smtps3 -C | - | |
smX.0.0.12 | 6 | 5 |
sm8.12.11 | 74 | 74 |
sm8.12.11 See 1 | 50 | |
postfix 2.0.18 |
gatling -m 100 -c 5000 -z 1 -Z 1
program | writes | source time | source msgs/s | sink time |
smtps3 | 2 | 2295 | - | |
smtps3 -C | 5 | 962 | - | |
smX.0.0.12 | 22 | 225 | 22 | |
sm8.12.11 | 358 | 14 | 358 | |
sm8.12.11 See 1 | 246 | 20 | - | |
postfix 2.0.18 |
Notes:
2004-03-25:
Filesystems:
smtpc -t 100 -s 1000
program | FS | source time | sink time |
smX.0.0.12 | 1 | 63 | 61 |
1 | 63 | 63 | |
2 | 19 | 18 | |
3 | 5 | 4 | |
3 | 5 | 5 | |
5 | 81 | 80 | |
sm8.12.11 | 3 | 45 | several read errors |
5 | 91 | 92 | |
smtps3 -C |
2004-03-25: gatling -m 100 -c 5000 -z 1 -Z 1 (1KB message size)
program | FS | source time | sink time | msgs/s |
smX.0.0.12 | 1 | |||
2 | 90 | 90 | 55 | |
3 | 24 | 24 | 208 | |
4 | 100 | 99 | 100 | |
sm8.12.11 | 3 | 216 | errors | 23 |
gatling -m 100 -c 5000 -z 4 -Z 4 (4KB message size)
program | FS | source time | sink time | msgs/s |
smX.0.0.12 | 1 | |||
2 | 92 | 92 | 54 | |
3 | 141 | 140 | 35 | |
4 | 168 | 168 | 29 | |
sm8.12.11 | 3 | 226 | errors | 22 |
gatling -m 100 -c 5000 -z 16 -Z 16 (16KB message size)
program | FS | source time | sink time | msgs/s |
smX.0.0.12 | 1 | |||
2 | ||||
3 | 169 | 29 | ||
4 | ||||
sm8.12.11 | 3 | 226 | errors | 22 |
Notes:
2003-11-19 sm-9.0.0.9 running on v-bsd.perf-lab (2 processors, FreeBSD 3.4)
Source on bsd.dev-lab
time ./smtp-source -d -s 100 -m 500
directly to sink: 2.16 - 2.74s (231msgs/s)
using MFS: 14.37 - 14.43s (34msgs/s) (sm8.12.10: 32s)
using FS with softupdates: 22.78 - 23.83s (21msgs/s) (sm8.12.10: 49s)
using FS without softupdates: 35.27 - 35.56s (14msgs/s)
2004-03-02 source: s-6.perf-lab; relay: mon; sink: v-bsd
time ./smtpc2 -O 10 -fa@s-6.perf-lab -Rnobody@v-bsd.perf-lab -t 100 -s 1000 -r mon.perf-lab:1234
38.26s real 1.01s user 0.88s system
2004-03-04 source: s-6.perf-lab; relay: v-bsd; sink: v-sun
options: -t 100 -s 1000
MTA | source time(s) | sink time |
postfix 2.0.18 | 53 | 94 |
smX.0.0.12 | 69 | 68 |
without smtpc | 56 | - |
sm8.12.11 | 67 | 67 |
-odq | 79, 82 | |
-odq / 100 qd | 101 | |
-odq / 10 qd | 100 |
Note: this is FreeBSD 3.4 without softupdates and directory hashes.
getrusage(2) data:
sm8.12.11 -odq
ru_utime= 15.0158488 ru_stime= 71.0104605 ru_maxrss= 1524 ru_ixrss= 5030592 ru_idrss= 4098456 ru_isrss= 1412096 ru_minflt= 127503 ru_majflt= 0 ru_nswap= 0 ru_inblock= 0 ru_oublock= 11851 ru_msgsnd= 13000 ru_msgrcv= 10000 ru_nsignals= 0 ru_nvcsw= 617469 ru_nivcsw= 18793
sm8.12.11
ru_utime= 15.0236311 ru_stime= 62.0117941 ru_maxrss= 1520 ru_ixrss= 4573224 ru_idrss= 3676784 ru_isrss= 1283712 ru_minflt= 174619 ru_majflt= 0 ru_nswap= 0 ru_inblock= 0 ru_oublock= 4001 ru_msgsnd= 12000 ru_msgrcv= 10000 ru_nsignals= 1000 ru_nvcsw= 128074 ru_nivcsw= 14771
This looks like a problem in queue-only mode: far too much data is written, almost 3 times the amount of background delivery mode. Why does sm8 send 1000 more messages in queue-only mode?
2004-03-05 source, relay, sink: wiz (FreeBSD 4.8)
options: -t 100 -s 1000
source: 34s, sink: 32s
turn off smtpc: source: 31s, 34s
2004-03-26 source: v-6.perf-lab running smtpc -t 100 -s 5000; relay: v-bsd.perf-lab; sink: v-sun.perf-lab
sink runs smtps2 -R n with varying values for n
n | source time | requests served |
0 | 108 | 5000 |
8000000 | 115 | 5060 |
58000000 | 140 | 5450 |
88000000 | 151 | 5620 |
put defedb on a RAM disk:
n | source time | requests served |
0 | 108 | 5000 |
8000000 | ||
58000000 | 111 | 5453 |
88000000 | 114 | 5693 |
Obviously the additional disk I/O traffic created by having to use DEFEDB is slowing down the system.
2004-06-23 Upgraded v-bsd.perf-lab to FreeBSD 4.9 (2 processors), using softupdates.
source on v-sun, sink on s-6:
time ./smtpc2 -O 10 -t 100 -s 1000 -r v-bsd.perf-lab:1234
43s
turn off fsync(): (smtps -U, must be compiled with -DTESTING)
32s
A modified iostat(8) program is used to show the number of bytes written and read, and the number of read, write, and other disk I/O operations.
The following tests were performed: sink (smtps3) on v-bsd.perf-lab, source (smtpc) on s-6.perf-lab, sending 1000 mails. All numbers for write operations are rounded. Numbers in parentheses denote the value of ru_oublock (getrusage(2)) for smtps/qmgr or sm8. If two times are given (separated by /), the second denotes the elapsed time for the sink.
program | softupdates? | writes | reads | time |
smtps3 -C | yes | 2200 | - | 14 |
smtps3 -C | no | 2900 | - | 30 |
smX.0.0.12, no sched (see 1) | yes | 5200 | - | 34 |
smX.0.0.12, no sched | yes | - | ||
smX.0.0.12, no sched | no | - | ||
smX.0.0.12 (see 2) | yes | 3500 (2000/1300) | 4 | 33 |
yes | 3370 (2020/1270) | 4 | 30/29 | |
-O i=1000000 | yes | 2660 (1850/660) | 0 | 25/24 |
smX.0.0.12 | no | 6300 (3000/3200) | 0 | 52 |
smX.0.0.12 (see 4) | yes | 3500 (2200/1200) | 4 | 25 |
sm8.12.11 -odq SS=m | yes | 1800 | - | 41 |
sm8.12.11 -odq SS=m | no | 12200 | - | 72 |
sm8.12.11 SS=m (see 3) | yes | 236 (164) | 0 | 61 |
yes | 370 (218) | 0 | 60 | |
sm8.12.11 | no | 8100 (4100) | 1 | 63 |
sm8.12.11 SS=t | yes | 7400 | 0 | 70 |
postfix 2.0.18 | yes | 2900 | 16 | 21/26 |
Notes:
2004-03-23 source: basil.ps-lab MTA: wasabi.ps-lab (FreeBSD 4.9, machine 16 in Section 5.2.1.1) sink: v-sun.perf-lab
smtpc -t 100 -s 1000
program | writes | reads | source time | sink time |
smtps3 -C | 2400 | - | 11 | - |
smX.0.0.12 | 2600 | 5 | 15 | 13 |
sm8.12.11 | 6000 | 1 | 35 | |
postfix 2.0.18 | 2800 | 15 | 14 | 20 |
Note: the sink time for postfix is noticeably larger than that for smX because smX emptied its queue during the run, while postfix still had more than 700 entries in its mail queue after the source finished sending all mails.
Using gatling:
Max random envelope rcpts: 1
Connections: 100
Max msgs/conn: Unlimited
Messages: Fixed size 1 Kbytes
Desired Message Rate: Unlimited
Total messages: 5000
Total test elapsed time: 73.571 seconds (1:13.570)
Overall message rate: 67.962 msg/sec
Peak rate: 100.000 msg/sec
gatling -m 100 -c 5000 -z 1 -Z 1
program | writes | source time | source msgs/s | sink time |
smtps3 | 0 | 5 | 980 | - |
smtps3 -C | 11750 | 53 | 93 | |
smX.0.0.12 | 73 | 67 | 71 | |
smX.0.0.12 | 11157 (8000/2700) | 70 | 71 | 69 |
sm8.12.11 | 136 | 36 | ||
postfix 2.0.18 | 60 | 83 | 78 | |
postfix 2.0.18 | 12635 | 58 | 85 | 75 |
2004-03-16 results for wiz: source: time ./smtpc -s 1000 -t 100 -r localhost:1234; sink: smtps3, file system: UFS, softupdates
parameters | oublock | writes | source time | sink time |
-C -i | 1920 | ? | 17 | 16 |
-C -p 1 | 1860 | ? | 17 | 17 |
-C -p 1 | 1940 | 2700 | 16 | 15 |
-C -p 1 | 1970 | 2770 | 16 | 15 |
-C -p 2 | ? | 15 | ? | |
-C -p 2 | 877+966 | 2600 | 15 | ? |
-C -p 4 | 455+476+432+472 | 2640 | 15 | ? |
New option: -f for flat, i.e., instead of using 16 subdirectories for CDB files, a single directory is used. Even though this does not cause a noticeable difference in run time, the number of I/O operations is reduced.
parameters | oublock | writes | source time |
-C -p 2 | 915+920 | 2600 | 14 |
-C -p 2 -f | 600+610 | 2200 | 14 |
2004-03-16 source: s-6.perf-lab, time ./smtpc -s 1000 -t 100 -r localhost:1234; sink: v-bsd.perf-lab, smtps3, file system: UFS, softupdates
parameters | oublock | writes | source time | sink time |
-C -i | 1430 | 2165 | 12 | 11 |
1550 | 2300 | 14 | 13 | |
-C -p 1 | 1500 | 2500 | 14 | 12 |
-C -p 2 | 1100+620 | 2500 | 13 | - |
800+770 | 2320 | 13 | - | |
-C -p 4 | 530+350+540+470 | 2600 | 13 | - |
Note: some of the write operations might be from softupdates due to the previous rm command (removing the CDB files).
2004-03-17 checks/t-readwrite on v-bsd (FreeBSD 4.9, SCSI):
parameters | softupdates | oublock | writes | time |
-s -f 1000 -p 1 | yes | 4000 | 4000 | 22 |
-s -f 100 -p 10 | yes | 2575 | 2579 | 14 |
-s -f 1000 -p 1 | no | 4050 | 4050 | 28 |
-s -f 100 -p 10 | no | 4050 | 4050 | 27 |
-p specifies the number of processes to start; -f specifies the number of files to write per process. The test cases above write 1000 files with either 1 or 10 processes. As can be seen, it is significantly more efficient to use 10 processes if softupdates are turned on.
2004-03-17 checks/t-readwrite on wiz (FreeBSD 4.8, IDE):
parameters | softupdates | oublock | writes | time |
-s -f 1000 -p 1 | yes | 3000 | 3800 | 13 |
-s -f 100 -p 10 | yes | 2860 | 3600 | 13 |
In this case no difference can be seen, which is most likely a result of using an IDE drive with write-caching turned on (default).
2003-11-21 sm-9.0.0.9 running on v-sun.perf-lab
Source on bsd.dev-lab
time ./smtp-source -d -s 100 -m 5000 -c
using FS: 301.90 - 305.02s (16msgs/s)
using swap: 77.98 - 78.55s (64msgs/s)
Those tests ran only 32 SMTPS threads (the machine has 4 CPUs, hence the specified limit of 128 was divided by 4). Using 128 SMTPS threads (by forcing a single process, which was used anyway because SMTPS is run with the interactive option, which does not start background processes):
time ./smtp-source -d -s 100 -m 50000 -c
using swap: 727.73s (68msgs/s)
2004-03-09 sm-9.0.0.12 running on v-sun.perf-lab
time ./smtpc -O 20 -fa@s-6.perf-lab.sendmail.com -Rnobody@v-bsd.perf-lab.sendmail.com -t 100 -s 1000 -r v-sun.perf-lab.sendmail.com:1234
MTA options | FS | source time(s) | sink time(s) |
full MTS | SWAPFS | 16 | 14 |
without sched | SWAPFS | 10 | - |
smtpss | SWAPFS | 3 | - |
full MTS | UFS | 64, 65, 64 | 75, 70, 69 |
8.12.11 | SWAPFS | 16 | 19 |
8.12.11 | UFS | 141 | 138 |
Note: smX using UFS runs into connection limitations: QMGR believes there are 100 open connections even though the sink shows at most 18. This seems to be caused by communication latency between SMTPC and QMGR (and needs to be investigated further).
2004-03-17 checks/t-readwrite on v-sun (SunOS 5.8, SCSI):
parameters | writes | time |
-s -f 1000 -p 1 | - | 39 |
-s -f 100 -p 10 | - | 37 |
On SunOS 5.8 the filesystem shows no difference between using 1 or 10 processes.
2004-03-05 source, relay, and sink on zardoc (OpenBSD 3.2)
test with logging via smioout
zardoc$ time ./smtpc2 -O 10 -s 1000 -t 100 -r localhost:1234
24.17s real 0.94s user 2.57s system
smtps3 stats:
elapsed: 26
Thread limits (min/max): 8/256
Waiting threads: 8
Max busy threads: 3
Requests served: 1000
Note that there were only 3 active threads, which means the client is not busy at all. Another test shows elapsed=23s, max busy threads=21, so the result isn't deterministic (the machine is in use as a normal SMTP server etc. during the tests).
test with logging via smioerr: smtpc2: 24.53s; no difference.
2004-03-17 checks/t-readwrite on aix-3 (AIX 4.3, SCSI, jfs):
parameters | writes | time |
-s -f 1000 -p 1 | - | 30 |
-s -f 100 -p 10 | - | 29 |
No (noticeable) difference.
Here are some results of a simple test program which creates and deletes a number of files and optionally renames them twice while doing so.
Notice: unless mentioned otherwise, all measurements are at most accurate to one-second resolution. Repeated tests will most likely show (slightly) different results. These tests are only listed to give an idea of the magnitude of available performance.
The involved systems are:
wdc0: unit 0 (wd0): <FUJITSU MPD3064AT> wd0: 6187MB (12672450 sectors), 13410 cyls, 15 heads, 63 S/T, 512 B/S
wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-351010> wd0: can use 32-bit, PIO mode 4, DMA mode 2, Ultra-DMA mode 4 wd0: 16-sector PIO, LBA, 9671MB, 16383 cyl, 16 head, 63 sec, 19807200 sectors
wd1 at pciide0 channel 0 drive 1: <Maxtor 98196H8>, wd1: can use 32-bit, PIO mode 4, DMA mode 2, Ultra-DMA mode 4, wd1: 16-sector PIO, LBA, 78167MB, 16383 cyl, 16 head, 63 sec, 160086528 sectors
ad0: 6187MB <FUJITSU MPC3064AT> [13410/15/63] at ata0-master UDMA33
ahc0: <Adaptec 2940 Ultra2 SCSI adapter (OEM)> da0: <IBM DNES-309170W SA30> Fixed Direct Access SCSI-3 device da0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled da0: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C)
ad0: 8063MB <FUJITSU MPD3084AT> [16383/16/63] at ata0-master UDMA66; softupdates
hda: IBM-DJNA-370910, 8693MB w/1966kB Cache, CHS=1108/255/63; ext2 FS
hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=2434/255/63, UDMA(66); reiserfs: using 3.5.x disk format; ReiserFS version 3.6.25
WD1200BB hdg: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=232581/16/63, UDMA(100)
ad0: 8693MB <IBM-DJNA-370910> [17662/16/63] at ata0-master UDMA33 acd0: CDROM <CD-ROM 40X> at ata1-master PIO4
scsi0 : ioc0: LSI53C1030, FwRev=01000000h, Ports=1, MaxQ=255, IRQ=52 Vendor: MAXTOR Model: ATLASU320_18_SCA Rev: B120 Type: Direct-Access ANSI SCSI revision: 03 Attached scsi disk sda at scsi0, channel 0, id 0, lun 0 SCSI device sda: 35916548 512-byte hdwr sectors (18389 MB) reiserfs: found format "3.6" with standard journal reiserfs: using ordered data mode Using r5 hash to sort names
da0 at ahc0 bus 0 target 0 lun 0 da0: <SEAGATE ST39175LW 0001> Fixed Direct Access SCSI-2 device da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled da0: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)
ad0: 6187MB <FUJITSU MPD3064AT> [13410/15/63] at ata0-master UDMA33
da3 at ahc0 bus 0 target 4 lun 0 da3: <IBM DNES-309170Y SA30> Fixed Direct Access SCSI-3 device da3: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled da3: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C)
wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-371350> wd0: 16-sector PIO, LBA, 12949MB, 16383 cyl, 16 head, 63 sec, 26520480 sectors
wd1 at pciide0 channel 0 drive 1: <WDC WD1200BB-53CAA0> wd1: 16-sector PIO, LBA, 114473MB, 16383 cyl, 16 head, 63 sec, 234441648 sectors
wd2 at pciide1 channel 0 drive 0: <Maxtor 6Y160P0> wd2: 16-sector PIO, LBA48, 156334MB, 16383 cyl, 16 head, 63 sec, 320173056 sectors wd2(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 6
In this section, some simple test programs are used that create some files, perform (sequential) read/write operations on them and remove them afterwards.
Entries in the following table are elapsed time in seconds (except for the first column, which refers to the machine descriptions above). The program used to produce these results is fsperf1.c.
machine | 5000 100 | -c 5000 100 | -c -r 5000 100 |
1 | 50 | 49 | 48 |
1 | 42 | 48 | 51 |
2a | 3 | 7 | 10 |
about 2200 tps | about 1500 tps | ||
11 | 21 | ||
3 | 10 | 34 | 34 |
about 500 tps | |||
4(a)i | 126 | 125 | |
4(a)ii | 208 | 454 | |
4b | 43 | 48 | |
7 | 7 | 13 | 16 |
5 | 9 | 8 | 9 |
8 | 133 | 201 | 603 |
9a | 52 | 665 | |
10a | 9 | 9 | 12 |
89 | 139 | 233 |
Comments:
(2004-07-14) With and without fsync(2) (-S)
common parameters | machine | -c | -c -r | -S -c | -S -c -r |
(5000 100) | 17 | 42 | 42 | 2 | 3 |
10b | 165 | 496 | 165 | 495 | |
18 | 83 | 83 | 5 | 8 | |
19a | 8 | 7 | 1 | 3 | |
19b | 8 | 9 | 1 | 3 | |
19c | 7 | 9 | 1 | 2 | |
(-s 32 5000 100) | 17 | 109 | 109 | 8 | 9 |
10b | 250 | 537 | 207 | 498 | |
18 | 114 | 113 | 14 | 16 | |
19b | 87 | 81 | 3 | 5 | |
19c | 26 | 26 | 4 | 5 |
Comments:
Next version: allow for hashing (00 - 99, up to two levels). Use enough files to defeat the (2MB) cache of IDE disks.
machine | -h 1 -c 1000 1000 | -h 1 -c -r 1000 1000 |
1 | 18 | 18 |
2a | 24 | 24 |
2b | 7 | 9 |
3 | 14 | 14 |
4(a)i | 23 | 23 |
4(a)ii | 33 | 77 |
4b | 25 | 49 |
5 | 3 | 2 |
7 | 3 | 4 |
8 | 58 | 163 |
9a | 51 | 139 |
11 | 28 | 48 |
Comments:
Next version of fsperf1.c: allow for hashing (00 - 99, up to two levels). Use enough files to defeat the (2MB) cache of IDE disks. The parameters for the following table are 1000 operations and 1000 files, hence each file is used once. Additional parameters are listed in the heading: -c: create, -h 1: one-level hashing, -r: rename file twice, -p: populate directories before the test, then just reuse the files.
machine | -h 1 -c | -h 1 -c -r | -p -h 1 -c | -p -h 1 -c -r |
1 | 32 | 31 | 18 | 17 |
2a | 18 | 18 | 9 | 10 |
2b | 10 | 10 | 8 | 10 |
5 | 2 | 1 | 2 | 1 |
6 | 2 | 2 | 4 | 4 |
7 | 2 | 4 | 2 | 3 |
8 | 58 | 165 | 78 | 178 |
9a | 27 | 127 | 33 | 131 |
9c | 13 | 51 | 37 | 55 |
11 | 28 | 48 | 28 | 48 |
Comments:
Another test program (fsseq1.c) writes lines to a file and uses fsync(2) after a specified number (-C parameter).
20000 entries (10000 entries each for received/delivered, total 490000 bytes).
machine | - | -C 100 | -C 50 | -C 10 | -C 5 | -C 2 | -f |
1 | 1 | 4 | 6 | 17 | 32 | 78 | 150 |
2a | 0 | 2 | 2 | 5 | 5 | 9 | 18 |
2b | 1 | 0 | 1 | 3 | 4 | 10 | 20 |
3 | 1 | 2 | 3 | 9 | 16 | 37 | 68 |
5 | 1 | 1 | 2 | 6 | 12 | 27 | 56 |
7 | 0 | 4 | 8 | 39 | 79 | 198 | 410 |
8 | 1 | 7 | 13 | 60 | 120 | 299 | 598 |
9a | 1 | 8 | 13 | 15 | 62 | 90 | 140 |
11 | 0 | 6 | 12 | 53 | 106 | 262 | 518 |
This clearly demonstrates the need for group commits. However, the program requires a lot of CPU since each line is generated by snprintf(). Hence the full I/O speed may not be reached. To confirm this, another program (fsseq2.c) is used that just writes a buffer with a fixed content to a file.
The following table lists the results for group commits (C) together with various buffer sizes (256, 1024, 4096, 8192, and 16384). As usual the entries are execution time in seconds. The program writes 2000 records in total, e.g., for size 16384 that is 31MB data.
machine | C | 256 | 1024 | 4096 | 8192 | 16384 | |
5 | 1 | 4 | 5 | 10 | 20 | 34 | |
2 | 2 | 4 | 6 | 12 | 22 | ||
5 | 1 | 2 | 5 | 7 | 15 | ||
10 | 1 | 1 | 3 | 6 | 12 | ||
50 | 1 | 0 | 3 | 5 | 10 | ||
100 | 0 | 1 | 3 | 5 | 10 | ||
7 | 1 | 1 | 5 | 20 | 40 | 44 | |
2 | 1 | 5 | 11 | 23 | 29 | ||
5 | 1 | 5 | 9 | 12 | 13 | ||
10 | 1 | 2 | 3 | 6 | 7 | ||
50 | 0 | 1 | 1 | 2 | 3 | ||
100 | 0 | 1 | 1 | 1 | 3 | ||
8 | 1 | 3 | 10 | 45 | 95 | 109 | |
2 | 2 | 11 | 23 | 52 | 59 | ||
5 | 3 | 11 | 19 | 24 | 32 | ||
10 | 2 | 5 | 6 | 15 | 21 | ||
50 | 1 | 2 | 3 | 8 | 13 | ||
100 | 0 | 1 | 3 | 6 | 13 | ||
9a | 1 | 3 | 12 | 34 | 35 | 58 | |
2 | 3 | 12 | 18 | 53 | 53 | ||
5 | 3 | 6 | 21 | 23 | 24 | ||
10 | 3 | 5 | 6 | 13 | 14 | ||
50 | 1 | 2 | 2 | 5 | 7 | ||
100 | 1 | 1 | 2 | 3 | 6 | ||
11 | 1 | 21 | 35 | 77 | 83 | 92 | |
2 | 13 | 26 | 38 | 45 | 50 | ||
5 | 8 | 13 | 17 | 20 | 24 | ||
10 | 5 | 6 | 10 | 11 | 15 | ||
50 | 1 | 2 | 2 | 4 | 7 | ||
100 | 1 | 1 | 2 | 3 | 6 |
Comments:
Yet another program (fsseq3.c) uses write() instead of fwrite(). This time the tests write 40000KB each, which makes it simpler to determine the throughput.
Note: as usual, these times are not very accurate (1s resolution), and hence the rate is inaccurate too. Machines:
C | s | records | time | KB/s |
1 | 512 | 80000 | 1365 | 29 |
1 | 1024 | 40000 | 734 | 54 |
1 | 2048 | 20000 | 451 | 88 |
1 | 4096 | 10000 | 352 | 113 |
1 | 8192 | 5000 | 250 | 160 |
2 | 512 | 80000 | 736 | 54 |
2 | 1024 | 40000 | 453 | 88 |
2 | 2048 | 20000 | 354 | 112 |
2 | 4096 | 10000 | 382 | 104 |
2 | 8192 | 5000 | 225 | 177 |
5 | 512 | 80000 | 638 | 62 |
5 | 1024 | 40000 | 585 | 68 |
5 | 2048 | 20000 | 312 | 128 |
5 | 4096 | 10000 | 187 | 213 |
5 | 8192 | 5000 | 101 | 396 |
10 | 512 | 80000 | 561 | 71 |
10 | 1024 | 40000 | 296 | 135 |
10 | 2048 | 20000 | 161 | 248 |
10 | 4096 | 10000 | 88 | 454 |
10 | 8192 | 5000 | 60 | 666 |
50 | 512 | 80000 | 128 | 312 |
50 | 1024 | 40000 | 70 | 571 |
50 | 2048 | 20000 | 41 | 975 |
50 | 4096 | 10000 | 34 | 1176 |
50 | 8192 | 5000 | 29 | 1379 |
100 | 512 | 80000 | 73 | 547 |
100 | 1024 | 40000 | 43 | 930 |
100 | 2048 | 20000 | 33 | 1212 |
100 | 4096 | 10000 | 28 | 1428 |
100 | 8192 | 5000 | 27 | 1481 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 165 | 242 |
1 | 1024 | 40000 | 90 | 444 |
1 | 2048 | 20000 | 54 | 740 |
1 | 4096 | 10000 | 28 | 1428 |
1 | 8192 | 5000 | 16 | 2500 |
2 | 512 | 80000 | 94 | 425 |
2 | 1024 | 40000 | 52 | 769 |
2 | 2048 | 20000 | 30 | 1333 |
2 | 4096 | 10000 | 17 | 2352 |
2 | 8192 | 5000 | 11 | 3636 |
5 | 512 | 80000 | 54 | 740 |
5 | 1024 | 40000 | 33 | 1212 |
5 | 2048 | 20000 | 19 | 2105 |
5 | 4096 | 10000 | 11 | 3636 |
5 | 8192 | 5000 | 8 | 5000 |
10 | 512 | 80000 | 31 | 1290 |
10 | 1024 | 40000 | 18 | 2222 |
10 | 2048 | 20000 | 11 | 3636 |
10 | 4096 | 10000 | 8 | 5000 |
10 | 8192 | 5000 | 6 | 6666 |
50 | 512 | 80000 | 11 | 3636 |
50 | 1024 | 40000 | 8 | 5000 |
50 | 2048 | 20000 | 6 | 6666 |
50 | 4096 | 10000 | 5 | 8000 |
50 | 8192 | 5000 | 4 | 10000 |
100 | 512 | 80000 | 10 | 4000 |
100 | 1024 | 40000 | 8 | 5000 |
100 | 2048 | 20000 | 5 | 8000 |
100 | 4096 | 10000 | 4 | 10000 |
100 | 8192 | 5000 | 5 | 8000 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 13440 | 2 |
1 | 1024 | 40000 | 6790 | 5 |
1 | 2048 | 20000 | 3451 | 11 |
1 | 4096 | 10000 | 1779 | 22 |
1 | 8192 | 5000 | 1007 | 39 |
2 | 512 | 80000 | 6790 | 5 |
2 | 1024 | 40000 | 3439 | 11 |
2 | 2048 | 20000 | 1763 | 22 |
2 | 4096 | 10000 | 909 | 44 |
2 | 8192 | 5000 | 471 | 84 |
5 | 512 | 80000 | 2763 | 14 |
5 | 1024 | 40000 | 1414 | 28 |
5 | 2048 | 20000 | 739 | 54 |
5 | 4096 | 10000 | 383 | 104 |
5 | 8192 | 5000 | 208 | 192 |
10 | 512 | 80000 | 1414 | 28 |
10 | 1024 | 40000 | 731 | 54 |
10 | 2048 | 20000 | 384 | 104 |
10 | 4096 | 10000 | 208 | 192 |
10 | 8192 | 5000 | 120 | 333 |
50 | 512 | 80000 | 312 | 128 |
50 | 1024 | 40000 | 174 | 229 |
50 | 2048 | 20000 | 101 | 396 |
50 | 4096 | 10000 | 64 | 625 |
50 | 8192 | 5000 | 46 | 869 |
100 | 512 | 80000 | 171 | 233 |
100 | 1024 | 40000 | 100 | 400 |
100 | 2048 | 20000 | 64 | 625 |
100 | 4096 | 10000 | 46 | 869 |
100 | 8192 | 5000 | 37 | 1081 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 130 | 307 |
1 | 1024 | 40000 | 93 | 430 |
1 | 2048 | 20000 | 78 | 512 |
1 | 4096 | 10000 | 23 | 1739 |
1 | 8192 | 5000 | 12 | 3333 |
2 | 512 | 80000 | 62 | 645 |
2 | 1024 | 40000 | 46 | 869 |
2 | 2048 | 20000 | 24 | 1666 |
2 | 4096 | 10000 | 13 | 3076 |
2 | 8192 | 5000 | 15 | 2666 |
5 | 512 | 80000 | 66 | 606 |
5 | 1024 | 40000 | 31 | 1290 |
5 | 2048 | 20000 | 18 | 2222 |
5 | 4096 | 10000 | 15 | 2666 |
5 | 8192 | 5000 | 10 | 4000 |
10 | 512 | 80000 | 28 | 1428 |
10 | 1024 | 40000 | 19 | 2105 |
10 | 2048 | 20000 | 13 | 3076 |
10 | 4096 | 10000 | 10 | 4000 |
10 | 8192 | 5000 | 10 | 4000 |
50 | 512 | 80000 | 14 | 2857 |
50 | 1024 | 40000 | 10 | 4000 |
50 | 2048 | 20000 | 10 | 4000 |
50 | 4096 | 10000 | 9 | 4444 |
50 | 8192 | 5000 | 7 | 5714 |
100 | 512 | 80000 | 11 | 3636 |
100 | 1024 | 40000 | 10 | 4000 |
100 | 2048 | 20000 | 8 | 5000 |
100 | 4096 | 10000 | 8 | 5000 |
100 | 8192 | 5000 | 8 | 5000 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 3347 | 11 |
1 | 1024 | 40000 | 1689 | 23 |
1 | 2048 | 20000 | 845 | 47 |
1 | 4096 | 10000 | 418 | 95 |
1 | 8192 | 5000 | 192 | 208 |
2 | 512 | 80000 | 1243 | 32 |
2 | 1024 | 40000 | 796 | 50 |
2 | 2048 | 20000 | 431 | 92 |
2 | 4096 | 10000 | 222 | 180 |
2 | 8192 | 5000 | 122 | 327 |
5 | 512 | 80000 | 655 | 61 |
5 | 1024 | 40000 | 268 | 149 |
5 | 2048 | 20000 | 161 | 248 |
5 | 4096 | 10000 | 108 | 370 |
5 | 8192 | 5000 | 58 | 689 |
10 | 512 | 80000 | 355 | 112 |
10 | 1024 | 40000 | 185 | 216 |
10 | 2048 | 20000 | 85 | 470 |
10 | 4096 | 10000 | 42 | 952 |
10 | 8192 | 5000 | 38 | 1052 |
50 | 512 | 80000 | 88 | 454 |
50 | 1024 | 40000 | 49 | 816 |
50 | 2048 | 20000 | 31 | 1290 |
50 | 4096 | 10000 | 18 | 2222 |
50 | 8192 | 5000 | 10 | 4000 |
100 | 512 | 80000 | 45 | 888 |
100 | 1024 | 40000 | 33 | 1212 |
100 | 2048 | 20000 | 19 | 2105 |
100 | 4096 | 10000 | 14 | 2857 |
100 | 8192 | 5000 | 14 | 2857 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 6302 | 6 |
1 | 1024 | 40000 | 3220 | 12 |
1 | 2048 | 20000 | 1695 | 23 |
1 | 4096 | 10000 | 949 | 42 |
1 | 8192 | 5000 | 552 | 72 |
2 | 512 | 80000 | 3183 | 12 |
2 | 1024 | 40000 | 1708 | 23 |
2 | 2048 | 20000 | 950 | 42 |
2 | 4096 | 10000 | 484 | 82 |
2 | 8192 | 5000 | 299 | 133 |
5 | 512 | 80000 | 1402 | 28 |
5 | 1024 | 40000 | 805 | 49 |
5 | 2048 | 20000 | 440 | 90 |
5 | 4096 | 10000 | 252 | 158 |
5 | 8192 | 5000 | 137 | 291 |
10 | 512 | 80000 | 783 | 51 |
10 | 1024 | 40000 | 395 | 101 |
10 | 2048 | 20000 | 211 | 189 |
10 | 4096 | 10000 | 122 | 327 |
10 | 8192 | 5000 | 87 | 459 |
50 | 512 | 80000 | 181 | 220 |
50 | 1024 | 40000 | 107 | 373 |
50 | 2048 | 20000 | 68 | 588 |
50 | 4096 | 10000 | 49 | 816 |
50 | 8192 | 5000 | 42 | 952 |
100 | 512 | 80000 | 111 | 360 |
100 | 1024 | 40000 | 70 | 571 |
100 | 2048 | 20000 | 50 | 800 |
100 | 4096 | 10000 | 40 | 1000 |
100 | 8192 | 5000 | 36 | 1111 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 2638 | 15 |
1 | 1024 | 40000 | 1419 | 28 |
1 | 2048 | 20000 | 753 | 53 |
1 | 4096 | 10000 | 442 | 90 |
1 | 8192 | 5000 | 221 | 180 |
2 | 512 | 80000 | 1379 | 29 |
2 | 1024 | 40000 | 774 | 51 |
2 | 2048 | 20000 | 409 | 97 |
2 | 4096 | 10000 | 220 | 181 |
2 | 8192 | 5000 | 124 | 322 |
5 | 512 | 80000 | 644 | 62 |
5 | 1024 | 40000 | 382 | 104 |
5 | 2048 | 20000 | 198 | 202 |
5 | 4096 | 10000 | 105 | 380 |
5 | 8192 | 5000 | 58 | 689 |
10 | 512 | 80000 | 355 | 112 |
10 | 1024 | 40000 | 196 | 204 |
10 | 2048 | 20000 | 104 | 384 |
10 | 4096 | 10000 | 59 | 677 |
10 | 8192 | 5000 | 32 | 1250 |
50 | 512 | 80000 | 90 | 444 |
50 | 1024 | 40000 | 51 | 784 |
50 | 2048 | 20000 | 28 | 1428 |
50 | 4096 | 10000 | 19 | 2105 |
50 | 8192 | 5000 | 15 | 2666 |
100 | 512 | 80000 | 54 | 740 |
100 | 1024 | 40000 | 28 | 1428 |
100 | 2048 | 20000 | 20 | 2000 |
100 | 4096 | 10000 | 15 | 2666 |
100 | 8192 | 5000 | 14 | 2857 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 2642 | 15 |
1 | 1024 | 40000 | 1312 | 30 |
1 | 2048 | 20000 | 723 | 55 |
1 | 4096 | 10000 | 376 | 106 |
1 | 8192 | 5000 | 185 | 216 |
2 | 512 | 80000 | 1363 | 29 |
2 | 1024 | 40000 | 699 | 57 |
2 | 2048 | 20000 | 359 | 111 |
2 | 4096 | 10000 | 185 | 216 |
2 | 8192 | 5000 | 104 | 384 |
5 | 512 | 80000 | 563 | 71 |
5 | 1024 | 40000 | 302 | 132 |
5 | 2048 | 20000 | 162 | 246 |
5 | 4096 | 10000 | 88 | 454 |
5 | 8192 | 5000 | 46 | 869 |
10 | 512 | 80000 | 299 | 133 |
10 | 1024 | 40000 | 161 | 248 |
10 | 2048 | 20000 | 87 | 459 |
10 | 4096 | 10000 | 46 | 869 |
10 | 8192 | 5000 | 24 | 1666 |
50 | 512 | 80000 | 81 | 493 |
50 | 1024 | 40000 | 44 | 909 |
50 | 2048 | 20000 | 35 | 1142 |
50 | 4096 | 10000 | 19 | 2105 |
50 | 8192 | 5000 | 13 | 3076 |
100 | 512 | 80000 | 51 | 784 |
100 | 1024 | 40000 | 35 | 1142 |
100 | 2048 | 20000 | 26 | 1538 |
100 | 4096 | 10000 | 15 | 2666 |
100 | 8192 | 5000 | 13 | 3076 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 2576 | 15 |
1 | 1024 | 40000 | 1326 | 30 |
1 | 2048 | 20000 | 707 | 56 |
1 | 4096 | 10000 | 377 | 106 |
1 | 8192 | 5000 | 192 | 208 |
2 | 512 | 80000 | 1324 | 30 |
2 | 1024 | 40000 | 685 | 58 |
2 | 2048 | 20000 | 349 | 114 |
2 | 4096 | 10000 | 187 | 213 |
2 | 8192 | 5000 | 107 | 373 |
5 | 512 | 80000 | 578 | 69 |
5 | 1024 | 40000 | 313 | 127 |
5 | 2048 | 20000 | 163 | 245 |
5 | 4096 | 10000 | 89 | 449 |
5 | 8192 | 5000 | 46 | 869 |
10 | 512 | 80000 | 306 | 130 |
10 | 1024 | 40000 | 162 | 246 |
10 | 2048 | 20000 | 86 | 465 |
10 | 4096 | 10000 | 46 | 869 |
10 | 8192 | 5000 | 25 | 1600 |
50 | 512 | 80000 | 82 | 487 |
50 | 1024 | 40000 | 44 | 909 |
50 | 2048 | 20000 | 33 | 1212 |
50 | 4096 | 10000 | 19 | 2105 |
50 | 8192 | 5000 | 13 | 3076 |
100 | 512 | 80000 | 52 | 769 |
100 | 1024 | 40000 | 36 | 1111 |
100 | 2048 | 20000 | 25 | 1600 |
100 | 4096 | 10000 | 16 | 2500 |
100 | 8192 | 5000 | 13 | 3076 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 65 | 615 |
1 | 1024 | 40000 | 61 | 655 |
1 | 2048 | 20000 | 59 | 677 |
1 | 4096 | 10000 | 5 | 8000 |
1 | 8192 | 5000 | 4 | 10000 |
2 | 512 | 80000 | 13 | 3076 |
2 | 1024 | 40000 | 8 | 5000 |
2 | 2048 | 20000 | 4 | 10000 |
2 | 4096 | 10000 | 4 | 10000 |
2 | 8192 | 5000 | 3 | 13333 |
5 | 512 | 80000 | 44 | 909 |
5 | 1024 | 40000 | 21 | 1904 |
5 | 2048 | 20000 | 13 | 3076 |
5 | 4096 | 10000 | 3 | 13333 |
5 | 8192 | 5000 | 3 | 13333 |
10 | 512 | 80000 | 12 | 3333 |
10 | 1024 | 40000 | 3 | 13333 |
10 | 2048 | 20000 | 3 | 13333 |
10 | 4096 | 10000 | 3 | 13333 |
10 | 8192 | 5000 | 5 | 8000 |
50 | 512 | 80000 | 11 | 3636 |
50 | 1024 | 40000 | 3 | 13333 |
50 | 2048 | 20000 | 5 | 8000 |
50 | 4096 | 10000 | 5 | 8000 |
50 | 8192 | 5000 | 4 | 10000 |
100 | 512 | 80000 | 5 | 8000 |
100 | 1024 | 40000 | 5 | 8000 |
100 | 2048 | 20000 | 5 | 8000 |
100 | 4096 | 10000 | 4 | 10000 |
100 | 8192 | 5000 | 3 | 13333 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 124 | 322 |
1 | 1024 | 40000 | 87 | 459 |
1 | 2048 | 20000 | 72 | 555 |
1 | 4096 | 10000 | 20 | 2000 |
1 | 8192 | 5000 | 10 | 4000 |
2 | 512 | 80000 | 47 | 851 |
2 | 1024 | 40000 | 32 | 1250 |
2 | 2048 | 20000 | 16 | 2500 |
2 | 4096 | 10000 | 8 | 5000 |
2 | 8192 | 5000 | 5 | 8000 |
5 | 512 | 80000 | 56 | 714 |
5 | 1024 | 40000 | 27 | 1481 |
5 | 2048 | 20000 | 20 | 2000 |
5 | 4096 | 10000 | 5 | 8000 |
5 | 8192 | 5000 | 5 | 8000 |
10 | 512 | 80000 | 23 | 1739 |
10 | 1024 | 40000 | 17 | 2352 |
10 | 2048 | 20000 | 6 | 6666 |
10 | 4096 | 10000 | 3 | 13333 |
10 | 8192 | 5000 | 6 | 6666 |
50 | 512 | 80000 | 7 | 5714 |
50 | 1024 | 40000 | 4 | 10000 |
50 | 2048 | 20000 | 6 | 6666 |
50 | 4096 | 10000 | 6 | 6666 |
50 | 8192 | 5000 | 4 | 10000 |
100 | 512 | 80000 | 7 | 5714 |
100 | 1024 | 40000 | 6 | 6666 |
100 | 2048 | 20000 | 5 | 8000 |
100 | 4096 | 10000 | 4 | 10000 |
100 | 8192 | 5000 | 3 | 13333 |
C | s | records | time | KB/s |
1 | 512 | 80000 | 205 | 195 |
1 | 1024 | 40000 | 144 | 277 |
1 | 2048 | 20000 | 122 | 327 |
1 | 4096 | 10000 | 14 | 2857 |
1 | 8192 | 5000 | 7 | 5714 |
2 | 512 | 80000 | 34 | 1176 |
2 | 1024 | 40000 | 22 | 1818 |
2 | 2048 | 20000 | 13 | 3076 |
2 | 4096 | 10000 | 7 | 5714 |
2 | 8192 | 5000 | 5 | 8000 |
5 | 512 | 80000 | 96 | 416 |
5 | 1024 | 40000 | 48 | 833 |
5 | 2048 | 20000 | 20 | 2000 |
5 | 4096 | 10000 | 4 | 10000 |
5 | 8192 | 5000 | 4 | 10000 |
10 | 512 | 80000 | 36 | 1111 |
10 | 1024 | 40000 | 7 | 5714 |
10 | 2048 | 20000 | 5 | 8000 |
10 | 4096 | 10000 | 4 | 10000 |
10 | 8192 | 5000 | 3 | 13333 |
50 | 512 | 80000 | 12 | 3333 |
50 | 1024 | 40000 | 4 | 10000 |
50 | 2048 | 20000 | 4 | 10000 |
50 | 4096 | 10000 | 3 | 13333 |
50 | 8192 | 5000 | 3 | 13333 |
100 | 512 | 80000 | 7 | 5714 |
100 | 1024 | 40000 | 6 | 6666 |
100 | 2048 | 20000 | 3 | 13333 |
100 | 4096 | 10000 | 3 | 13333 |
100 | 8192 | 5000 | 3 | 13333 |
Very simple measurement of transfer rate:
time dd ibs=8192 if=/dev/zero obs=8192 count=5120 of=incq
machine | s | MB/s |
1 | 11.6 | 3.6 |
2a | 4.8 | 8.4 |
2b | 1.9 | 20.9 |
5 | 10.83 | 3.9 |
6 | 0.65 | 61 |
7 | 1.0 | 40.0 |
8 | 14.8 | 2.8 |
9 | 6.3 | 6.6 |
11 | 6.98 | 6.0 |
12a | 0.247 | 161 |
12b | 0.401 | 99 |
12c | 0.357 | 112 |
Comments:
dd ibs=8192 if=/dev/zero obs=8192 count=124000 of=incq
machine | s | MB/s |
12a | 24.762 | 39 |
12b | 22.608 | 42 |
The data in this table is more plausible, even though 40MB/s is still very fast.
For comparison with the Berkeley DB performance data, more tests have been run with fsseq4 with different parameters. The number of records is 100000 unless otherwise noted; t/s is transactions (records written) per second. Notice: fsseq3 writes twice as many records as fsseq4 (one add and one delete entry each), and it calls fsync() twice as often (after the add and after the delete entry).
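The mechanism being varied here (the C parameter in the tables below) is the group-commit interval: records are appended to a sequential log and fsync() is only called after every C records. A minimal POSIX sketch of that pattern (names and structure are illustrative, not fsseq's actual code):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append nrec records of recsz bytes to the log at path, calling
 * fsync() after every commit_every records (group commit).
 * Returns the number of bytes written, or -1 on error. */
static long write_log(const char *path, int nrec, int recsz, int commit_every)
{
    char rec[8192];
    long total = 0;
    int fd, i;

    if (recsz > (int)sizeof(rec) || commit_every <= 0)
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    memset(rec, 'r', (size_t)recsz);
    for (i = 0; i < nrec; i++) {
        if (write(fd, rec, (size_t)recsz) != (ssize_t)recsz)
            break;
        total += recsz;
        if ((i + 1) % commit_every == 0)
            fsync(fd);      /* commit this group to stable storage */
    }
    fsync(fd);              /* commit any trailing partial group */
    close(fd);
    return total;
}
```

A larger C amortizes the fsync() cost over more records, which is why throughput in the tables rises with C.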
C | s | time | KB/s | t/s |
100000 | 20 | 1 | 1953 | 100000 |
10000 | 20 | 2 | 976 | 50000 |
1000 | 20 | 7 | 279 | 14285 |
100 | 20 | 20 | 97 | 5000 |
100000 | 100 | 3 | 3255 | 33333 |
10000 | 100 | 4 | 2441 | 25000 |
1000 | 100 | 8 | 1220 | 12500 |
100 | 100 | 57 | 171 | 1754 |
100000 | 512 | 15 | 3333 | 6666 |
10000 | 512 | 16 | 3125 | 6250 |
1000 | 512 | 17 | 2941 | 5882 |
100 | 512 | 67 | 746 | 1492 |
100000 | 1024 | 29 | 3448 | 3448 |
10000 | 1024 | 30 | 3333 | 3333 |
1000 | 1024 | 33 | 3030 | 3030 |
100 | 1024 | 77 | 1298 | 1298 |
100000 | 2048 | 60 | 3333 | 1666 |
10000 | 2048 | 60 | 3333 | 1666 |
1000 | 2048 | 64 | 3125 | 1562 |
100 | 2048 | 101 | 1980 | 990 |
C | s | time | KB/s | t/s |
100000 | 20 | 1 | 1953 | 100000 |
10000 | 20 | 1 | 1953 | 100000 |
1000 | 20 | 2 | 976 | 50000 |
100 | 20 | 2 | 976 | 50000 |
100000 | 100 | 2 | 4882 | 50000 |
10000 | 100 | 1 | 9765 | 100000 |
1000 | 100 | 2 | 4882 | 50000 |
100 | 100 | 7 | 1395 | 14285 |
100000 | 512 | 3 | 16666 | 33333 |
10000 | 512 | 3 | 16666 | 33333 |
1000 | 512 | 4 | 12500 | 25000 |
100 | 512 | 6 | 8333 | 16666 |
100000 | 1024 | 6 | 16666 | 16666 |
10000 | 1024 | 5 | 20000 | 20000 |
1000 | 1024 | 6 | 16666 | 16666 |
100 | 1024 | 8 | 12500 | 12500 |
100000 | 2048 | 12 | 16666 | 8333 |
10000 | 2048 | 12 | 16666 | 8333 |
1000 | 2048 | 15 | 13333 | 6666 |
100 | 2048 | 15 | 13333 | 6666 |
C | s | time | KB/s | t/s |
100000 | 20 | 1 | 1953 | 100000 |
10000 | 20 | 1 | 1953 | 100000 |
1000 | 20 | 2 | 976 | 50000 |
100 | 20 | 9 | 217 | 11111 |
100000 | 100 | 3 | 3255 | 33333 |
10000 | 100 | 4 | 2441 | 25000 |
1000 | 100 | 5 | 1953 | 20000 |
100 | 100 | 15 | 651 | 6666 |
100000 | 512 | 16 | 3125 | 6250 |
10000 | 512 | 18 | 2777 | 5555 |
1000 | 512 | 22 | 2272 | 4545 |
100 | 512 | 75 | 666 | 1333 |
100000 | 1024 | 34 | 2941 | 2941 |
10000 | 1024 | 35 | 2857 | 2857 |
1000 | 1024 | 46 | 2173 | 2173 |
100 | 1024 | 139 | 719 | 719 |
100000 | 2048 | 67 | 2985 | 1492 |
10000 | 2048 | 79 | 2531 | 1265 |
1000 | 2048 | 95 | 2105 | 1052 |
100 | 2048 | 246 | 813 | 406 |
C | s | time | KB/s | t/s |
100000 | 20 | 1 | 1953 | 100000 |
10000 | 20 | 1 | 1953 | 100000 |
1000 | 20 | 4 | 488 | 25000 |
100 | 20 | 31 | 63 | 3225 |
100000 | 100 | 2 | 4882 | 50000 |
10000 | 100 | 2 | 4882 | 50000 |
1000 | 100 | 6 | 1627 | 16666 |
100 | 100 | 33 | 295 | 3030 |
100000 | 512 | 8 | 6250 | 12500 |
10000 | 512 | 11 | 4545 | 9090 |
1000 | 512 | 15 | 3333 | 6666 |
100 | 512 | 50 | 1000 | 2000 |
100000 | 1024 | 11 | 9090 | 9090 |
10000 | 1024 | 10 | 10000 | 10000 |
1000 | 1024 | 14 | 7142 | 7142 |
100 | 1024 | 42 | 2380 | 2380 |
100000 | 2048 | 25 | 8000 | 4000 |
10000 | 2048 | 26 | 7692 | 3846 |
1000 | 2048 | 21 | 9523 | 4761 |
100 | 2048 | 42 | 4761 | 2380 |
C | s | time | KB/s | t/s |
100000 | 20 | 3 | 651 | 33333 |
10000 | 20 | 3 | 651 | 33333 |
1000 | 20 | 3 | 651 | 33333 |
100 | 20 | 5 | 390 | 20000 |
100000 | 100 | 3 | 3255 | 33333 |
10000 | 100 | 4 | 2441 | 25000 |
1000 | 100 | 4 | 2441 | 25000 |
100 | 100 | 9 | 1085 | 11111 |
100000 | 512 | 5 | 10000 | 20000 |
10000 | 512 | 5 | 10000 | 20000 |
1000 | 512 | 7 | 7142 | 14285 |
100 | 512 | 20 | 2500 | 5000 |
100000 | 1024 | 8 | 12500 | 12500 |
10000 | 1024 | 8 | 12500 | 12500 |
1000 | 1024 | 9 | 11111 | 11111 |
100 | 1024 | 26 | 3846 | 3846 |
100000 | 2048 | 15 | 13333 | 6666 |
10000 | 2048 | 16 | 12500 | 6250 |
1000 | 2048 | 21 | 9523 | 4761 |
100 | 2048 | 36 | 5555 | 2777 |
C | s | time | KB/s | t/s |
100000 | 20 | 1 | 1953 | 100000 |
10000 | 20 | 1 | 1953 | 100000 |
1000 | 20 | 4 | 488 | 25000 |
100 | 20 | 29 | 67 | 3448 |
100000 | 100 | 1 | 9765 | 100000 |
10000 | 100 | 2 | 4882 | 50000 |
1000 | 100 | 5 | 1953 | 20000 |
100 | 100 | 36 | 271 | 2777 |
100000 | 512 | 4 | 12500 | 25000 |
10000 | 512 | 5 | 10000 | 20000 |
1000 | 512 | 9 | 5555 | 11111 |
100 | 512 | 44 | 1136 | 2272 |
100000 | 1024 | 8 | 12500 | 12500 |
10000 | 1024 | 9 | 11111 | 11111 |
1000 | 1024 | 13 | 7692 | 7692 |
100 | 1024 | 54 | 1851 | 1851 |
100000 | 2048 | 15 | 13333 | 6666 |
10000 | 2048 | 17 | 11764 | 5882 |
1000 | 2048 | 22 | 9090 | 4545 |
100 | 2048 | 67 | 2985 | 1492 |
C | s | time | KB/s | t/s |
100000 | 20 | 2 | 976 | 50000 |
10000 | 20 | 1 | 1953 | 100000 |
1000 | 20 | 2 | 976 | 50000 |
100 | 20 | 3 | 651 | 33333 |
100000 | 100 | 2 | 4882 | 50000 |
10000 | 100 | 2 | 4882 | 50000 |
1000 | 100 | 2 | 4882 | 50000 |
100 | 100 | 6 | 1627 | 16666 |
100000 | 512 | 3 | 16666 | 33333 |
10000 | 512 | 3 | 16666 | 33333 |
1000 | 512 | 4 | 12500 | 25000 |
100 | 512 | 21 | 2380 | 4761 |
100000 | 1024 | 3 | 33333 | 33333 |
10000 | 1024 | 4 | 25000 | 25000 |
1000 | 1024 | 7 | 14285 | 14285 |
100 | 1024 | 41 | 2439 | 2439 |
100000 | 2048 | 4 | 50000 | 25000 |
10000 | 2048 | 5 | 40000 | 20000 |
1000 | 2048 | 12 | 16666 | 8333 |
100 | 2048 | 80 | 2500 | 1250 |
C | s | time | KB/s | t/s |
100000 | 20 | 1 | 1953 | 100000 |
10000 | 20 | 1 | 1953 | 100000 |
1000 | 20 | 4 | 488 | 25000 |
100 | 20 | 23 | 84 | 4347 |
100000 | 100 | 2 | 4882 | 50000 |
10000 | 100 | 2 | 4882 | 50000 |
1000 | 100 | 5 | 1953 | 20000 |
100 | 100 | 32 | 305 | 3125 |
100000 | 512 | 5 | 10000 | 20000 |
10000 | 512 | 5 | 10000 | 20000 |
1000 | 512 | 9 | 5555 | 11111 |
100 | 512 | 42 | 1190 | 2380 |
100000 | 1024 | 10 | 10000 | 10000 |
10000 | 1024 | 11 | 9090 | 9090 |
1000 | 1024 | 14 | 7142 | 7142 |
100 | 1024 | 59 | 1694 | 1694 |
100000 | 2048 | 21 | 9523 | 4761 |
10000 | 2048 | 21 | 9523 | 4761 |
1000 | 2048 | 25 | 8000 | 4000 |
100 | 2048 | 78 | 2564 | 1282 |
Comments:
Some performance data gathered from the WWW.
SR Office DriveMark 2002 in IO/Sec taken from [Ra01]:
Manufacturer | Model | I/O operations/second |
Seagate | Cheetah X15-36LP (36.7 GB Ultra160/m SCSI) | 485 |
Maxtor | Atlas 10k III (73 GB Ultra160/m SCSI) | 455 |
Fujitsu | MAM3367 (36 GB Ultra160/m SCSI) | 446 |
IBM | Ultrastar 36Z15 (36.7 GB Ultra160/m SCSI) | 402 |
Western Digital | Caviar WD1000BB-SE (100 GB ATA-100) | 397 |
Seagate | Cheetah 36ES (36 GB Ultra160/m SCSI) | 373 |
Fujitsu | MAN3735 (73 GB Ultra160/m SCSI) | 369 |
Seagate | Cheetah 73LP (73.4 GB Ultra160/m SCSI) | 364 |
Western Digital | Caviar WD1200BB (120 GB ATA-100) | 337 |
Seagate | Cheetah 36XL (36.7 GB Ultra 160/m SCSI) | 328 |
IBM | Deskstar 60GXP (60.0 GB ATA-100) | 303 |
Maxtor | DiamondMax Plus D740X (80 GB ATA-133) | 301 |
Seagate | Barracuda ATA IV (80 GB ATA-100) | 296 |
Quantum | Fireball Plus AS (60.0 GB ATA-100) | 295 |
Quantum | Atlas V (36.7 GB Ultra160/m SCSI) | 269 |
Seagate | Barracuda 180 (180 GB Ultra160/m SCSI) | 249 |
Maxtor | DiamondMax 536DX (100 GB ATA-100) | 248 |
Seagate | Barracuda 36ES (36 GB Ultra160/m SCSI) | 222 |
Seagate | U6 (80 GB ATA-100) | 210 |
Samsung | SpinPoint P20 (40.0 GB ATA-100) | 192 |
ZD Business Disk WinMark 99 in MB/Sec
Manufacturer | Model | MB/second |
Seagate | Cheetah X15-36LP (36.7 GB Ultra160/m SCSI) | 13.1 |
Maxtor | Atlas 10k III (73 GB Ultra160/m SCSI) | 12.0 |
IBM | Ultrastar 36Z15 (36.7 GB Ultra160/m SCSI) | 11.3 |
Fujitsu | MAM3367 (36 GB Ultra160/m SCSI) | 11.1 |
Seagate | Cheetah 36ES (36 GB Ultra160/m SCSI) | 10.5 |
Seagate | Cheetah 73LP (73.4 GB Ultra160/m SCSI) | 10.2 |
Seagate | Cheetah 36XL (36.7 GB Ultra 160/m SCSI) | 9.9 |
Western Digital | Caviar WD1000BB-SE (100 GB ATA-100) | 9.8 |
Fujitsu | MAN3735 (73 GB Ultra160/m SCSI) | 9.1 |
Western Digital | Caviar WD1200BB (120 GB ATA-100) | 8.9 |
IBM | Deskstar 60GXP (60.0 GB ATA-100) | 8.8 |
Seagate | Barracuda ATA IV (80 GB ATA-100) | 8.5 |
Maxtor | DiamondMax Plus D740X (80 GB ATA-133) | 8.0 |
Quantum | Atlas V (36.7 GB Ultra160/m SCSI) | 7.9 |
Quantum | Fireball Plus AS (60.0 GB ATA-100) | 7.7 |
Seagate | Barracuda 36ES (36 GB Ultra160/m SCSI) | 7.4 |
Seagate | Barracuda 180 (180 GB Ultra160/m SCSI) | 7.1 |
Maxtor | DiamondMax 536DX (100 GB ATA-100) | 6.9 |
Samsung | SpinPoint P20 (40.0 GB ATA-100) | 6.5 |
Seagate | U6 (80 GB ATA-100) | 6.3 |
The file and web server benchmarks (also available at [Ra01]) are not useful here since they consist of 80 and 100 per cent read accesses, which is not typical of MTA servers.
Some preliminary, very simple performance tests have been made with Berkeley DB 4.0.14. Two benchmark programs have been used: bench_001 and bench_002, which use Btree and Queue as access methods, respectively. They are based on examples_c/bench_001.c, which comes with Berkeley DB. Notice: the Queue access method requires fixed-size records and uses record numbers (simply increasing) as keys. This method may be used for the backup of the incoming EDB. Notice: the tests have not (yet) been run multiple times, at least not systematically. Testing showed that the runtimes may vary noticeably. However, the data can be used to show some trends.
Possible parameters are:
-n N | number of records to write |
-T N | use transactions, synchronize after N transactions |
-l N | length of data part |
-C N | do a checkpoint every N actions and possibly remove logfile |
Unless otherwise noted, the following tests have been performed on system 1, see Section 5.2.1. The number of records is 100000 unless otherwise noted; t/s is transactions (records written) per second.
Vary synchronization (-T):
Prg | -T | -l | real | user | sys | KB/s | t/s |
1 | 100000 | 20 | 14.73 | 5.99 | 1.00 | 132 | 6788 |
1 | 10000 | 20 | 14.64 | 5.85 | 1.29 | 133 | 6830 |
1 | 1000 | 20 | 18.14 | 6.02 | 1.10 | 107 | 5512 |
1 | 100 | 20 | 70.57 | 6.03 | 1.76 | 27 | 1417 |
2 | 100000 | 20 | 11.58 | 2.91 | 0.74 | 168 | 8635 |
2 | 10000 | 20 | 10.14 | 2.86 | 0.85 | 192 | 9861 |
2 | 1000 | 20 | 11.20 | 2.85 | 0.95 | 174 | 8928 |
2 | 100 | 20 | 68.71 | 2.73 | 1.61 | 28 | 1455 |
Vary data length, first program only:
Prg | -T | -l | real | user | sys | KB/s | t/s |
1 | 100000 | 20 | 14.39 | 5.93 | 1.16 | 135 | 6949 |
1 | 10000 | 20 | 16.77 | 5.91 | 1.16 | 116 | 5963 |
1 | 1000 | 20 | 16.58 | 5.91 | 1.13 | 117 | 6031 |
1 | 100 | 20 | 68.10 | 5.95 | 1.85 | 28 | 1468 |
1 | 100000 | 100 | 23.30 | 5.57 | 1.90 | 419 | 4291 |
1 | 10000 | 100 | 30.56 | 5.56 | 1.90 | 319 | 3272 |
1 | 1000 | 100 | 33.39 | 5.51 | 1.99 | 292 | 2994 |
1 | 100 | 100 | 82.58 | 5.47 | 2.62 | 118 | 1210 |
1 | 100000 | 512 | 96.03 | 7.69 | 4.78 | 520 | 1041 |
1 | 10000 | 512 | 94.12 | 7.39 | 5.03 | 531 | 1062 |
1 | 1000 | 512 | 97.67 | 7.20 | 5.15 | 511 | 1023 |
1 | 100 | 512 | 164.13 | 7.51 | 5.67 | 304 | 609 |
1 | 100000 | 1024 | 304.88 | 10.88 | 10.62 | 327 | 327 |
1 | 10000 | 1024 | 270.00 | 10.69 | 10.66 | 370 | 370 |
1 | 1000 | 1024 | 275.27 | 10.91 | 11.06 | 363 | 363 |
1 | 100 | 1024 | 346.10 | 11.01 | 12.09 | 288 | 288 |
1 | 100000 | 2048 | 788.88 | 22.18 | 27.59 | 253 | 126 |
The test was aborted at this point; it may be repeated later.
Vary data length, second program only:
Prg | -T | -l | real | user | sys | KB/s | t/s |
2 | 100000 | 20 | 9.46 | 2.81 | 0.80 | 206 | 10570 |
2 | 10000 | 20 | 11.53 | 2.88 | 0.81 | 169 | 8673 |
2 | 1000 | 20 | 12.47 | 2.83 | 0.96 | 156 | 8019 |
2 | 100 | 20 | 67.91 | 2.80 | 1.59 | 28 | 1472 |
2 | 100000 | 100 | 13.57 | 2.92 | 1.20 | 719 | 7369 |
2 | 10000 | 100 | 18.62 | 3.07 | 1.17 | 524 | 5370 |
2 | 1000 | 100 | 19.04 | 2.92 | 1.20 | 512 | 5252 |
2 | 100 | 100 | 72.73 | 2.80 | 2.16 | 134 | 1374 |
2 | 100000 | 512 | 46.10 | 3.90 | 2.61 | 1084 | 2169 |
2 | 10000 | 512 | 53.55 | 3.84 | 2.79 | 933 | 1867 |
2 | 1000 | 512 | 66.71 | 3.65 | 3.05 | 749 | 1499 |
2 | 100 | 512 | 105.25 | 3.36 | 3.76 | 475 | 950 |
2 | 100000 | 1024 | 103.72 | 4.92 | 4.68 | 964 | 964 |
2 | 10000 | 1024 | 105.53 | 4.87 | 4.82 | 947 | 947 |
2 | 1000 | 1024 | 105.60 | 4.73 | 4.85 | 946 | 946 |
2 | 100 | 1024 | 145.14 | 4.73 | 5.84 | 688 | 688 |
2 | 100000 | 2048 | 194.70 | 7.44 | 8.09 | 1027 | 513 |
2 | 10000 | 2048 | 197.09 | 7.22 | 8.15 | 1014 | 507 |
2 | 1000 | 2048 | 200.09 | 7.10 | 8.70 | 999 | 499 |
2 | 100 | 2048 | 234.85 | 6.86 | 9.53 | 851 | 425 |
Put the directory for logfiles on a different disk (/extra/home/ca/tmp/db), using Btree.
Prg | -T | -l | real | user | sys | KB/s | t/s |
1 | 100000 | 20 | 14.90 | 6.05 | 0.96 | 131 | 6711 |
1 | 10000 | 20 | 14.46 | 5.95 | 1.12 | 135 | 6915 |
1 | 1000 | 20 | 17.70 | 5.83 | 1.08 | 110 | 5649 |
1 | 100 | 20 | 63.91 | 5.92 | 1.74 | 30 | 1564 |
1 | 100000 | 100 | 27.00 | 5.53 | 1.90 | 361 | 3703 |
1 | 10000 | 100 | 33.39 | 5.63 | 1.92 | 292 | 2994 |
1 | 1000 | 100 | 29.16 | 5.63 | 1.75 | 334 | 3429 |
1 | 100 | 100 | 72.18 | 5.44 | 2.42 | 135 | 1385 |
1 | 100000 | 512 | 96.94 | 7.49 | 5.09 | 515 | 1031 |
1 | 10000 | 512 | 107.99 | 7.34 | 5.17 | 463 | 926 |
1 | 1000 | 512 | 97.05 | 7.21 | 5.54 | 515 | 1030 |
1 | 100 | 512 | 145.15 | 7.85 | 5.36 | 344 | 688 |
1 | 100000 | 1024 | 268.88 | 10.67 | 11.54 | 371 | 371 |
1 | 10000 | 1024 | 279.65 | 11.02 | 11.05 | 357 | 357 |
1 | 1000 | 1024 | 304.07 | 10.58 | 11.69 | 328 | 328 |
1 | 100 | 1024 | 319.74 | 10.88 | 12.10 | 312 | 312 |
1 | 100000 | 2048 | 738.38 | 23.07 | 27.13 | 270 | 135 |
1 | 10000 | 2048 | 651.86 | 22.70 | 26.92 | 306 | 153 |
1 | 1000 | 2048 | 693.13 | 21.79 | 28.63 | 288 | 144 |
1 | 100 | 2048 | 724.68 | 22.51 | 29.04 | 275 | 137 |
Put the directory for logfiles on a different disk (/extra/home/ca/tmp/db), using Queue.
Prg | -T | -l | real | user | sys | KB/s | t/s |
2 | 100000 | 20 | 10.92 | 2.90 | 0.65 | 178 | 9157 |
2 | 10000 | 20 | 9.94 | 2.87 | 0.77 | 196 | 10060 |
2 | 1000 | 20 | 31.66 | 2.85 | 0.88 | 61 | 3158 |
2 | 100 | 20 | 60.74 | 2.93 | 1.36 | 32 | 1646 |
2 | 100000 | 100 | 13.62 | 3.09 | 0.95 | 717 | 7342 |
2 | 10000 | 100 | 19.30 | 3.02 | 1.17 | 505 | 5181 |
2 | 1000 | 100 | 15.55 | 3.16 | 1.08 | 628 | 6430 |
2 | 100 | 100 | 71.88 | 2.97 | 1.72 | 135 | 1391 |
2 | 100000 | 512 | 52.08 | 3.93 | 2.50 | 960 | 1920 |
2 | 10000 | 512 | 52.42 | 3.68 | 3.03 | 953 | 1907 |
2 | 1000 | 512 | 56.58 | 3.91 | 2.90 | 883 | 1767 |
2 | 100 | 512 | 95.38 | 3.74 | 3.64 | 524 | 1048 |
2 | 100000 | 1024 | 107.20 | 4.69 | 4.87 | 932 | 932 |
2 | 10000 | 1024 | 100.15 | 4.88 | 4.57 | 998 | 998 |
2 | 1000 | 1024 | 100.95 | 4.78 | 5.06 | 990 | 990 |
2 | 100 | 1024 | 139.38 | 4.71 | 5.61 | 717 | 717 |
2 | 100000 | 2048 | 187.78 | 7.68 | 8.41 | 1065 | 532 |
2 | 10000 | 2048 | 189.76 | 7.09 | 8.62 | 1053 | 526 |
2 | 1000 | 2048 | 201.95 | 7.37 | 8.65 | 990 | 495 |
2 | 100 | 2048 | 217.66 | 7.21 | 9.53 | 918 | 459 |
Machine 2b: Vary data length, first program:
Prg | -T | -l | real | user | sys | KB/s | t/s |
1 | 100000 | 20 | 21.56 | 9.04 | 1.88 | 90 | 4638 |
1 | 10000 | 20 | 13.02 | 9.58 | 1.92 | 150 | 7680 |
1 | 1000 | 20 | 12.64 | 9.40 | 1.81 | 154 | 7911 |
1 | 100 | 20 | 16.35 | 9.68 | 1.73 | 119 | 6116 |
1 | 100000 | 100 | 32.79 | 9.16 | 4.60 | 297 | 3049 |
1 | 10000 | 100 | 25.05 | 9.54 | 4.11 | 389 | 3992 |
1 | 1000 | 100 | 23.69 | 9.80 | 4.39 | 412 | 4221 |
1 | 100 | 100 | 28.51 | 10.25 | 3.89 | 342 | 3507 |
1 | 100000 | 512 | 47.67 | 13.82 | 13.65 | 1048 | 2097 |
1 | 10000 | 512 | 48.04 | 13.22 | 13.64 | 1040 | 2081 |
1 | 1000 | 512 | 46.35 | 13.16 | 14.54 | 1078 | 2157 |
1 | 100 | 512 | 52.10 | 13.78 | 11.93 | 959 | 1919 |
1 | 100000 | 1024 | 109.32 | 21.59 | 25.00 | 914 | 914 |
1 | 10000 | 1024 | 107.94 | 19.97 | 26.49 | 926 | 926 |
1 | 1000 | 1024 | 108.74 | 20.13 | 26.06 | 919 | 919 |
1 | 100 | 1024 | 113.14 | 20.01 | 26.45 | 883 | 883 |
1 | 100000 | 2048 | 240.16 | 44.55 | 55.72 | 832 | 416 |
1 | 10000 | 2048 | 262.05 | 43.58 | 54.94 | 763 | 381 |
1 | 1000 | 2048 | 245.93 | 41.17 | 57.54 | 813 | 406 |
1 | 100 | 2048 | 254.97 | 41.39 | 59.63 | 784 | 392 |
Vary data length, second program:
Prg | -T | -l | real | user | sys | KB/s | t/s |
2 | 100000 | 20 | 9.85 | 5.92 | 1.30 | 198 | 10152 |
2 | 10000 | 20 | 7.82 | 5.90 | 1.28 | 249 | 12787 |
2 | 1000 | 20 | 7.21 | 5.13 | 1.34 | 270 | 13869 |
2 | 100 | 20 | 10.36 | 5.79 | 1.23 | 188 | 9652 |
2 | 100000 | 100 | 10.22 | 5.84 | 2.73 | 955 | 9784 |
2 | 10000 | 100 | 10.54 | 6.11 | 2.72 | 926 | 9487 |
2 | 1000 | 100 | 10.68 | 6.12 | 2.40 | 914 | 9363 |
2 | 100 | 100 | 13.57 | 6.06 | 2.37 | 719 | 7369 |
2 | 100000 | 512 | 23.73 | 7.32 | 8.89 | 2107 | 4214 |
2 | 10000 | 512 | 25.36 | 7.42 | 8.44 | 1971 | 3943 |
2 | 1000 | 512 | 26.12 | 7.19 | 8.56 | 1914 | 3828 |
2 | 100 | 512 | 33.79 | 7.24 | 8.78 | 1479 | 2959 |
2 | 100000 | 1024 | 47.93 | 9.05 | 12.29 | 2086 | 2086 |
2 | 10000 | 1024 | 52.26 | 9.63 | 14.91 | 1913 | 1913 |
2 | 1000 | 1024 | 52.07 | 9.37 | 14.50 | 1920 | 1920 |
2 | 100 | 1024 | 58.91 | 9.49 | 14.52 | 1697 | 1697 |
2 | 100000 | 2048 | 74.59 | 15.42 | 20.55 | 2681 | 1340 |
2 | 10000 | 2048 | 72.47 | 14.99 | 21.50 | 2759 | 1379 |
2 | 1000 | 2048 | 78.38 | 14.54 | 21.93 | 2551 | 1275 |
2 | 100 | 2048 | 76.63 | 14.01 | 22.12 | 2609 | 1304 |
Machine 7: Vary data length, second program only; the times for a second test run are added on the right. These clearly show how wildly the results can vary.
Prg | -T | -l | real | user | sys | KB/s | t/s | real (2) | user (2) | sys (2) |
2 | 100000 | 20 | 5.20 | 2.00 | 0.30 | 375 | 19230 | 5.0 | 2.0 | 0.3 |
2 | 10000 | 20 | 6.20 | 2.00 | 0.30 | 315 | 16129 | 4.8 | 1.9 | 0.3 |
2 | 1000 | 20 | 7.10 | 2.00 | 0.30 | 275 | 14084 | 7.8 | 2.0 | 0.3 |
2 | 100 | 20 | 25.80 | 2.00 | 0.50 | 75 | 3875 | 28.5 | 2.0 | 0.5 |
2 | 100000 | 100 | 6.30 | 2.10 | 0.60 | 1550 | 15873 | 6.0 | 2.0 | 0.6 |
2 | 10000 | 100 | 6.50 | 2.10 | 0.60 | 1502 | 15384 | 7.6 | 2.1 | 0.6 |
2 | 1000 | 100 | 10.60 | 2.20 | 0.60 | 921 | 9433 | 11.5 | 2.0 | 0.6 |
2 | 100 | 100 | 36.50 | 2.10 | 0.80 | 267 | 2739 | 31.4 | 2.1 | 0.8 |
2 | 100000 | 512 | 33.40 | 2.70 | 2.80 | 1497 | 2994 | 18.5 | 2.6 | 2.0 |
2 | 10000 | 512 | 29.80 | 2.70 | 2.90 | 1677 | 3355 | 24.6 | 2.5 | 2.3 |
2 | 1000 | 512 | 29.30 | 2.60 | 2.50 | 1706 | 3412 | 32.9 | 2.5 | 2.8 |
2 | 100 | 512 | 65.90 | 2.60 | 2.90 | 758 | 1517 | 61.0 | 2.5 | 2.8 |
2 | 100000 | 1024 | 50.50 | 3.30 | 4.90 | 1980 | 1980 | 58.4 | 3.2 | 5.5 |
2 | 10000 | 1024 | 60.80 | 3.40 | 5.40 | 1644 | 1644 | 47.2 | 3.2 | 4.6 |
2 | 1000 | 1024 | 51.70 | 3.30 | 4.60 | 1934 | 1934 | 45.1 | 3.2 | 4.2 |
2 | 100 | 1024 | 89.70 | 3.20 | 5.60 | 1114 | 1114 | 82.0 | 3.2 | 4.9 |
2 | 100000 | 2048 | 90.20 | 4.40 | 8.90 | 2217 | 1108 | 86.9 | 4.3 | 9.1 |
2 | 10000 | 2048 | 92.80 | 4.30 | 9.10 | 2155 | 1077 | 67.1 | 4.3 | 7.3 |
2 | 1000 | 2048 | 93.50 | 4.60 | 7.80 | 2139 | 1069 | 66.0 | 4.2 | 7.0 |
2 | 100 | 2048 | 134.00 | 4.40 | 7.50 | 1492 | 746 | 107.7 | 4.2 | 6.5 |
Vary data length, first program only:
Prg | -T | -l | real | user | sys | KB/s | t/s |
1 | 100000 | 20 | 6.90 | 3.10 | 0.40 | 283 | 14492 |
1 | 10000 | 20 | 7.20 | 3.30 | 0.50 | 271 | 13888 |
1 | 1000 | 20 | 9.90 | 3.30 | 0.50 | 197 | 10101 |
1 | 100 | 20 | 28.90 | 3.20 | 0.60 | 67 | 3460 |
1 | 100000 | 100 | 11.30 | 3.40 | 1.00 | 864 | 8849 |
1 | 10000 | 100 | 12.20 | 3.30 | 1.00 | 800 | 8196 |
1 | 1000 | 100 | 14.00 | 3.30 | 1.10 | 697 | 7142 |
1 | 100 | 100 | 35.80 | 3.30 | 1.30 | 272 | 2793 |
1 | 100000 | 512 | 37.10 | 4.50 | 4.20 | 1347 | 2695 |
1 | 10000 | 512 | 50.00 | 4.60 | 4.50 | 1000 | 2000 |
1 | 1000 | 512 | 62.50 | 4.50 | 4.60 | 800 | 1600 |
1 | 100 | 512 | 68.60 | 4.50 | 4.60 | 728 | 1457 |
1 | 100000 | 1024 | 86.20 | 6.20 | 8.70 | 1160 | 1160 |
1 | 10000 | 1024 | 117.10 | 6.00 | 8.40 | 853 | 853 |
1 | 1000 | 1024 | 78.90 | 6.10 | 7.80 | 1267 | 1267 |
1 | 100 | 1024 | 109.60 | 6.10 | 7.40 | 912 | 912 |
1 | 100000 | 2048 | 225.80 | 10.90 | 15.90 | 885 | 442 |
1 | 10000 | 2048 | 259.40 | 10.80 | 16.30 | 771 | 385 |
1 | 1000 | 2048 | 382.60 | 10.90 | 17.40 | 522 | 261 |
1 | 100 | 2048 | 394.30 | 10.90 | 17.20 | 507 | 253 |
Machine 10a:
Prg | -T | -l | real | user | sys | KB/s | t/s |
1 | 100000 | 20 | 5.00 | 4.40 | 0.50 | 390 | 20000 |
1 | 10000 | 20 | 5.00 | 4.30 | 0.60 | 390 | 20000 |
1 | 1000 | 20 | 5.50 | 4.40 | 0.80 | 355 | 18181 |
1 | 100 | 20 | 9.00 | 4.50 | 3.90 | 217 | 11111 |
1 | 100000 | 100 | 6.10 | 4.70 | 1.20 | 1600 | 16393 |
1 | 10000 | 100 | 6.20 | 4.80 | 1.20 | 1575 | 16129 |
1 | 1000 | 100 | 6.70 | 4.60 | 1.80 | 1457 | 14925 |
1 | 100 | 100 | 10.90 | 5.00 | 4.30 | 895 | 9174 |
1 | 100000 | 512 | 13.30 | 6.50 | 5.10 | 3759 | 7518 |
1 | 10000 | 512 | 12.90 | 6.90 | 4.80 | 3875 | 7751 |
1 | 1000 | 512 | 14.00 | 7.00 | 5.00 | 3571 | 7142 |
1 | 100 | 512 | 19.00 | 7.10 | 8.40 | 2631 | 5263 |
1 | 100000 | 1024 | 19.70 | 8.80 | 8.40 | 5076 | 5076 |
1 | 10000 | 1024 | 19.30 | 9.20 | 8.20 | 5181 | 5181 |
1 | 1000 | 1024 | 19.90 | 9.20 | 8.70 | 5025 | 5025 |
1 | 100 | 1024 | 26.70 | 9.20 | 12.30 | 3745 | 3745 |
1 | 100000 | 2048 | 32.90 | 13.80 | 11.70 | 6079 | 3039 |
1 | 10000 | 2048 | 31.10 | 13.80 | 12.10 | 6430 | 3215 |
1 | 1000 | 2048 | 34.90 | 14.40 | 12.30 | 5730 | 2865 |
1 | 100 | 2048 | 41.30 | 14.10 | 16.10 | 4842 | 2421 |
Prg | -T | -l | real | user | sys | KB/s | t/s |
2 | 100000 | 20 | 4.70 | 4.20 | 0.30 | 415 | 21276 |
2 | 10000 | 20 | 4.70 | 4.00 | 0.50 | 415 | 21276 |
2 | 1000 | 20 | 5.20 | 4.20 | 0.70 | 375 | 19230 |
2 | 100 | 20 | 8.80 | 4.10 | 3.90 | 221 | 11363 |
2 | 100000 | 100 | 5.50 | 4.30 | 0.80 | 1775 | 18181 |
2 | 10000 | 100 | 5.70 | 4.30 | 0.80 | 1713 | 17543 |
2 | 1000 | 100 | 6.20 | 4.50 | 1.00 | 1575 | 16129 |
2 | 100 | 100 | 9.70 | 4.50 | 4.20 | 1006 | 10309 |
2 | 100000 | 512 | 12.50 | 5.50 | 2.30 | 4000 | 8000 |
2 | 10000 | 512 | 13.60 | 5.40 | 2.60 | 3676 | 7352 |
2 | 1000 | 512 | 11.70 | 5.10 | 3.30 | 4273 | 8547 |
2 | 100 | 512 | 14.50 | 5.70 | 6.40 | 3448 | 6896 |
2 | 100000 | 1024 | 17.90 | 6.80 | 3.90 | 5586 | 5586 |
2 | 10000 | 1024 | 17.30 | 6.70 | 4.60 | 5780 | 5780 |
2 | 1000 | 1024 | 18.40 | 6.60 | 4.60 | 5434 | 5434 |
2 | 100 | 1024 | 19.00 | 7.00 | 8.10 | 5263 | 5263 |
2 | 100000 | 2048 | 24.80 | 8.80 | 6.90 | 8064 | 4032 |
2 | 10000 | 2048 | 21.20 | 9.00 | 6.80 | 9433 | 4716 |
2 | 1000 | 2048 | 20.90 | 9.10 | 7.20 | 9569 | 4784 |
2 | 100 | 2048 | 24.00 | 8.90 | 11.30 | 8333 | 4166 |
General notice: the benchmark programs have been run while the machines were ``in use'', so some unusual results can be explained by the activity of other processes.
Comments:
2004-03-26: Effect of logging to files via sm I/O.
On FreeBSD 4.9, UFS, softupdates, SCSI, smX.0.0.12, relay 5000 messages, 100 threads:
logging | time |
same disk, smioerr | 137-141 |
same disk, smioout | 104 |
RAM, smioerr | 104 |
This means there is a performance hit of about 35 per cent when smioerr is used instead of smioout: the former uses line buffering, hence more write() calls are involved.
The program checks/t-net-0.c can be used for a very simple performance test of local TCP/IP (AF_INET or AF_LOCAL) communication. The program uses the SM I/O layer on top of sockets. Some of the available options are listed below:
-b n | set buffer size to n, default 8192 |
-c n | act as client, write n bytes |
-s n | act as server, read n bytes |
-R n | read and write n times |
-u | use a Unix domain socket |
The numbers reference the machines listed in Section 5.2.1.1.
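The inner loop being timed is essentially a write/read ping over a local socket. A minimal AF_LOCAL sketch (using socketpair() in a single process here, rather than t-net-0's actual client/server setup over SM I/O):

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Perform rounds write/read round trips of msgsz bytes over a local
 * socket pair; returns the total number of bytes the "server" end read. */
static long ping_local(int rounds, int msgsz)
{
    int sv[2];
    char buf[1024];
    long total = 0;
    int i;

    if (msgsz > (int)sizeof(buf) ||
        socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) < 0)
        return -1;
    memset(buf, 'x', (size_t)msgsz);
    for (i = 0; i < rounds; i++) {
        if (write(sv[0], buf, (size_t)msgsz) != (ssize_t)msgsz)
            break;
        /* SOCK_STREAM has no message boundaries, but for small writes
         * over a socketpair a single read() returns the whole message */
        total += read(sv[1], buf, (size_t)msgsz);
    }
    close(sv[0]);
    close(sv[1]);
    return total;
}
```

Timing this loop with -R 100000 and varying message sizes reproduces the shape of the tables below: the per-round-trip overhead dominates for small messages, so time grows slower than the message size.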
1:
-R | -c | time INET | time LOCAL |
100000 | 32 | 15 | 10 |
100000 | 64 | 19 | 12 |
100000 | 128 | 24 | 17 |
100000 | 256 | 31 | 25 |
100000 | 512 | 49 | 43 |
100000 | 1024 | 84 | 81 |
7:
-R | -c | time INET | time LOCAL |
100000 | 32 | 10 | 7 |
100000 | 64 | 11 | 7 |
100000 | 128 | 13 | 9 |
100000 | 256 | 17 | 14 |
100000 | 512 | 23 | 20 |
100000 | 1024 | 37 | 35 |
8:
-R | -c | time INET | time LOCAL |
100000 | 32 | 51 | 41 |
100000 | 64 | 57 | 46 |
100000 | 128 | 66 | 60 |
100000 | 256 | 97 | 86 |
100000 | 512 | 148 | 138 |
100000 | 1024 | 250 | 243 |
9:
-R | -c | time INET | time LOCAL |
100000 | 32 | 67 | 52 |
100000 | 64 | 74 | 59 |
100000 | 128 | 85 | 71 |
100000 | 256 | 114 | 97 |
100000 | 512 | 159 | 148 |
100000 | 1024 | 263 | 246 |
11:
-R | -c | time INET | time LOCAL |
100000 | 32 | 99 | 89 |
100000 | 64 | 108 | 94 |
100000 | 128 | 138 | 115 |
100000 | 256 | 141 | 151 |
100000 | 512 | 199 | 221 |
100000 | 1024 | 346 | 373 |
Notice: these times vary wildly since the machine is used by several people.
13:
-R | -c | time INET | time LOCAL |
100000 | 32 | 46 | 38 |
100000 | 64 | 59 | 48 |
100000 | 128 | 78 | 70 |
100000 | 256 | 120 | 110 |
100000 | 512 | 203 | 192 |
100000 | 1024 | 376 | 358 |
Just a preliminary number: on system 1 the example program examples_c/bench_001.c achieves about 1 to 1.5 million lookups per second (this is for a data length of 20 bytes and a cache size of 64MB, which more or less means direct memory access, i.e., no disk I/O). Taking the results from 5.4.1 into account, this is a factor of 100 faster than performing lookups over a generic TCP/IP connection. This certainly must be taken into account when deciding how and where to incorporate DB lookups.
Using larger data sizes (256 to 512 bytes) and smaller caches (10000 bytes) causes a significant drop in performance: a sequential lookup of all data varies from 60000 down to 10000 lookups per second (on system 1).
Random access drops as low as 1000 to 2000 lookups per second.
On AIX, sm_snprintf() is about twice as slow as snprintf(). On SunOS 5.8 the factor is about 1.3; on FreeBSD there is no difference (which isn't surprising, since it's almost the same code). It might make sense to use the native snprintf() on some platforms; however, this is no longer possible due to the extensions in sm_snprintf() (it supports additional format specifiers, e.g., for constant strings).