

Sendmail X: Performance Tests and Results


SMTP Server Daemon

Remark (placed here so it doesn't get lost): there is a restricted number (< 60000) of possible open connections to one port. Could that limit the throughput we are trying to achieve, or is such a high number of connections infeasible?
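If the limit in question is the client-side ephemeral port range (a connection is distinguished by the full address/port 4-tuple, but each connecting host can only draw source ports from its local port range), it can at least be inspected, and raised, via sysctl on FreeBSD. A minimal sketch, assuming the standard net.inet.ip.portrange sysctls:

    /* Sketch: print the ephemeral (local) port range on FreeBSD.
     * Assumes the net.inet.ip.portrange.{first,last} sysctls exist. */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
        int first, last;
        size_t len = sizeof(int);

        if (sysctlbyname("net.inet.ip.portrange.first", &first, &len, NULL, 0) != 0 ||
            sysctlbyname("net.inet.ip.portrange.last", &last, &len, NULL, 0) != 0)
        {
            perror("sysctlbyname");
            return 1;
        }
        /* this range bounds the number of simultaneous outgoing
         * connections from one source address */
        printf("local port range: %d - %d (%d ports)\n",
            first, last, last - first + 1);
        return 0;
    }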


SMTP Sink

For simple performance comparisons several SMTP sinks have been implemented or tested.

Test programs are:

  1. smtp-sink from postfix. This is an entirely event driven program.
  2. thrperconn: one thread per connection.
  3. thrpool: uses a worker model with concurrency limiting, see Section 3.20.4.1.
  4. smtps: state-threads, see Section 3.20.3.1.

Test machines are:

  1. v-sun: Sun SPARCserver E450, 4 processors
  2. v-bsd: FreeBSD 3.4, 2 PIII processors, 2 GB RAM
  3. v-aix: AIX, 4 processors
  4. schmidt: Linux 2.4, uses 15 threads per client only, otherwise the machine just ``dies''.

Entries in the tables below denote execution time in seconds unless otherwise noted; hence smaller values are better.

Tests have been performed with myslam (a multi-threaded SMTP client), using 7 to 8 client machines, 50 threads per client, and 5000 messages per client.

  1. v-sun (8 clients):

    parameters smtp-sink smtps thrperconn thrpool
    1KB/msg (40MB) 45s 70s 92s 43s
    4KB/msg (160MB) 49s 56s 259s 78s
    32KB/msg (1280MB) 203s 208s >999s 110s
    -w 1 141s 109s 156s 230s

    Note: v-sun is a four processor machine, hence the multi-threaded programs (thrpool, thrperconn) can use multiple processors. I didn't select (via an option) multiple processors for smtps though.

    Just as one example, the achieved throughput in MB/s is listed in the next table (e.g., 40MB transferred in 45s corresponds to about 0.9MB/s). As can be seen, it is an order of magnitude lower than the sustainable throughput that can be achieved over a single connection (about 85-90MB/s measured with ttcp; this is a 100Mbit/s ethernet).

    parameters smtp-sink smtps thrperconn thrpool
    1KB/msg (40MB) 0.9 0.6 0.4 0.9
    4KB/msg (160MB) 3.3 2.9 0.6 2.1
    32KB/msg (1280MB) 6.5 6.3 - 11.9

  2. v-bsd:

    parameters smtp-sink smtps thrperconn thrpool
    1KB msg size 97 87 380 140
    4KB msg size 108 130 1150 156
    32KB msg size 208 197 fails 330
    -w 1 165 138 484 223

  3. v-aix:

    parameters smtp-sink smtps thrperconn thrpool
    1KB msg size 38 28 - 31
    4KB msg size 34 33 - 31
    32KB msg size 125 125 - 125
    -w 1 125 125 - 155
            125 for 250/3

  4. schmidt:

    parameters smtp-sink smtps thrperconn thrpool
    1KB msg size 45 44 165 74
    4KB msg size 54 45 418 75
    32KB msg size 217 167 fails 256
    -w 1 370 360 - 337


SMTP Sink with CDB

2004-03-02

statethreads/examples/smtps3

wiz

See Section 5.2.1.1, machine 1

wiz$ time ./smtpc2 -fa@b.c -Rx@y.z -t 100 -s 1000 -r localhost

sink program FS times (s)
smtps3 - 5
smtpss UFS 17, 18
smtps3 -C UFS 16, 17, 19

perf-lab

source: s-6.perf-lab

sink: v-bsd.perf-lab

with -C

s-6.perf-lab$ time ./smtpc2 -t 100 -s 1000 -r v-bsd.perf-lab
   19.17s real     1.08s user     0.64s system

without -C

s-6.perf-lab$ time ./smtpc2 -t 100 -s 1000 -r v-bsd.perf-lab
    3.04s real     0.81s user     0.59s system

source: s-6.perf-lab

sink: mon.perf-lab (FreeBSD 4.9)

with -C

   12.05s real     1.04s user     0.67s system

without -C

    3.03s real     0.92s user     0.54s system

2004-03-04 source: s-6.perf-lab; sink: v-sun.perf-lab

with -C: 20s - 24s (UFS) Note: It takes 20s(!) to remove all CDB files:

time rm ?/S*     0m20.11s
with -C: 1s (TMPFS); 16s (UFS, /), rm: 14s; logging turned on: 16s, rm: 0.8s.

without -C: 1s

2004-03-08 source: s-6.perf-lab; sink: v-bsd;

./smtpc -t 100 -s 1000

sink program time (s)
smtpss 30
smtps3 -C 30
smtps3 3

2004-03-08 source: s-6.perf-lab; sink: v-sun;

./smtpc -t 100 -s 1000

sink program FS times (s)
smtps3 - 1
smtpss UFS 25, 30
smtps3 -C UFS 23
smtpss swap 2, 3
smtps3 -C swap 1, 2

Note: the variance for smtpss on UFS is fairly large. The lower numbers are achieved by running smtps3 -C first and then smtpss; the larger numbers are measured when the CDB files have just been removed. However, this effect was not reproducible. Note: removing those files takes about as long as a test run.


SMTP Relaying Using a Sendmail X Prototype

Test setup with a sendmail X prototype of 2002-09-04: v-aix.perf-lab running QMGR, SMTPS, and SMTPC. Relaying from localhost to v-bsd.perf-lab. Source program running on v-aix:

time ./smtp-source -s 50 -m 100 -c localhost:8000

Using the full version: 2.45s; turning fsync() off: 1.44s.

This clearly shows the need for a better CDB implementation, at least on AIX.

Same test with reversed roles (smX on v-bsd, sink on v-aix): using the full version: 7.44s; turning fsync() off: 6.20s. For comparison: using sendmail 8.12: 14.71s.

The SCSI disks on v-bsd seem to be fairly slow. Moreover, there seems to be something wrong with the OS version (it's very old: FreeBSD 3.4).

On FreeBSD 4.6 (machine 14, see Section 5.2.1.1) (source, sink, sm-9 of 2002-10-01 on the same machine):

time ./smtp-source -s 100 -m 200 -c localhost:8000

softupdates: 4.35s; without softupdates: 5.66s

time ./smtp-source -s 50 -m 100 -c localhost:8000

softupdates: 2.01s/1.93s, -U: 1.79s; without softupdates: 2.60s/2.46s, -U: 2.17s

(-U turns off fsync()).

Using sendmail 8.12.6:

time ./smtp-source -s 50 -m 100 localhost:1234

softupdates: 5.01s. This looks quite good for sendmail 8, but the result for:

time ./smtp-source -c -s 100 -m 200 localhost:1234

is: 143.12s, which is not anywhere near good. This is related to the high load generated by this test: up to 200 concurrent sendmail processes just kill the machine; sendmail X has only up to 4 processes running.


Various Linux FS

Test date: 2003-05-25, version: smX.0.0.6, machine: PC, AMD Duron 700MHz, 512MB RAM, SuSE 8.1

Test program:

time ./smtp-source -s 50 -m 500 -fa@b.c -tx@y.z localhost:1234

FS Times msg/s (best)
JFS 4.02s, 4.23s 124
ReiserFS 4.8s 104
XFS 6.7s, 7.2s, 7.48s, 7.64s 74
EXT3 14.39s, 13.44s 34

2004-03-17 checks/t-readwrite on destiny (Linux, IDE, ext2):

parameters writes time
-s -f 1000 -p 1 - 9
-s -f 100 -p 10 - 6

The FS is mounted async (default!).

2004-03-17 checks/t-readwrite on ia64-2 (Linux, SCSI, reiserfs):

parameters writes time
-s -f 1000 -p 1 - 5.2
-s -f 100 -p 10 - 2.6

2004-03-23 source: basil.ps-lab MTA: cilantro.ps-lab (Linux 2.4.18-64GB-SMP) sink: v-sun.perf-lab

FS: ReiserFS version 3.6.25

smtpc -t 100 -s 1000

program source time sink time
smtps3 -C   -
smX.0.0.12 6 5
sm8.12.11 74 74
sm8.12.11 See 1   50
postfix 2.0.18    

gatling -m 100 -c 5000 -z 1 -Z 1

program writes source time source msgs/s sink time
smtps3   2 2295 -
smtps3 -C   5 962 -
smX.0.0.12   22 225 22
sm8.12.11   358 14 358
sm8.12.11 See 1   246 20 -
postfix 2.0.18        

Notes:

  1. Default for Linux is to have REQUIRES_DIR_FSYNC set; in this test it has been turned off. Some people claim it is safe to do that with recent Linux FSs. For some reason (timeouts?) the tests with smtpc fail in this configuration, i.e., fewer than 1000 messages are sent. (A sketch of what directory fsync involves follows these notes.)
  2. According to tests by Thom sendmail 8.12 was able to relay 40 msgs/s on the same machine.
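For reference, REQUIRES_DIR_FSYNC is about fsync(2)ing the directory itself after creating (or renaming) a file, so that the directory entry, not just the file contents, is on disk. A minimal sketch of the idea (the path names are only illustrative):

    /* Sketch: make a newly created file durable including its directory
     * entry; path names are illustrative. */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int
    create_durable(const char *dir, const char *path, const void *buf, size_t len)
    {
        int fd, dfd;

        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        close(fd);

        /* on file systems that do not make directory updates durable on
         * their own, the directory must be fsync()ed as well */
        dfd = open(dir, O_RDONLY);
        if (dfd < 0)
            return -1;
        if (fsync(dfd) != 0)
        {
            close(dfd);
            return -1;
        }
        close(dfd);
        return 0;
    }

    int
    main(void)
    {
        /* "spool" is an illustrative directory that must already exist */
        return create_durable("spool", "spool/df-test", "hello\n", 6) != 0;
    }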

2004-03-25:

Filesystems:

  1. ext3 (rw,sync,data=journal)
  2. ext3 (rw,data=journal) [this means async?]
  3. reiserfs (rw,noatime,data=journal,notail)
  4. jfs (rw)
  5. ext2 (rw,sync)

smtpc -t 100 -s 1000

program FS source time sink time
smX.0.0.12 1 63 61
  1 63 63
  2 19 18
  3 5 4
  3 5 5
  5 81 80
sm8.12.11 3 45 several read errors
  5 91 92
smtps3 -C      

2004-03-25: gatling -m 100 -c 5000 -z 1 -Z 1 (1KB message size)

program FS source time sink time msgs/s
smX.0.0.12 1      
  2 90 90 55
  3 24 24 208
  4 100 99 100
sm8.12.11 3 216 errors 23

gatling -m 100 -c 5000 -z 4 -Z 4 (4KB message size)

program FS source time sink time msgs/s
smX.0.0.12 1      
  2 92 92 54
  3 141 140 35
  4 168 168 29
sm8.12.11 3 226 errors 22

gatling -m 100 -c 5000 -z 16 -Z 16 (16KB message size)

program FS source time sink time msgs/s
smX.0.0.12 1      
  2      
  3 169   29
  4      
sm8.12.11 3 226 errors 22

Notes:

  1. ReiserFS seems to have some optimizations for small files, hence the results for 1KB are really good, but for 4KB they are in the normal range.
  2. Testing with sm8 usually caused several read errors on the sink side and several errors displayed by gatling.


Various FreeBSD Results

2003-11-19 sm-9.0.0.9 running on v-bsd.perf-lab (2 processors, FreeBSD 3.4)

Source on bsd.dev-lab

time ./smtp-source -d -s 100 -m 500

directly to sink: 2.16 - 2.74s (231msgs/s)

using MFS: 14.37 - 14.43s (34msgs/s) (sm8.12.10: 32s)

using FS with softupdates: 22.78 - 23.83s (21msgs/s) (sm8.12.10: 49s)

using FS without softupdates: 35.27 - 35.56s (14msgs/s)

2004-03-02 source: s-6.perf-lab; relay: mon; sink: v-bsd

time ./smtpc2 -O 10 -fa@s-6.perf-lab -Rnobody@v-bsd.perf-lab -t 100 -s 1000 -r mon.perf-lab:1234

38.26s real 1.01s user 0.88s system

2004-03-04 source: s-6.perf-lab; relay: v-bsd; sink: v-sun

options: -t 100 -s 1000

MTA source time(s) sink time
postfix 2.0.18 53 94
smX.0.0.12 69 68
without smtpc 56 -
sm8.12.11 67 67
-odq 79, 82  
-odq / 100 qd 101  
-odq / 10 qd 100  

Note: this is FreeBSD 3.4 without softupdates and directory hashes.

getrusage(2) data:

sm8.12.11 -odq

ru_utime=        15.0158488
ru_stime=        71.0104605
ru_maxrss=     1524
ru_ixrss=   5030592
ru_idrss=   4098456
ru_isrss=   1412096
ru_minflt=   127503
ru_majflt=        0
ru_nswap=         0
ru_inblock=       0
ru_oublock=   11851
ru_msgsnd=    13000
ru_msgrcv=    10000
ru_nsignals=      0
ru_nvcsw=    617469
ru_nivcsw=    18793

sm8.12.11

ru_utime=        15.0236311
ru_stime=        62.0117941
ru_maxrss=     1520
ru_ixrss=   4573224
ru_idrss=   3676784
ru_isrss=   1283712
ru_minflt=   174619
ru_majflt=        0
ru_nswap=         0
ru_inblock=       0
ru_oublock=    4001
ru_msgsnd=    12000
ru_msgrcv=    10000
ru_nsignals=   1000
ru_nvcsw=    128074
ru_nivcsw=    14771

This looks like a problem in queue-only mode: there is way too much data written, almost 3 times the amount of background delivery mode. Why does sm8 send 1000 more messages in queue-only mode?
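The counters above come straight from getrusage(2); a minimal sketch of how such counters can be sampled around a test run (work() is just a placeholder for the code being measured):

    /* Sketch: sample getrusage(2) counters around a test run; work() is a
     * placeholder for the code being measured. */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <stdio.h>

    static void
    work(void)
    {
        /* placeholder: run the test here */
    }

    int
    main(void)
    {
        struct rusage before, after;

        getrusage(RUSAGE_SELF, &before);
        work();
        getrusage(RUSAGE_SELF, &after);
        printf("oublock=%ld msgsnd=%ld nvcsw=%ld\n",
            after.ru_oublock - before.ru_oublock,
            after.ru_msgsnd - before.ru_msgsnd,
            after.ru_nvcsw - before.ru_nvcsw);
        return 0;
    }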

2004-03-05 source, relay, sink: wiz (FreeBSD 4.8)

options: -t 100 -s 1000

source: 34s, sink: 32s

turn off smtpc: source: 31s, 34s

2004-03-26 source: v-6.perf-lab running smtpc -t 100 -s 5000; relay: v-bsd.perf-lab; sink: v-sun.perf-lab

sink runs smtps2 -R n with varying values for n

n source time requests served
0 108 5000
8000000 115 5060
58000000 140 5450
88000000 151 5620

put defedb on a RAM disk:

n source time requests served
0 108 5000
8000000    
58000000 111 5453
88000000 114 5693

Obviously the additional disk I/O traffic created by having to use DEFEDB is slowing down the system.


FreeBSD 4.9, Softupdates, and fsync()

2004-06-23 Upgraded v-bsd.perf-lab to FreeBSD 4.9 (2 processors), using softupdates.

source on v-sun, sink on s-6:

time ./smtpc2 -O 10 -t 100 -s 1000 -r v-bsd.perf-lab:1234

43s

turn off fsync(): (smtps -U, must be compiled with -DTESTING)

32s


Disk I/O On FreeBSD

A modified iostat(8) program is used to show the number of bytes written and read, and the number of read, write, and other disk I/O operations.

The following tests were performed: sink (smtps3) on v-bsd.perf-lab, source (smtpc) on s-6.perf-lab sending 1000 mails. All numbers for write operations are rounded; numbers in parentheses denote the value of ru_oublock (getrusage(2)) for smtps/qmgr or sm8. If two times are given (separated by /), the second one is the elapsed time reported by the sink.

program softupdates? writes reads time
smtps3 -C yes 2200 - 14
smtps3 -C no 2900 - 30
smX.0.0.12, no sched (see 1) yes 5200 - 34
smX.0.0.12, no sched yes   -  
smX.0.0.12, no sched no   -  
smX.0.0.12 (see 2) yes 3500 (2000/1300) 4 33
  yes 3370 (2020/1270) 4 30/29
-O i=1000000 yes 2660 (1850/660) 0 25/24
smX.0.0.12 no 6300 (3000/3200) 0 52
smX.0.0.12 (see 4) yes 3500 (2200/1200) 4 25
sm8.12.11 -odq SS=m yes 1800 - 41
sm8.12.11 -odq SS=m no 12200 - 72
sm8.12.11 SS=m (see 3) yes 236 (164) 0 61
  yes 370 (218) 0 60
sm8.12.11 no 8100 (4100) 1 63
sm8.12.11 SS=t yes 7400 0 70
postfix 2.0.18 yes 2900 16 21/26

Notes:

  1. Question: why does smX.0.0.12 use so many write operations? 5200 is way too much. Answer: qmgr committed IBDB more than 1000 times; increasing the maximum time to acknowledge an SMTPS transaction from 100µs to 10000µs reduces the number of commits to 165. (A sketch of this kind of group commit follows these notes.)
  2. Question: why does qmgr still (after increasing the time between commits) perform so many write operations?
  3. Question: why does sm8 use so few writes? Can softupdates eliminate or cluster most writes? Why doesn't this work for smX? Solution: SuperSafe was set to m, not to true.
  4. If IBDB and CDB are on different partitions, the performance increases significantly (about 25 per cent faster).
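Note 1 describes a form of group commit: instead of one fsync() per SMTPS transaction, acknowledgments are delayed briefly so that many transactions share a single IBDB commit. A rough sketch of the idea (this is not the actual qmgr code; the names, batch size, and commit trigger are made up):

    /* Sketch of group commit: several transactions are covered by one
     * fsync() of the log and only acknowledged afterwards. */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define COMMIT_BATCH 64                 /* illustrative batch size */

    static int pending[COMMIT_BATCH];       /* transaction ids awaiting commit */
    static int npending = 0;

    static void
    ack(int ta_id)
    {
        /* placeholder: tell SMTPS that transaction ta_id is safe */
        printf("ack %d\n", ta_id);
    }

    static void
    transaction_done(int logfd, int ta_id)
    {
        char rec[64];
        int i, n;

        n = snprintf(rec, sizeof(rec), "T %d done\n", ta_id);
        write(logfd, rec, n);               /* no fsync per transaction */
        pending[npending++] = ta_id;

        /* a real implementation would also commit after a maximum delay,
         * e.g. 10000 microseconds instead of 100 */
        if (npending == COMMIT_BATCH)
        {
            fsync(logfd);                   /* one disk commit for the batch */
            for (i = 0; i < npending; i++)
                ack(pending[i]);
            npending = 0;
        }
    }

    int
    main(void)
    {
        int fd, ta;

        fd = open("ibdb-log", O_CREAT | O_WRONLY | O_APPEND, 0600);
        if (fd < 0)
            return 1;
        for (ta = 0; ta < 1000; ta++)
            transaction_done(fd, ta);
        /* (the remaining pending transactions would be committed and
         * acknowledged here) */
        fsync(fd);
        close(fd);
        return 0;
    }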

2004-03-23 source: basil.ps-lab MTA: wasabi.ps-lab (FreeBSD 4.9, machine 16 in Section 5.2.1.1) sink: v-sun.perf-lab

smtpc -t 100 -s 1000

program writes reads source time sink time
smtps3 -C 2400 - 11 -
smX.0.0.12 2600 5 15 13
sm8.12.11 6000 1 35  
postfix 2.0.18 2800 15 14 20

Note: the source time for postfix is shorter than the time for smX because smX emptied its queue during the run, while postfix still had more than 700 entries in its mail queue after the source finished sending all mails; this can also be seen from the sink time, which is noticeably larger for postfix than for sendmail X.

Using gatling:

Max random envelope rcpts:  1
Connections:                100
Max msgs/conn:              Unlimited
Messages:                   Fixed size 1 Kbytes
Desired Message Rate:       Unlimited
Total messages:             5000

Total test elapsed time: 73.571 seconds (1:13.570)
Overall message rate: 67.962 msg/sec
Peak rate: 100.000 msg/sec

gatling -m 100 -c 5000 -z 1 -Z 1

program writes source time source msgs/s sink time
smtps3 0 5 980 -
smtps3 -C 11750 53 93  
smX.0.0.12   73 67 71
smX.0.0.12 11157 (8000/2700) 70 71 69
sm8.12.11   136 36  
postfix 2.0.18   60 83 78
postfix 2.0.18 12635 58 85 75

2004-03-16 results for wiz: source: time ./smtpc -s 1000 -t 100 -r localhost:1234; sink: smtps3, file system: UFS, softupdates

parameters oublock writes source time sink time
-C -i 1920 ? 17 16
-C -p 1 1860 ? 17 17
-C -p 1 1940 2700 16 15
-C -p 1 1970 2770 16 15
-C -p 2   ? 15 ?
-C -p 2 877+966 2600 15 ?
-C -p 4 455+476+432+472 2640 15 ?

New option: -f for flat, i.e., instead of using 16 subdirectories for CDB files, a single directory is used. Even though this does not cause a noticeable difference in run time, the number of I/O operations is reduced.

parameters oublock writes source time
-C -p 2 915+920 2600 14
-C -p 2 -f 600+610 2200 14

2004-03-16 source: s-6.perf-lab, time ./smtpc -s 1000 -t 100 -r localhost:1234; sink: v-bsd.perf-lab, smtps3, file system: UFS, softupdates

parameters oublock writes source time sink time
-C -i 1430 2165 12 11
  1550 2300 14 13
-C -p 1 1500 2500 14 12
-C -p 2 1100+620 2500 13 -
  800+770 2320 13 -
-C -p 4 530+350+540+470 2600 13 -

Note: some of the write operations might be from softupdates due to the previous rm command (removing the CDB files).

2004-03-17 checks/t-readwrite on v-bsd (FreeBSD 4.9, SCSI):

parameters softupdates oublock writes time
-s -f 1000 -p 1 yes 4000 4000 22
-s -f 100 -p 10 yes 2575 2579 14
-s -f 1000 -p 1 no 4050 4050 28
-s -f 100 -p 10 no 4050 4050 27

-p specifies the number of processes to start, -f specifies the number of files to write per process. The test cases above write 1000 files with either 1 or 10 processes. As can be seen, it is significantly more efficient to use 10 processes if softupdates are turned on.
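As an illustration (shown here, not the actual checks/t-readwrite source), such a test can be sketched as follows, assuming -p forks that many writer processes and -f is the per-process file count:

    /* Sketch: p processes each create f files, write a small buffer and
     * fsync.  Illustrative only; the real checks/t-readwrite has more
     * options. */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static void
    writer(int id, int nfiles, int size)
    {
        char path[64], buf[8192];
        int i, fd;

        memset(buf, 'x', sizeof(buf));
        for (i = 0; i < nfiles; i++)
        {
            snprintf(path, sizeof(path), "tst.%d.%d", id, i);
            fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0600);
            if (fd < 0)
                exit(1);
            write(fd, buf, size);
            fsync(fd);          /* presumably what -s controls */
            close(fd);
        }
        exit(0);
    }

    int
    main(int argc, char **argv)
    {
        int p = argc > 1 ? atoi(argv[1]) : 1;       /* processes (-p) */
        int f = argc > 2 ? atoi(argv[2]) : 1000;    /* files per process (-f) */
        int i;

        for (i = 0; i < p; i++)
            if (fork() == 0)
                writer(i, f, 1024);
        for (i = 0; i < p; i++)
            wait(NULL);
        return 0;
    }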

2004-03-17 checks/t-readwrite on wiz (FreeBSD 4.8, IDE):

parameters softupdates oublock writes time
-s -f 1000 -p 1 yes 3000 3800 13
-s -f 100 -p 10 yes 2860 3600 13

In this case no difference can be seen, which is most likely a result of using an IDE drive with write-caching turned on (default).


Various SunOS 5 Results

2003-11-21 sm-9.0.0.9 running on v-sun.perf-lab

Source on bsd.dev-lab

time ./smtp-source -d -s 100 -m 5000 -c

using FS: 301.90 - 305.02s (16msgs/s)

using swap: 77.98 - 78.55s (64msgs/s)

Those tests ran only 32 SMTPS threads (the machine has 4 CPUs, hence the specified limit of 128 was divided by 4). Using 128 SMTPS threads (by forcing only one process, which was used anyway because SMTPS is run with the interactive option, which does not start background processes):

time ./smtp-source -d -s 100 -m 50000 -c

using swap: 727.73s (68msgs/s)

2004-03-09 sm-9.0.0.12 running on v-sun.perf-lab

time ./smtpc -O 20 -fa@s-6.perf-lab.sendmail.com -Rnobody@v-bsd.perf-lab.sendmail.com -t 100 -s 1000 -r v-sun.perf-lab.sendmail.com:1234

MTA options FS source time(s) sink time(s)
full MTS SWAPFS 16 14
without sched SWAPFS 10 -
smtpss SWAPFS 3 -
full MTS UFS 64, 65, 64 75, 70, 69
8.12.11 SWAPFS 16 19
8.12.11 UFS 141 138

Note: smX using UFS runs into connection limitations: QMGR believes there are 100 open connections even though the sink shows at most 18. This seems to be a communication latency issue between SMTPC and QMGR (and needs to be investigated further).

2004-03-17 checks/t-readwrite on v-sun (SunOS 5.8, SCSI):

parameters writes time
-s -f 1000 -p 1 - 39
-s -f 100 -p 10 - 37

On SunOS 5.8 the filesystem shows no difference between using 1 or 10 processes.


Various OpenBSD Results

2004-03-05 source, relay, and sink on zardoc (OpenBSD 3.2)

test with logging via smioout

zardoc$ time ./smtpc2 -O 10 -s 1000 -t 100 -r localhost:1234
   24.17s real     0.94s user     2.57s system

smtps3 stats:

elapsed                    26
Thread limits (min/max)    8/256
Waiting threads            8
Max busy threads           3
Requests served            1000

Note that there have been only 3 active threads, which means the client is not busy at all. Another test shows elapsed=23s, max busy threads=21, so the result isn't deterministic (the machine is running as a normal SMTP server etc. during the tests).

test with logging via smioerr: smtpc2: 24.53s; no difference.


Various AIX Results

2004-03-17 checks/t-readwrite on aix-3 (AIX 4.3, SCSI, jfs):

parameters writes time
-s -f 1000 -p 1 - 30
-s -f 100 -p 10 - 29

No (noticeable) difference.


Implementation of Queues and Caches


Filesystem Performance

Here are some results of a simple test program which creates and deletes a number of files and optionally renames them twice while doing so.

Notice: unless mentioned otherwise, all measurements are at most accurate to one second resolution. Repeated tests will most likely show (slightly) different results. These tests are only listed to give an idea of the magnitude of available performance.
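A minimal sketch of the kind of loop such a test program runs (the actual fsperf1.c used below has more parameters, e.g. operation count, number of files, and hashing):

    /* Sketch: create a file, optionally rename it twice, then unlink it.
     * Illustrative of the metadata operations being timed, not fsperf1.c
     * itself. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int i, fd;
        char a[64], b[64], c[64];

        for (i = 0; i < 5000; i++)
        {
            snprintf(a, sizeof(a), "f-new.%d", i);
            snprintf(b, sizeof(b), "f-tmp.%d", i);
            snprintf(c, sizeof(c), "f-done.%d", i);
            fd = open(a, O_CREAT | O_WRONLY | O_EXCL, 0600);
            if (fd < 0)
                return 1;
            close(fd);
            rename(a, b);       /* the rename option renames twice */
            rename(b, c);
            unlink(c);
        }
        return 0;
    }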


Test Systems

The involved systems are:

  1. PC, Pentium III, 500MHz, 256MB RAM, FreeBSD 3.2,
    wdc0: unit 0 (wd0): <FUJITSU MPD3064AT>
    wd0: 6187MB (12672450 sectors), 13410 cyls, 15 heads, 63 S/T, 512 B/S
    

  2. PC, AMD K6-2, 450MHz, 220MB RAM, OpenBSD 2.8,
    1. wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-351010>
      wd0: can use 32-bit, PIO mode 4, DMA mode 2, Ultra-DMA mode 4
      wd0: 16-sector PIO, LBA, 9671MB, 16383 cyl, 16 head, 63 sec, 19807200 sectors
      

    2. wd1 at pciide0 channel 0 drive 1: <Maxtor 98196H8>,
      wd1: can use 32-bit, PIO mode 4, DMA mode 2, Ultra-DMA mode 4,
      wd1: 16-sector PIO, LBA, 78167MB, 16383 cyl, 16 head, 63 sec, 160086528 sectors
      

  3. PC, Pentium III, 500MHz, 256MB RAM, FreeBSD 4.4-STABLE,
    ad0: 6187MB <FUJITSU MPC3064AT> [13410/15/63] at ata0-master UDMA33
    

  4. PC, AMD-K7, 500MHz, FreeBSD 4.4-STABLE, 332MB RAM,
    1. ahc0: <Adaptec 2940 Ultra2 SCSI adapter (OEM)>
      da0: <IBM DNES-309170W SA30> Fixed Direct Access SCSI-3 device
      da0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
      da0: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C)
      
      1. SCSI with softupdates
      2. SCSI without softupdates

    2. ad0: 8063MB <FUJITSU MPD3084AT> [16383/16/63] at ata0-master UDMA66
      
      softupdates

  5. PC, Linux 2.2.12,
    hda: IBM-DJNA-370910, 8693MB w/1966kB Cache, CHS=1108/255/63
    
    ext2 FS

  6. PC, Linux 2.4.7,
    hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=2434/255/63, UDMA(66)
    reiserfs: using 3.5.x disk format
    ReiserFS version 3.6.25
    

  7. DEC Digital Alpha, OSF/1, SCSI disk?

  8. Sun SPARC, SunOS 5.6, SCSI disk?

  9. Sun SPARC, SunOS 5.7, SCSI disk?
    1. mount options: no logging, atime
    2. mount options: logging, atime
    3. mount options: logging, noatime

  10. Sun SPARC E450, 4 CPUs,
    1. Baydel RAID

    2. SCSI disk

  11. AIX 4.3.3, using JFS (default).

  12. PC, AMD K7, 1000MHz, 512MB RAM, SuSE 7.3, kernel 2.4.10
    WD1200BB
    hdg: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=232581/16/63, UDMA(100)
    
    1. /home jfs
    2. /opt reiserfs
    3. /work ext3

  13. HP-UX 11.00

  14. PC, Pentium II, 360MHz, 512MB RAM, FreeBSD 4.6,
    ad0: 8693MB <IBM-DJNA-370910> [17662/16/63] at ata0-master UDMA33
    acd0: CDROM <CD-ROM 40X> at ata1-master PIO4
    

  15. Intel IA64, 4 CPUs, 1GB RAM
    scsi0 : ioc0: LSI53C1030, FwRev=01000000h, Ports=1, MaxQ=255, IRQ=52
      Vendor: MAXTOR    Model: ATLASU320_18_SCA  Rev: B120
      Type:   Direct-Access                      ANSI SCSI revision: 03
    Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
    SCSI device sda: 35916548 512-byte hdwr sectors (18389 MB)
    reiserfs: found format "3.6" with standard journal
    reiserfs: using ordered data mode
    Using r5 hash to sort names
    

  16. Intel Pentium III, 650 MHz, 256MB RAM
    da0 at ahc0 bus 0 target 0 lun 0
    da0: <SEAGATE ST39175LW 0001> Fixed Direct Access SCSI-2 device 
    da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
    da0: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)
    

  17. PC, Pentium III, 450MHz, 256MB RAM, FreeBSD 4.8, softupdates
    ad0: 6187MB <FUJITSU MPD3064AT> [13410/15/63] at ata0-master UDMA33
    

  18. PC, FreeBSD 4.10, softupdates
    da3 at ahc0 bus 0 target 4 lun 0
    da3: <IBM DNES-309170Y SA30> Fixed Direct Access SCSI-3 device 
    da3: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
    da3: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C)
    

  19. PC, VIA C3, 667MHz, 256MB RAM, OpenBSD 3.2,
    1. wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-371350>
      wd0: 16-sector PIO, LBA, 12949MB, 16383 cyl, 16 head, 63 sec, 26520480 sectors
      

    2. wd1 at pciide0 channel 0 drive 1: <WDC WD1200BB-53CAA0>
      wd1: 16-sector PIO, LBA, 114473MB, 16383 cyl, 16 head, 63 sec, 234441648 sectors
      

    3. wd2 at pciide1 channel 0 drive 0: <Maxtor 6Y160P0>
      wd2: 16-sector PIO, LBA48, 156334MB, 16383 cyl, 16 head, 63 sec, 320173056 sectors
      wd2(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 6
      


Meta Data Operations

In this section, some simple test programs are used that create some files, perform (sequential) read/write operations on them and remove them afterwards.

Entries in the following table are elapsed time in seconds (except for the first column which obviously refers to the machine description above). The program that has been used to produce these results is fsperf1.c.

machine 5000 100 -c 5000 100 -c -r 5000 100
1 50 49 48
1 42 48 51
2a 3 7 10
    about 2200 tps about 1500 tps
2b   11 21
3 10 34 34
  about 500 tps
4(a)i   126 125
4(a)ii   208 454
4b   43 48
7 7 13 16
5 9 8 9
8 133 201 603
9a   52 665
10a 9 9 12
11 89 139 233

Comments:

(2004-07-14) With and without fsync(2) (-S)

common parameters machine -c -c -r -S -c -S -c -r
(5000 100) 17 42 42 2 3
  10b 165 496 165 495
  18 83 83 5 8
  19a 8 7 1 3
  19b 8 9 1 3
  19c 7 9 1 2
(-s 32 5000 100) 17 109 109 8 9
  10b 250 537 207 498
  18 114 113 14 16
  19b 87 81 3 5
  19c 26 26 4 5

Comments:

Next version: allow for hashing (00 - 99, up to two levels). Use enough files to defeat the (2MB) cache of IDE disks.

machine -h 1 -c 1000 1000 -h 1 -c -r 1000 1000
1 18 18
2a 24 24
2b 7 9
3 14 14
4(a)i 23 23
4(a)ii 33 77
4b 25 49
5 3 2
7 3 4
8 58 163
9a 51 139
11 28 48

Comments:


Meta Data Operations: Existing Files

Next version fsperf1.c: allow for hashing (00 - 99, up to two levels). Use enough files to defeat the (2MB) cache of IDE disks. The parameters for the following table are 1000 operations and 1000 files, hence each file is used once. Additional parameters are listed in the heading. c: create, h 1: one level hashing, r: rename file twice, p: populate directories before test, then just reuse the files.
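Before the table, a minimal sketch of how such a hashed path can be built; the exact directory naming scheme is an assumption, not taken from fsperf1.c:

    /* Sketch: map a file number to a one- or two-level hashed path with
     * subdirectories 00-99.  The naming is illustrative. */
    #include <stdio.h>

    static void
    hashed_path(char *buf, size_t len, int levels, int filenum)
    {
        if (levels == 2)
            snprintf(buf, len, "%02d/%02d/f%d",
                filenum % 100, (filenum / 100) % 100, filenum);
        else if (levels == 1)
            snprintf(buf, len, "%02d/f%d", filenum % 100, filenum);
        else
            snprintf(buf, len, "f%d", filenum);
    }

    int
    main(void)
    {
        char path[64];

        hashed_path(path, sizeof(path), 1, 4711);
        printf("%s\n", path);       /* prints "11/f4711" */
        return 0;
    }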

machine -h 1 -c -h 1 -c -r -p -h 1 -c -p -h 1 -c -r
1 32 31 18 17
2a 18 18 9 10
2b 10 10 8 10
5 2 1 2 1
6 2 2 4 4
7 2 4 2 3
8 58 165 78 178
9a 27 127 33 131
9c 13 51 37 55
11 28 48 28 48

Comments:


Writing a Logfile

Another test program (fsseq1.c) writes lines to a file and uses fsync(2) after a specified number of lines (-C parameter).

20000 entries (10000 entries each for received/delivered, total 490000 bytes).

machine - -C 100 -C 50 -C 10 -C 5 -C 2 -f
1 1 4 6 17 32 78 150
2a 0 2 2 5 5 9 18
2b 1 0 1 3 4 10 20
3 1 2 3 9 16 37 68
5 1 1 2 6 12 27 56
7 0 4 8 39 79 198 410
8 1 7 13 60 120 299 598
9a 1 8 13 15 62 90 140
11 0 6 12 53 106 262 518

This clearly demonstrates the need for group commits. However, the program requires a lot of CPU time since each line is generated by snprintf(); hence the full I/O speed may not be reached. To confirm this, another program (fsseq2.c) is used that just writes a buffer with fixed content to a file.
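A rough sketch of such a measurement loop: write a fixed buffer repeatedly, fsync(2) after every C records, and compute the rate from the elapsed time (buffer size and record count below are just example values):

    /* Sketch: append fixed-size records to a file, fsync every C records,
     * report the achieved rate.  Sizes and counts are illustrative. */
    #include <sys/time.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        enum { RECSIZE = 1024, NREC = 2000, C = 10 };
        char buf[RECSIZE];
        struct timeval start, end;
        double secs;
        int i, fd;

        memset(buf, 'x', sizeof(buf));
        fd = open("seqlog", O_CREAT | O_WRONLY | O_TRUNC, 0600);
        if (fd < 0)
            return 1;
        gettimeofday(&start, NULL);
        for (i = 1; i <= NREC; i++)
        {
            write(fd, buf, sizeof(buf));
            if (i % C == 0)
                fsync(fd);      /* group commit: one fsync per C records */
        }
        fsync(fd);
        gettimeofday(&end, NULL);
        close(fd);
        secs = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
        printf("%d records, %.2f s, %.0f KB/s\n",
            NREC, secs, NREC * (RECSIZE / 1024.0) / secs);
        return 0;
    }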

The following table lists the results for group commits (C) together with various buffer sizes (256, 1024, 4096, 8192, and 16384). As usual the entries are execution time in seconds. The program writes 2000 records in total, e.g., for size 16384 that is 31MB data.

machine C 256 1024 4096 8192 16384  
5 1 4 5 10 20 34  
  2 2 4 6 12 22  
  5 1 2 5 7 15  
  10 1 1 3 6 12  
  50 1 0 3 5 10  
  100 0 1 3 5 10  
7 1 1 5 20 40 44  
  2 1 5 11 23 29  
  5 1 5 9 12 13  
  10 1 2 3 6 7  
  50 0 1 1 2 3  
  100 0 1 1 1 3  
8 1 3 10 45 95 109  
  2 2 11 23 52 59  
  5 3 11 19 24 32  
  10 2 5 6 15 21  
  50 1 2 3 8 13  
  100 0 1 3 6 13  
9a 1 3 12 34 35 58  
  2 3 12 18 53 53  
  5 3 6 21 23 24  
  10 3 5 6 13 14  
  50 1 2 2 5 7  
  100 1 1 2 3 6  
11 1 21 35 77 83 92  
  2 13 26 38 45 50  
  5 8 13 17 20 24  
  10 5 6 10 11 15  
  50 1 2 2 4 7  
  100 1 1 2 3 6  

Comments:

Yet another program (fsseq3.c) uses write() instead of fwrite(). This time the tests write 40000KB each, which makes it simpler to determine the throughput.

Note: as usual, these times are not very accurate (1s resolution), and hence the rate is inaccurate too. Machines:

1
C s records time KB/s
1 512 80000 1365 29
1 1024 40000 734 54
1 2048 20000 451 88
1 4096 10000 352 113
1 8192 5000 250 160
2 512 80000 736 54
2 1024 40000 453 88
2 2048 20000 354 112
2 4096 10000 382 104
2 8192 5000 225 177
5 512 80000 638 62
5 1024 40000 585 68
5 2048 20000 312 128
5 4096 10000 187 213
5 8192 5000 101 396
10 512 80000 561 71
10 1024 40000 296 135
10 2048 20000 161 248
10 4096 10000 88 454
10 8192 5000 60 666
50 512 80000 128 312
50 1024 40000 70 571
50 2048 20000 41 975
50 4096 10000 34 1176
50 8192 5000 29 1379
100 512 80000 73 547
100 1024 40000 43 930
100 2048 20000 33 1212
100 4096 10000 28 1428
100 8192 5000 27 1481
2b
C s records time KB/s
1 512 80000 165 242
1 1024 40000 90 444
1 2048 20000 54 740
1 4096 10000 28 1428
1 8192 5000 16 2500
2 512 80000 94 425
2 1024 40000 52 769
2 2048 20000 30 1333
2 4096 10000 17 2352
2 8192 5000 11 3636
5 512 80000 54 740
5 1024 40000 33 1212
5 2048 20000 19 2105
5 4096 10000 11 3636
5 8192 5000 8 5000
10 512 80000 31 1290
10 1024 40000 18 2222
10 2048 20000 11 3636
10 4096 10000 8 5000
10 8192 5000 6 6666
50 512 80000 11 3636
50 1024 40000 8 5000
50 2048 20000 6 6666
50 4096 10000 5 8000
50 8192 5000 4 10000
100 512 80000 10 4000
100 1024 40000 8 5000
100 2048 20000 5 8000
100 4096 10000 4 10000
100 8192 5000 5 8000

5
C s records time KB/s
1 512 80000 13440 2
1 1024 40000 6790 5
1 2048 20000 3451 11
1 4096 10000 1779 22
1 8192 5000 1007 39
2 512 80000 6790 5
2 1024 40000 3439 11
2 2048 20000 1763 22
2 4096 10000 909 44
2 8192 5000 471 84
5 512 80000 2763 14
5 1024 40000 1414 28
5 2048 20000 739 54
5 4096 10000 383 104
5 8192 5000 208 192
10 512 80000 1414 28
10 1024 40000 731 54
10 2048 20000 384 104
10 4096 10000 208 192
10 8192 5000 120 333
50 512 80000 312 128
50 1024 40000 174 229
50 2048 20000 101 396
50 4096 10000 64 625
50 8192 5000 46 869
100 512 80000 171 233
100 1024 40000 100 400
100 2048 20000 64 625
100 4096 10000 46 869
100 8192 5000 37 1081
6
C s records time KB/s
1 512 80000 130 307
1 1024 40000 93 430
1 2048 20000 78 512
1 4096 10000 23 1739
1 8192 5000 12 3333
2 512 80000 62 645
2 1024 40000 46 869
2 2048 20000 24 1666
2 4096 10000 13 3076
2 8192 5000 15 2666
5 512 80000 66 606
5 1024 40000 31 1290
5 2048 20000 18 2222
5 4096 10000 15 2666
5 8192 5000 10 4000
10 512 80000 28 1428
10 1024 40000 19 2105
10 2048 20000 13 3076
10 4096 10000 10 4000
10 8192 5000 10 4000
50 512 80000 14 2857
50 1024 40000 10 4000
50 2048 20000 10 4000
50 4096 10000 9 4444
50 8192 5000 7 5714
100 512 80000 11 3636
100 1024 40000 10 4000
100 2048 20000 8 5000
100 4096 10000 8 5000
100 8192 5000 8 5000

7
C s records time KB/s
1 512 80000 3347 11
1 1024 40000 1689 23
1 2048 20000 845 47
1 4096 10000 418 95
1 8192 5000 192 208
2 512 80000 1243 32
2 1024 40000 796 50
2 2048 20000 431 92
2 4096 10000 222 180
2 8192 5000 122 327
5 512 80000 655 61
5 1024 40000 268 149
5 2048 20000 161 248
5 4096 10000 108 370
5 8192 5000 58 689
10 512 80000 355 112
10 1024 40000 185 216
10 2048 20000 85 470
10 4096 10000 42 952
10 8192 5000 38 1052
50 512 80000 88 454
50 1024 40000 49 816
50 2048 20000 31 1290
50 4096 10000 18 2222
50 8192 5000 10 4000
100 512 80000 45 888
100 1024 40000 33 1212
100 2048 20000 19 2105
100 4096 10000 14 2857
100 8192 5000 14 2857
8
C s records time KB/s
1 512 80000 6302 6
1 1024 40000 3220 12
1 2048 20000 1695 23
1 4096 10000 949 42
1 8192 5000 552 72
2 512 80000 3183 12
2 1024 40000 1708 23
2 2048 20000 950 42
2 4096 10000 484 82
2 8192 5000 299 133
5 512 80000 1402 28
5 1024 40000 805 49
5 2048 20000 440 90
5 4096 10000 252 158
5 8192 5000 137 291
10 512 80000 783 51
10 1024 40000 395 101
10 2048 20000 211 189
10 4096 10000 122 327
10 8192 5000 87 459
50 512 80000 181 220
50 1024 40000 107 373
50 2048 20000 68 588
50 4096 10000 49 816
50 8192 5000 42 952
100 512 80000 111 360
100 1024 40000 70 571
100 2048 20000 50 800
100 4096 10000 40 1000
100 8192 5000 36 1111

9a
C s records time KB/s
1 512 80000 2638 15
1 1024 40000 1419 28
1 2048 20000 753 53
1 4096 10000 442 90
1 8192 5000 221 180
2 512 80000 1379 29
2 1024 40000 774 51
2 2048 20000 409 97
2 4096 10000 220 181
2 8192 5000 124 322
5 512 80000 644 62
5 1024 40000 382 104
5 2048 20000 198 202
5 4096 10000 105 380
5 8192 5000 58 689
10 512 80000 355 112
10 1024 40000 196 204
10 2048 20000 104 384
10 4096 10000 59 677
10 8192 5000 32 1250
50 512 80000 90 444
50 1024 40000 51 784
50 2048 20000 28 1428
50 4096 10000 19 2105
50 8192 5000 15 2666
100 512 80000 54 740
100 1024 40000 28 1428
100 2048 20000 20 2000
100 4096 10000 15 2666
100 8192 5000 14 2857
9b
C s records time KB/s
1 512 80000 2642 15
1 1024 40000 1312 30
1 2048 20000 723 55
1 4096 10000 376 106
1 8192 5000 185 216
2 512 80000 1363 29
2 1024 40000 699 57
2 2048 20000 359 111
2 4096 10000 185 216
2 8192 5000 104 384
5 512 80000 563 71
5 1024 40000 302 132
5 2048 20000 162 246
5 4096 10000 88 454
5 8192 5000 46 869
10 512 80000 299 133
10 1024 40000 161 248
10 2048 20000 87 459
10 4096 10000 46 869
10 8192 5000 24 1666
50 512 80000 81 493
50 1024 40000 44 909
50 2048 20000 35 1142
50 4096 10000 19 2105
50 8192 5000 13 3076
100 512 80000 51 784
100 1024 40000 35 1142
100 2048 20000 26 1538
100 4096 10000 15 2666
100 8192 5000 13 3076

9c
C s records time KB/s
1 512 80000 2576 15
1 1024 40000 1326 30
1 2048 20000 707 56
1 4096 10000 377 106
1 8192 5000 192 208
2 512 80000 1324 30
2 1024 40000 685 58
2 2048 20000 349 114
2 4096 10000 187 213
2 8192 5000 107 373
5 512 80000 578 69
5 1024 40000 313 127
5 2048 20000 163 245
5 4096 10000 89 449
5 8192 5000 46 869
10 512 80000 306 130
10 1024 40000 162 246
10 2048 20000 86 465
10 4096 10000 46 869
10 8192 5000 25 1600
50 512 80000 82 487
50 1024 40000 44 909
50 2048 20000 33 1212
50 4096 10000 19 2105
50 8192 5000 13 3076
100 512 80000 52 769
100 1024 40000 36 1111
100 2048 20000 25 1600
100 4096 10000 16 2500
100 8192 5000 13 3076
12a
C s records time KB/s
1 512 80000 65 615
1 1024 40000 61 655
1 2048 20000 59 677
1 4096 10000 5 8000
1 8192 5000 4 10000
2 512 80000 13 3076
2 1024 40000 8 5000
2 2048 20000 4 10000
2 4096 10000 4 10000
2 8192 5000 3 13333
5 512 80000 44 909
5 1024 40000 21 1904
5 2048 20000 13 3076
5 4096 10000 3 13333
5 8192 5000 3 13333
10 512 80000 12 3333
10 1024 40000 3 13333
10 2048 20000 3 13333
10 4096 10000 3 13333
10 8192 5000 5 8000
50 512 80000 11 3636
50 1024 40000 3 13333
50 2048 20000 5 8000
50 4096 10000 5 8000
50 8192 5000 4 10000
100 512 80000 5 8000
100 1024 40000 5 8000
100 2048 20000 5 8000
100 4096 10000 4 10000
100 8192 5000 3 13333

12b
C s records time KB/s
1 512 80000 124 322
1 1024 40000 87 459
1 2048 20000 72 555
1 4096 10000 20 2000
1 8192 5000 10 4000
2 512 80000 47 851
2 1024 40000 32 1250
2 2048 20000 16 2500
2 4096 10000 8 5000
2 8192 5000 5 8000
5 512 80000 56 714
5 1024 40000 27 1481
5 2048 20000 20 2000
5 4096 10000 5 8000
5 8192 5000 5 8000
10 512 80000 23 1739
10 1024 40000 17 2352
10 2048 20000 6 6666
10 4096 10000 3 13333
10 8192 5000 6 6666
50 512 80000 7 5714
50 1024 40000 4 10000
50 2048 20000 6 6666
50 4096 10000 6 6666
50 8192 5000 4 10000
100 512 80000 7 5714
100 1024 40000 6 6666
100 2048 20000 5 8000
100 4096 10000 4 10000
100 8192 5000 3 13333
12c
C s records time KB/s
1 512 80000 205 195
1 1024 40000 144 277
1 2048 20000 122 327
1 4096 10000 14 2857
1 8192 5000 7 5714
2 512 80000 34 1176
2 1024 40000 22 1818
2 2048 20000 13 3076
2 4096 10000 7 5714
2 8192 5000 5 8000
5 512 80000 96 416
5 1024 40000 48 833
5 2048 20000 20 2000
5 4096 10000 4 10000
5 8192 5000 4 10000
10 512 80000 36 1111
10 1024 40000 7 5714
10 2048 20000 5 8000
10 4096 10000 4 10000
10 8192 5000 3 13333
50 512 80000 12 3333
50 1024 40000 4 10000
50 2048 20000 4 10000
50 4096 10000 3 13333
50 8192 5000 3 13333
100 512 80000 7 5714
100 1024 40000 6 6666
100 2048 20000 3 13333
100 4096 10000 3 13333
100 8192 5000 3 13333


Raw Throughput

Very simple measurement of transfer rate:

time dd ibs=8192 if=/dev/zero obs=8192 count=5120 of=incq

machine s MB/s
1 11.6 3.6
2a 4.8 8.4
2b 1.9 20.9
5 10.83 3.9
6 0.65 61
7 1.0 40.0
8 14.8 2.8
9 6.3 6.6
11 6.98 6.0
12a 0.247 161
12b 0.401 99
12c 0.357 112

Comments:

dd ibs=8192 if=/dev/zero obs=8192 count=124000 of=incq

machine s MB/s
12a 24.762 39
12b 22.608 42

The data in this table is more plausible (the previous test wrote only about 40MB, which presumably fits largely in the buffer cache), even though 40MB/s is still very fast.


Writing a Logfile; 2nd Version

For comparison with the Berkeley DB performance data, more tests have been run with fsseq4 with different parameters. The number of records is 100000 unless otherwise noted; t/s is transactions (records written) per second. Notice: fsseq3 writes twice as many records as fsseq4 (one add and one delete entry each), and it calls fsync() twice as often (after the add and after the delete entry).

1
C s time KB/s t/s
100000 20 1 1953 100000
10000 20 2 976 50000
1000 20 7 279 14285
100 20 20 97 5000
100000 100 3 3255 33333
10000 100 4 2441 25000
1000 100 8 1220 12500
100 100 57 171 1754
100000 512 15 3333 6666
10000 512 16 3125 6250
1000 512 17 2941 5882
100 512 67 746 1492
100000 1024 29 3448 3448
10000 1024 30 3333 3333
1000 1024 33 3030 3030
100 1024 77 1298 1298
100000 2048 60 3333 1666
10000 2048 60 3333 1666
1000 2048 64 3125 1562
100 2048 101 1980 990
2b
C s time KB/s t/s
100000 20 1 1953 100000
10000 20 1 1953 100000
1000 20 2 976 50000
100 20 2 976 50000
100000 100 2 4882 50000
10000 100 1 9765 100000
1000 100 2 4882 50000
100 100 7 1395 14285
100000 512 3 16666 33333
10000 512 3 16666 33333
1000 512 4 12500 25000
100 512 6 8333 16666
100000 1024 6 16666 16666
10000 1024 5 20000 20000
1000 1024 6 16666 16666
100 1024 8 12500 12500
100000 2048 12 16666 8333
10000 2048 12 16666 8333
1000 2048 15 13333 6666
100 2048 15 13333 6666

5
C s time KB/s t/s
100000 20 1 1953 100000
10000 20 1 1953 100000
1000 20 2 976 50000
100 20 9 217 11111
100000 100 3 3255 33333
10000 100 4 2441 25000
1000 100 5 1953 20000
100 100 15 651 6666
100000 512 16 3125 6250
10000 512 18 2777 5555
1000 512 22 2272 4545
100 512 75 666 1333
100000 1024 34 2941 2941
10000 1024 35 2857 2857
1000 1024 46 2173 2173
100 1024 139 719 719
100000 2048 67 2985 1492
10000 2048 79 2531 1265
1000 2048 95 2105 1052
100 2048 246 813 406
7
C s time KB/s t/s
100000 20 1 1953 100000
10000 20 1 1953 100000
1000 20 4 488 25000
100 20 31 63 3225
100000 100 2 4882 50000
10000 100 2 4882 50000
1000 100 6 1627 16666
100 100 33 295 3030
100000 512 8 6250 12500
10000 512 11 4545 9090
1000 512 15 3333 6666
100 512 50 1000 2000
100000 1024 11 9090 9090
10000 1024 10 10000 10000
1000 1024 14 7142 7142
100 1024 42 2380 2380
100000 2048 25 8000 4000
10000 2048 26 7692 3846
1000 2048 21 9523 4761
100 2048 42 4761 2380

8
C s time KB/s t/s
100000 20 3 651 33333
10000 20 3 651 33333
1000 20 3 651 33333
100 20 5 390 20000
100000 100 3 3255 33333
10000 100 4 2441 25000
1000 100 4 2441 25000
100 100 9 1085 11111
100000 512 5 10000 20000
10000 512 5 10000 20000
1000 512 7 7142 14285
100 512 20 2500 5000
100000 1024 8 12500 12500
10000 1024 8 12500 12500
1000 1024 9 11111 11111
100 1024 26 3846 3846
100000 2048 15 13333 6666
10000 2048 16 12500 6250
1000 2048 21 9523 4761
100 2048 36 5555 2777
11
C s time KB/s t/s
100000 20 1 1953 100000
10000 20 1 1953 100000
1000 20 4 488 25000
100 20 29 67 3448
100000 100 1 9765 100000
10000 100 2 4882 50000
1000 100 5 1953 20000
100 100 36 271 2777
100000 512 4 12500 25000
10000 512 5 10000 20000
1000 512 9 5555 11111
100 512 44 1136 2272
100000 1024 8 12500 12500
10000 1024 9 11111 11111
1000 1024 13 7692 7692
100 1024 54 1851 1851
100000 2048 15 13333 6666
10000 2048 17 11764 5882
1000 2048 22 9090 4545
100 2048 67 2985 1492

10a
C s time KB/s t/s
100000 20 2 976 50000
10000 20 1 1953 100000
1000 20 2 976 50000
100 20 3 651 33333
100000 100 2 4882 50000
10000 100 2 4882 50000
1000 100 2 4882 50000
100 100 6 1627 16666
100000 512 3 16666 33333
10000 512 3 16666 33333
1000 512 4 12500 25000
100 512 21 2380 4761
100000 1024 3 33333 33333
10000 1024 4 25000 25000
1000 1024 7 14285 14285
100 1024 41 2439 2439
100000 2048 4 50000 25000
10000 2048 5 40000 20000
1000 2048 12 16666 8333
100 2048 80 2500 1250
10b
C s time KB/s t/s
100000 20 1 1953 100000
10000 20 1 1953 100000
1000 20 4 488 25000
100 20 23 84 4347
100000 100 2 4882 50000
10000 100 2 4882 50000
1000 100 5 1953 20000
100 100 32 305 3125
100000 512 5 10000 20000
10000 512 5 10000 20000
1000 512 9 5555 11111
100 512 42 1190 2380
100000 1024 10 10000 10000
10000 1024 11 9090 9090
1000 1024 14 7142 7142
100 1024 59 1694 1694
100000 2048 21 9523 4761
10000 2048 21 9523 4761
1000 2048 25 8000 4000
100 2048 78 2564 1282

Comments:


Harddisk Performance

Some performance data gathered from the WWW.

SR Office DriveMark 2002 in IO/Sec taken from [Ra01]:

Manufacturer Model I/O operations/second
Seagate Cheetah X15-36LP (36.7 GB Ultra160/m SCSI) 485
Maxtor Atlas 10k III (73 GB Ultra160/m SCSI) 455
Fujitsu MAM3367 (36 GB Ultra160/m SCSI) 446
IBM Ultrastar 36Z15 (36.7 GB Ultra160/m SCSI) 402
Western Digital Caviar WD1000BB-SE (100 GB ATA-100) 397
Seagate Cheetah 36ES (36 GB Ultra160/m SCSI) 373
Fujitsu MAN3735 (73 GB Ultra160/m SCSI) 369
Seagate Cheetah 73LP (73.4 GB Ultra160/m SCSI) 364
Western Digital Caviar WD1200BB (120 GB ATA-100) 337
Seagate Cheetah 36XL (36.7 GB Ultra 160/m SCSI) 328
IBM Deskstar 60GXP (60.0 GB ATA-100) 303
Maxtor DiamondMax Plus D740X (80 GB ATA-133) 301
Seagate Barracuda ATA IV (80 GB ATA-100) 296
Quantum Fireball Plus AS (60.0 GB ATA-100) 295
Quantum Atlas V (36.7 GB Ultra160/m SCSI) 269
Seagate Barracuda 180 (180 GB Ultra160/m SCSI) 249
Maxtor DiamondMax 536DX (100 GB ATA-100) 248
Seagate Barracuda 36ES (36 GB Ultra160/m SCSI) 222
Seagate U6 (80 GB ATA-100) 210
Samsung SpinPoint P20 (40.0 GB ATA-100) 192

ZD Business Disk WinMark 99 in MB/Sec

Manufacturer Model MB/second
Seagate Cheetah X15-36LP (36.7 GB Ultra160/m SCSI) 13.1
Maxtor Atlas 10k III (73 GB Ultra160/m SCSI) 12.0
IBM Ultrastar 36Z15 (36.7 GB Ultra160/m SCSI) 11.3
Fujitsu MAM3367 (36 GB Ultra160/m SCSI) 11.1
Seagate Cheetah 36ES (36 GB Ultra160/m SCSI) 10.5
Seagate Cheetah 73LP (73.4 GB Ultra160/m SCSI) 10.2
Seagate Cheetah 36XL (36.7 GB Ultra 160/m SCSI) 9.9
Western Digital Caviar WD1000BB-SE (100 GB ATA-100) 9.8
Fujitsu MAN3735 (73 GB Ultra160/m SCSI) 9.1
Western Digital Caviar WD1200BB (120 GB ATA-100) 8.9
IBM Deskstar 60GXP (60.0 GB ATA-100) 8.8
Seagate Barracuda ATA IV (80 GB ATA-100) 8.5
Maxtor DiamondMax Plus D740X (80 GB ATA-133) 8.0
Quantum Atlas V (36.7 GB Ultra160/m SCSI) 7.9
Quantum Fireball Plus AS (60.0 GB ATA-100) 7.7
Seagate Barracuda 36ES (36 GB Ultra160/m SCSI) 7.4
Seagate Barracuda 180 (180 GB Ultra160/m SCSI) 7.1
Maxtor DiamondMax 536DX (100 GB ATA-100) 6.9
Samsung SpinPoint P20 (40.0 GB ATA-100) 6.5
Seagate U6 (80 GB ATA-100) 6.3

The file and web server benchmarks (also available at [Ra01]) are not useful since they include 80 and 100 per cent read accesses, which is not really typical of MTA servers.


Performance of Berkeley DB

Some preliminary, very simple performance tests with Berkeley DB 4.0.14 have been made. Two benchmark programs have been used: bench_001 and bench_002, which use Btree and Queue as access methods. They are based on examples_c/bench_001.c that comes with Berkeley DB. Notice: the access method Queue requires fixed-size records and access is via record numbers (simply increasing). This method may be used for the backup of the incoming EDB. Notice: the tests have not (yet) been run multiple times, at least not systematically. Testing showed that the runtimes may vary noticeably. However, the data can be used to show some trends.

Possible parameters are:

-n N number of records to write
-T N use transactions, synchronize after N transactions
-l N length of data part
-C N do a checkpoint every N actions and possibly remove logfile

Unless otherwise noted, the following tests have been performed on system 1, see Section 5.2.1. Number of records is 100000 unless otherwise noted, t/s is transactions (records written) per second.
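A rough sketch of what such a benchmark loop looks like with the Berkeley DB C API (Btree case; error handling and the -C checkpointing option are omitted, and the exact DB->open() signature changed between 4.0 and later 4.x releases, so this follows the later style):

    /* Sketch of a transactional Berkeley DB write loop: put records and
     * commit after every T puts.  Illustrative values for -n, -T, -l;
     * the environment directory "db-bench" must already exist. */
    #include <stdio.h>
    #include <string.h>
    #include <db.h>

    int
    main(void)
    {
        DB_ENV *dbenv;
        DB *dbp;
        DB_TXN *txn;
        DBT key, data;
        char kbuf[32], dbuf[20];
        int i, n = 100000, t = 1000;    /* -n and -T */

        db_env_create(&dbenv, 0);
        dbenv->open(dbenv, "db-bench",
            DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_INIT_TXN,
            0);
        db_create(&dbp, dbenv, 0);
        dbp->open(dbp, NULL, "bench.db", NULL, DB_BTREE,
            DB_CREATE | DB_AUTO_COMMIT, 0644);

        memset(dbuf, 'x', sizeof(dbuf));
        dbenv->txn_begin(dbenv, NULL, &txn, 0);
        for (i = 0; i < n; i++)
        {
            memset(&key, 0, sizeof(key));
            memset(&data, 0, sizeof(data));
            key.data = kbuf;
            key.size = snprintf(kbuf, sizeof(kbuf), "key%d", i);
            data.data = dbuf;
            data.size = sizeof(dbuf);   /* -l: length of the data part */
            dbp->put(dbp, txn, &key, &data, 0);
            if ((i + 1) % t == 0)       /* synchronize after T records */
            {
                txn->commit(txn, 0);
                dbenv->txn_begin(dbenv, NULL, &txn, 0);
            }
        }
        txn->commit(txn, 0);
        dbp->close(dbp, 0);
        dbenv->close(dbenv, 0);
        return 0;
    }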

Vary synchronization (-T):

Prg -T -l real user sys KB/s t/s
1 100000 20 14.73 5.99 1.00 132 6788
1 10000 20 14.64 5.85 1.29 133 6830
1 1000 20 18.14 6.02 1.10 107 5512
1 100 20 70.57 6.03 1.76 27 1417
2 100000 20 11.58 2.91 0.74 168 8635
2 10000 20 10.14 2.86 0.85 192 9861
2 1000 20 11.20 2.85 0.95 174 8928
2 100 20 68.71 2.73 1.61 28 1455

Vary data length, first program only:

Prg -T -l real user sys KB/s t/s
1 100000 20 14.39 5.93 1.16 135 6949
1 10000 20 16.77 5.91 1.16 116 5963
1 1000 20 16.58 5.91 1.13 117 6031
1 100 20 68.10 5.95 1.85 28 1468
1 100000 100 23.30 5.57 1.90 419 4291
1 10000 100 30.56 5.56 1.90 319 3272
1 1000 100 33.39 5.51 1.99 292 2994
1 100 100 82.58 5.47 2.62 118 1210
1 100000 512 96.03 7.69 4.78 520 1041
1 10000 512 94.12 7.39 5.03 531 1062
1 1000 512 97.67 7.20 5.15 511 1023
1 100 512 164.13 7.51 5.67 304 609
1 100000 1024 304.88 10.88 10.62 327 327
1 10000 1024 270.00 10.69 10.66 370 370
1 1000 1024 275.27 10.91 11.06 363 363
1 100 1024 346.10 11.01 12.09 288 288
1 100000 2048 788.88 22.18 27.59 253 126

The test has been aborted at this point. Maybe run it again later on.

Vary data length, second program only:

Prg -T -l real user sys KB/s t/s
2 100000 20 9.46 2.81 0.80 206 10570
2 10000 20 11.53 2.88 0.81 169 8673
2 1000 20 12.47 2.83 0.96 156 8019
2 100 20 67.91 2.80 1.59 28 1472
2 100000 100 13.57 2.92 1.20 719 7369
2 10000 100 18.62 3.07 1.17 524 5370
2 1000 100 19.04 2.92 1.20 512 5252
2 100 100 72.73 2.80 2.16 134 1374
2 100000 512 46.10 3.90 2.61 1084 2169
2 10000 512 53.55 3.84 2.79 933 1867
2 1000 512 66.71 3.65 3.05 749 1499
2 100 512 105.25 3.36 3.76 475 950
2 100000 1024 103.72 4.92 4.68 964 964
2 10000 1024 105.53 4.87 4.82 947 947
2 1000 1024 105.60 4.73 4.85 946 946
2 100 1024 145.14 4.73 5.84 688 688
2 100000 2048 194.70 7.44 8.09 1027 513
2 10000 2048 197.09 7.22 8.15 1014 507
2 1000 2048 200.09 7.10 8.70 999 499
2 100 2048 234.85 6.86 9.53 851 425

Put the directory for logfiles on a different disk (/extra/home/ca/tmp/db), using Btree.

Prg -T -l real user sys KB/s t/s
1 100000 20 14.90 6.05 0.96 131 6711
1 10000 20 14.46 5.95 1.12 135 6915
1 1000 20 17.70 5.83 1.08 110 5649
1 100 20 63.91 5.92 1.74 30 1564
1 100000 100 27.00 5.53 1.90 361 3703
1 10000 100 33.39 5.63 1.92 292 2994
1 1000 100 29.16 5.63 1.75 334 3429
1 100 100 72.18 5.44 2.42 135 1385
1 100000 512 96.94 7.49 5.09 515 1031
1 10000 512 107.99 7.34 5.17 463 926
1 1000 512 97.05 7.21 5.54 515 1030
1 100 512 145.15 7.85 5.36 344 688
1 100000 1024 268.88 10.67 11.54 371 371
1 10000 1024 279.65 11.02 11.05 357 357
1 1000 1024 304.07 10.58 11.69 328 328
1 100 1024 319.74 10.88 12.10 312 312
1 100000 2048 738.38 23.07 27.13 270 135
1 10000 2048 651.86 22.70 26.92 306 153
1 1000 2048 693.13 21.79 28.63 288 144
1 100 2048 724.68 22.51 29.04 275 137

Put the directory for logfiles on a different disk (/extra/home/ca/tmp/db), using Queue.

Prg -T -l real user sys KB/s t/s  
2 100000 20 10.92 2.90 0.65 178 9157  
2 10000 20 9.94 2.87 0.77 196 10060  
2 1000 20 31.66 2.85 0.88 61 3158  
2 100 20 60.74 2.93 1.36 32 1646  
2 100000 100 13.62 3.09 0.95 717 7342  
2 10000 100 19.30 3.02 1.17 505 5181  
2 1000 100 15.55 3.16 1.08 628 6430  
2 100 100 71.88 2.97 1.72 135 1391  
2 100000 512 52.08 3.93 2.50 960 1920  
2 10000 512 52.42 3.68 3.03 953 1907  
2 1000 512 56.58 3.91 2.90 883 1767  
2 100 512 95.38 3.74 3.64 524 1048  
2 100000 1024 107.20 4.69 4.87 932 932  
2 10000 1024 100.15 4.88 4.57 998 998  
2 1000 1024 100.95 4.78 5.06 990 990  
2 100 1024 139.38 4.71 5.61 717 717  
2 100000 2048 187.78 7.68 8.41 1065 532  
2 10000 2048 189.76 7.09 8.62 1053 526  
2 1000 2048 201.95 7.37 8.65 990 495  
2 100 2048 217.66 7.21 9.53 918 459  

Machine 2b: Vary data length, first program:

Prg -T -l real user sys KB/s t/s
1 100000 20 21.56 9.04 1.88 90 4638
1 10000 20 13.02 9.58 1.92 150 7680
1 1000 20 12.64 9.40 1.81 154 7911
1 100 20 16.35 9.68 1.73 119 6116
1 100000 100 32.79 9.16 4.60 297 3049
1 10000 100 25.05 9.54 4.11 389 3992
1 1000 100 23.69 9.80 4.39 412 4221
1 100 100 28.51 10.25 3.89 342 3507
1 100000 512 47.67 13.82 13.65 1048 2097
1 10000 512 48.04 13.22 13.64 1040 2081
1 1000 512 46.35 13.16 14.54 1078 2157
1 100 512 52.10 13.78 11.93 959 1919
1 100000 1024 109.32 21.59 25.00 914 914
1 10000 1024 107.94 19.97 26.49 926 926
1 1000 1024 108.74 20.13 26.06 919 919
1 100 1024 113.14 20.01 26.45 883 883
1 100000 2048 240.16 44.55 55.72 832 416
1 10000 2048 262.05 43.58 54.94 763 381
1 1000 2048 245.93 41.17 57.54 813 406
1 100 2048 254.97 41.39 59.63 784 392

Vary data length, second program:

Prg -T -l real user sys KB/s t/s
2 100000 20 9.85 5.92 1.30 198 10152
2 10000 20 7.82 5.90 1.28 249 12787
2 1000 20 7.21 5.13 1.34 270 13869
2 100 20 10.36 5.79 1.23 188 9652
2 100000 100 10.22 5.84 2.73 955 9784
2 10000 100 10.54 6.11 2.72 926 9487
2 1000 100 10.68 6.12 2.40 914 9363
2 100 100 13.57 6.06 2.37 719 7369
2 100000 512 23.73 7.32 8.89 2107 4214
2 10000 512 25.36 7.42 8.44 1971 3943
2 1000 512 26.12 7.19 8.56 1914 3828
2 100 512 33.79 7.24 8.78 1479 2959
2 100000 1024 47.93 9.05 12.29 2086 2086
2 10000 1024 52.26 9.63 14.91 1913 1913
2 1000 1024 52.07 9.37 14.50 1920 1920
2 100 1024 58.91 9.49 14.52 1697 1697
2 100000 2048 74.59 15.42 20.55 2681 1340
2 10000 2048 72.47 14.99 21.50 2759 1379
2 1000 2048 78.38 14.54 21.93 2551 1275
2 100 2048 76.63 14.01 22.12 2609 1304

Machine 7: Vary data length, second program only; the times for a second test run are added on the right; these clearly show how wildly the results can vary.

Prg -T -l real user sys KB/s t/s (2nd run) real user sys
2 100000 20 5.20 2.00 0.30 375 19230 5.0 2.0 0.3
2 10000 20 6.20 2.00 0.30 315 16129 4.8 1.9 0.3
2 1000 20 7.10 2.00 0.30 275 14084 7.8 2.0 0.3
2 100 20 25.80 2.00 0.50 75 3875 28.5 2.0 0.5
2 100000 100 6.30 2.10 0.60 1550 15873 6.0 2.0 0.6
2 10000 100 6.50 2.10 0.60 1502 15384 7.6 2.1 0.6
2 1000 100 10.60 2.20 0.60 921 9433 11.5 2.0 0.6
2 100 100 36.50 2.10 0.80 267 2739 31.4 2.1 0.8
2 100000 512 33.40 2.70 2.80 1497 2994 18.5 2.6 2.0
2 10000 512 29.80 2.70 2.90 1677 3355 24.6 2.5 2.3
2 1000 512 29.30 2.60 2.50 1706 3412 32.9 2.5 2.8
2 100 512 65.90 2.60 2.90 758 1517 61.0 2.5 2.8
2 100000 1024 50.50 3.30 4.90 1980 1980 58.4 3.2 5.5
2 10000 1024 60.80 3.40 5.40 1644 1644 47.2 3.2 4.6
2 1000 1024 51.70 3.30 4.60 1934 1934 45.1 3.2 4.2
2 100 1024 89.70 3.20 5.60 1114 1114 82.0 3.2 4.9
2 100000 2048 90.20 4.40 8.90 2217 1108 86.9 4.3 9.1
2 10000 2048 92.80 4.30 9.10 2155 1077 67.1 4.3 7.3
2 1000 2048 93.50 4.60 7.80 2139 1069 66.0 4.2 7.0
2 100 2048 134.00 4.40 7.50 1492 746 107.7 4.2 6.5

Vary data length, first program only:

Prg -T -l real user sys KB/s t/s
1 100000 20 6.90 3.10 0.40 283 14492
1 10000 20 7.20 3.30 0.50 271 13888
1 1000 20 9.90 3.30 0.50 197 10101
1 100 20 28.90 3.20 0.60 67 3460
1 100000 100 11.30 3.40 1.00 864 8849
1 10000 100 12.20 3.30 1.00 800 8196
1 1000 100 14.00 3.30 1.10 697 7142
1 100 100 35.80 3.30 1.30 272 2793
1 100000 512 37.10 4.50 4.20 1347 2695
1 10000 512 50.00 4.60 4.50 1000 2000
1 1000 512 62.50 4.50 4.60 800 1600
1 100 512 68.60 4.50 4.60 728 1457
1 100000 1024 86.20 6.20 8.70 1160 1160
1 10000 1024 117.10 6.00 8.40 853 853
1 1000 1024 78.90 6.10 7.80 1267 1267
1 100 1024 109.60 6.10 7.40 912 912
1 100000 2048 225.80 10.90 15.90 885 442
1 10000 2048 259.40 10.80 16.30 771 385
1 1000 2048 382.60 10.90 17.40 522 261
1 100 2048 394.30 10.90 17.20 507 253

Machine 10a:

Prg -T -l real user sys KB/s t/s
1 100000 20 5.00 4.40 0.50 390 20000
1 10000 20 5.00 4.30 0.60 390 20000
1 1000 20 5.50 4.40 0.80 355 18181
1 100 20 9.00 4.50 3.90 217 11111
1 100000 100 6.10 4.70 1.20 1600 16393
1 10000 100 6.20 4.80 1.20 1575 16129
1 1000 100 6.70 4.60 1.80 1457 14925
1 100 100 10.90 5.00 4.30 895 9174
1 100000 512 13.30 6.50 5.10 3759 7518
1 10000 512 12.90 6.90 4.80 3875 7751
1 1000 512 14.00 7.00 5.00 3571 7142
1 100 512 19.00 7.10 8.40 2631 5263
1 100000 1024 19.70 8.80 8.40 5076 5076
1 10000 1024 19.30 9.20 8.20 5181 5181
1 1000 1024 19.90 9.20 8.70 5025 5025
1 100 1024 26.70 9.20 12.30 3745 3745
1 100000 2048 32.90 13.80 11.70 6079 3039
1 10000 2048 31.10 13.80 12.10 6430 3215
1 1000 2048 34.90 14.40 12.30 5730 2865
1 100 2048 41.30 14.10 16.10 4842 2421

Prg -T -l real user sys KB/s t/s
2 100000 20 4.70 4.20 0.30 415 21276
2 10000 20 4.70 4.00 0.50 415 21276
2 1000 20 5.20 4.20 0.70 375 19230
2 100 20 8.80 4.10 3.90 221 11363
2 100000 100 5.50 4.30 0.80 1775 18181
2 10000 100 5.70 4.30 0.80 1713 17543
2 1000 100 6.20 4.50 1.00 1575 16129
2 100 100 9.70 4.50 4.20 1006 10309
2 100000 512 12.50 5.50 2.30 4000 8000
2 10000 512 13.60 5.40 2.60 3676 7352
2 1000 512 11.70 5.10 3.30 4273 8547
2 100 512 14.50 5.70 6.40 3448 6896
2 100000 1024 17.90 6.80 3.90 5586 5586
2 10000 1024 17.30 6.70 4.60 5780 5780
2 1000 1024 18.40 6.60 4.60 5434 5434
2 100 1024 19.00 7.00 8.10 5263 5263
2 100000 2048 24.80 8.80 6.90 8064 4032
2 10000 2048 21.20 9.00 6.80 9433 4716
2 1000 2048 20.90 9.10 7.20 9569 4784
2 100 2048 24.00 8.90 11.30 8333 4166

General notice: the benchmark programs have been run while the machines were ``in use'', so some unusual results can be explained by the activity of other processes.

Comments:


Miscellaneous about Performance

2004-03-26 Effect of logging to files via sm I/O.

On FreeBSD 4.9, UFS, softupdates, SCSI, smX.0.0.12, relay 5000 messages, 100 threads:

logging time
same disk, smioerr 137-141
same disk, smioout 104
RAM, smioerr 104

This means there is a performance hit of about 35 per cent if smioerr is used instead of smioout; the former uses line buffering, hence more write operations are involved.
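The effect can be illustrated with plain stdio buffering modes (this is not the sm I/O API, just the analogous setvbuf(3) mechanism):

    /* Sketch: line buffering (one write per line) vs. full buffering
     * (one write per filled buffer).  Uses plain stdio, not sm I/O. */
    #include <stdio.h>

    int
    main(void)
    {
        FILE *fp;
        int i;

        fp = fopen("log", "w");
        if (fp == NULL)
            return 1;

        /* _IOLBF: every log line causes a write(2), as with the
         * line-buffered smioerr case above; _IOFBF: lines are collected
         * until the buffer is full, as with smioout */
        setvbuf(fp, NULL, _IOFBF, 8192);

        for (i = 0; i < 100000; i++)
            fprintf(fp, "entry %d\n", i);
        fclose(fp);
        return 0;
    }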


Performance of Various Programs


TCP/IP Performance

The program checks/t-net-0.c can be used for a very simple performance test of local TCP/IP (AF_INET or AF_LOCAL) communication. It uses the SM I/O layer on top of sockets. Some of the available options are listed below:

-b n set buffer size to n, default 8192
-c n act as client, write n bytes
-s n act as server, read n bytes
-R n read and write n times
-u use a Unix domain socket

The numbers reference the machines listed in Section 5.2.1.1.
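A rough sketch of the client side of such a test, here over a Unix domain socket and with plain write(2) instead of the SM I/O layer; the socket path and sizes are illustrative:

    /* Sketch: client side of a local throughput test over a Unix domain
     * socket (the -u case).  Path and sizes are illustrative. */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct sockaddr_un addr;
        char buf[128];              /* -c: bytes written per round */
        int fd, i, rounds = 100000; /* -R: number of rounds */

        fd = socket(AF_LOCAL, SOCK_STREAM, 0);
        if (fd < 0)
            return 1;
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_LOCAL;
        strncpy(addr.sun_path, "/tmp/t-net.sock", sizeof(addr.sun_path) - 1);
        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0)
            return 1;

        memset(buf, 'x', sizeof(buf));
        for (i = 0; i < rounds; i++)
        {
            if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf))
                return 1;
            /* a matching server reads the same amount per round (-s) */
        }
        close(fd);
        return 0;
    }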

1:
-R -c time INET time LOCAL
100000 32 15 10
100000 64 19 12
100000 128 24 17
100000 256 31 25
100000 512 49 43
100000 1024 84 81

7:
-R -c time INET time LOCAL
100000 32 10 7
100000 64 11 7
100000 128 13 9
100000 256 17 14
100000 512 23 20
100000 1024 37 35

8:
-R -c time INET time LOCAL
100000 32 51 41
100000 64 57 46
100000 128 66 60
100000 256 97 86
100000 512 148 138
100000 1024 250 243

9:
-R -c time INET time LOCAL
100000 32 67 52
100000 64 74 59
100000 128 85 71
100000 256 114 97
100000 512 159 148
100000 1024 263 246

11:
-R -c time INET time LOCAL
100000 32 99 89
100000 64 108 94
100000 128 138 115
100000 256 141 151
100000 512 199 221
100000 1024 346 373

Notice: these times vary wildly since the machine is used by several people.

13:
-R -c time INET time LOCAL
100000 32 46 38
100000 64 59 48
100000 128 78 70
100000 256 120 110
100000 512 203 192
100000 1024 376 358


DB Lookup Performance

Just a preliminary number: on system 1 the example program examples_c/bench_001.c achieves about 1 to 1.5 million lookups per second (this is for a data length of 20 bytes and a cache size of 64MB, which more or less means direct memory access, no disk I/O). Taking the results from 5.4.1 into account, this is a factor of 100 faster than performing lookups over a generic TCP/IP connection. This certainly must be taken into account when deciding how and where to incorporate DB lookups.

Using larger data sizes (256 to 512 bytes) and smaller caches (10000 bytes) causes a significant drop in performance: a sequential lookup of all data varies from 60000 to 10000 lookups per second (on system 1).

Random access drops to as little as 1000 to 2000 lookups per second.


snprintf Performance

On AIX, sm_snprintf() is about 2 times slower than snprintf(); on SunOS 5.8 it is about 1.3 times slower; on FreeBSD there is no difference (which isn't surprising since it is almost the same code). It might make sense to use the native snprintf() version on some platforms; however, this isn't possible anymore due to the extensions in sm_snprintf() (which supports more format specifiers, e.g., for constant strings).
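The comparison can be reproduced with a trivial micro-benchmark of the following kind (shown with the libc snprintf(); substituting sm_snprintf() requires the sm headers and library):

    /* Sketch: time a large number of snprintf() calls.  Replace snprintf()
     * with sm_snprintf() (and link against libsm) for the comparison. */
    #include <sys/time.h>
    #include <stdio.h>

    int
    main(void)
    {
        char buf[256];
        struct timeval start, end;
        double secs;
        int i, n = 1000000;

        gettimeofday(&start, NULL);
        for (i = 0; i < n; i++)
            snprintf(buf, sizeof(buf), "entry %d: %s %x\n", i, "some string", i);
        gettimeofday(&end, NULL);
        secs = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
        printf("%d calls in %.2f s\n", n, secs);
        return 0;
    }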


Claus Assmann