Sendmail X: Performance Tests and Results


SMTP Server Daemon

Remark (placed here so it doesn't get lost): there is a restricted number (< 60000) of possible open connections to one port. Could that limit the throughput we are trying to achieve, or is such a high number of connections infeasible?


SMTP Sink

For simple performance comparisons several SMTP sinks have been implemented or tested.

Test programs are:

  1. smtp-sink from postfix. This is an entirely event driven program.
  2. thrperconn: one thread per connection.
  3. thrpool: uses a worker model with concurrency limiting, see Section 3.20.4.1.
  4. smtps: state-threads, see Section 3.20.3.1.

Test machines are:

  1. v-sun: Sun SPARCserver E450, 4 processors
  2. v-bsd: FreeBSD 3.4, 2 PIII processors, 2 GB RAM
  3. v-aix: AIX, 4 processors
  4. schmidt: Linux 2.4, uses only 15 threads per client, otherwise the machine just "dies".

Entries in the tables below denote execution time in seconds unless otherwise noted; hence smaller values are better.

Tests have been performed with myslam (a multi-threaded SMTP client), using 7 to 8 client machines, 50 threads per client, and 5000 messages per client.

  1. v-sun (8 clients):

    parameters smtp-sink smtps thrperconn thrpool
    1KB/msg (40MB) 45s 70s 92s 43s
    4KB/msg (160MB) 49s 56s 259s 78s
    32KB/msg (1280MB) 203s 208s >999s 110s
    -w 1 141s 109s 156s 230s

    Note: v-sun is a four processor machine, hence the multi-threaded programs (thrpool, thrperconn) can use multiple processors. I didn't select (via an option) multiple processors for smtps though.

    Just as one example, the achieved throughput in MB/s is listed in the next table. As can be seen, for small messages it is an order of magnitude lower than the sustainable throughput that can be achieved over a single connection (about 85-90 Mbit/s, i.e., roughly 11 MB/s, measured with ttcp; this is a 100 Mbit/s Ethernet).

    parameters smtp-sink smtps thrperconn thrpool
    1KB/msg (40MB) 0.9 0.6 0.4 0.9
    4KB/msg (160MB) 3.3 2.9 0.6 2.1
    32KB/msg (1280MB) 6.5 6.3 - 11.9

  2. v-bsd:

    parameters smtp-sink smtps thrperconn thrpool
    1KB msg size 97 87 380 140
    4KB msg size 108 130 1150 156
    32KB msg size 208 197 fails 330
    -w 1 165 138 484 223

  3. v-aix:

    parameters smtp-sink smtps thrperconn thrpool
    1KB msg size 38 28 - 31
    4KB msg size 34 33 - 31
    32KB msg size 125 125 - 125
    -w 1 125 125 - 155
            125 for 250/3

  4. schmidt:

    parameters smtp-sink smtps thrperconn thrpool
    1KB msg size 45 44 165 74
    4KB msg size 54 45 418 75
    32KB msg size 217 167 fails 256
    -w 1 370 360 - 337
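
The MB/s figures in the v-sun throughput table follow directly from the data volume and the elapsed time. The following is a minimal sketch of that arithmetic, assuming 1 MB = 1000 KB (as the totals "40MB" and "160MB" in the table suggest); the function name and its parameters are chosen for illustration only and are not part of any test program mentioned here.

```python
def throughput_mb_per_s(clients, msgs_per_client, kb_per_msg, seconds):
    """Return payload throughput in MB/s (assumption: 1 MB = 1000 KB)."""
    total_kb = clients * msgs_per_client * kb_per_msg
    return total_kb / 1000.0 / seconds


# v-sun: 8 clients, 5000 messages per client.
# smtp-sink needed 45s for 1KB messages and 49s for 4KB messages.
for kb, secs in ((1, 45), (4, 49)):
    print("%dKB/msg: %.1f MB/s" % (kb, throughput_mb_per_s(8, 5000, kb, secs)))
```

Only the message payload is counted; SMTP protocol overhead is ignored, so the real amount of data on the wire is somewhat higher.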


SMTP Sink with CDB

2004-03-02

statethreads/examples/smtps3

wiz

See Section 5.2.1.1, machine 1

wiz$ time ./smtpc2 -fa@b.c -Rx@y.z -t 100 -s 1000 -r localhost

sink program FS times (s)
smtps3 - 5
smtpss UFS 17, 18
smtps3 -C UFS 16, 17, 19

perf-lab

source: s-6.perf-lab

sink: v-bsd.perf-lab

with -C

s-6.perf-lab$ time ./smtpc2 -t 100 -s 1000 -r v-bsd.perf-lab
   19.17s real     1.08s user     0.64s system

without -C

s-6.perf-lab$ time ./smtpc2 -t 100 -s 1000 -r v-bsd.perf-lab
    3.04s real     0.81s user     0.59s system

source: s-6.perf-lab

sink: mon.perf-lab (FreeBSD 4.9)

with -C

   12.05s real     1.04s user     0.67s system

without -C

    3.03s real     0.92s user     0.54s system

2004-03-04 source: s-6.perf-lab; sink: v-sun.perf-lab

with -C: 20s - 24s (UFS) Note: It takes 20s(!) to remove all CDB files:

time rm ?/S*     0m20.11s
with -C: 1s (TMPFS); 16s (UFS, /), rm: 14s; logging turned on: 16s, rm: 0.8s.

without -C: 1s

2004-03-08 source: s-6.perf-lab; sink: v-bsd;

./smtpc -t 100 -s 1000

sink program time (s)
smtpss 30
smtps3 -C 30
smtps3 3

2004-03-08 source: s-6.perf-lab; sink: v-sun;

./smtpc -t 100 -s 1000

sink program FS times (s)
smtps3 - 1
smtpss UFS 25, 30
smtps3 -C UFS 23
smtpss swap 2, 3
smtps3 -C swap 1, 2

Note: the variance for smtpss on UFS is fairly large. The lower numbers are achieved by running smtps3 -C first and then smtpss; the larger numbers are measured when the CDB files have just been removed. However, this effect was not reproducible. Note: removing those files takes about as long as a test run.


SMTP Relaying Using a Sendmail X Prototype

Test setup with a sendmail X prototype of 2002-09-04: v-aix.perf-lab running QMGR, SMTPS, and SMTPC. Relaying from localhost to v-bsd.perf-lab. Source program running on v-aix:

time ./smtp-source -s 50 -m 100 -c localhost:8000

Using the full version: 2.45s; turning fsync() off: 1.44s.

This clearly shows the need for a better CDB implementation, at least on AIX.

Same test with reversed roles (smX on v-bsd, sink on v-aix): using the full version: 7.44s; turning fsync() off: 6.20s. For comparison: using sendmail 8.12: 14.71s.

The SCSI disks on v-bsd seem to be fairly slow. Moreover, there seems to be something wrong with the OS version (it's very old: FreeBSD 3.4).

On FreeBSD 4.6 (machine 14, see Section 5.2.1.1) (source, sink, sm-9 of 2002-10-01 on the same machine):

time ./smtp-source -s 100 -m 200 -c localhost:8000

softupdates: 4.35s; without softupdates: 5.66s

time ./smtp-source -s 50 -m 100 -c localhost:8000

softupdates: 2.01s/1.93s, -U: 1.79s; without softupdates: 2.60s/2.46s, -U: 2.17s

(-U turns off fsync()).

Using sendmail 8.12.6:

time ./smtp-source -s 50 -m 100 localhost:1234

softupdates: 5.01s. This looks quite good for sendmail 8, but the result for:

time ./smtp-source -c -s 100 -m 200 localhost:1234

is: 143.12s, which certainly is not anywhere near good. This is related to the high load generated by this: up to 200 concurrent sendmail processes just kill the machine. sendmail X has only up to 4 processes running.


Various Linux FS

Test date: 2003-05-25, version: smX.0.0.6, machine: PC, AMD Duron 700MHz, 512MB RAM, SuSE 8.1

Test program:

time ./smtp-source -s 50 -m 500 -fa@b.c -tx@y.z localhost:1234

FS Times msg/s (best)
JFS 4.02s, 4.23s 124
ReiserFS 4.8s 104
XFS 6.7s, 7.2s, 7.48s, 7.64s 74
EXT3 14.39s, 13.44s 34

2004-03-17 checks/t-readwrite on destiny (Linux, IDE, ext2):

parameters writes time
-s -f 1000 -p 1 - 9
-s -f 100 -p 10 - 6

The FS is mounted async (default!).

2004-03-17 checks/t-readwrite on ia64-2 (Linux, SCSI, reiserfs):

parameters writes time
-s -f 1000 -p 1 - 5.2
-s -f 100 -p 10 - 2.6

2004-03-23 source: basil.ps-lab MTA: cilantro.ps-lab (Linux 2.4.18-64GB-SMP) sink: v-sun.perf-lab

FS: ReiserFS version 3.6.25

smtpc -t 100 -s 1000

program source time sink time
smtps3 -C   -
smX.0.0.12 6 5
sm8.12.11 74 74
sm8.12.11 See 1   50
postfix 2.0.18    

gatling -m 100 -c 5000 -z 1 -Z 1

program writes source time source msgs/s sink time
smtps3   2 2295 -
smtps3 -C   5 962 -
smX.0.0.12   22 225 22
sm8.12.11   358 14 358
sm8.12.11 See 1   246 20 -
postfix 2.0.18        

Notes:

  1. The default for Linux is to have REQUIRES_DIR_FSYNC set; in this test it has been turned off. Some people claim it is safe to do that with recent Linux FSs. For some reason (timeouts?) the tests with smtpc fail in this configuration, i.e., fewer than 1000 messages are sent.
  2. According to tests by Thom sendmail 8.12 was able to relay 40 msgs/s on the same machine.

2004-03-25:

Filesystems:

  1. ext3 (rw,sync,data=journal)
  2. ext3 (rw,data=journal) [this means async?]
  3. reiserfs (rw,noatime,data=journal,notail)
  4. jfs (rw)
  5. ext2 (rw,sync)

smtpc -t 100 -s 1000

program FS source time sink time
smX.0.0.12 1 63 61
  1 63 63
  2 19 18
  3 5 4
  3 5 5
  5 81 80
sm8.12.11 3 45 several read errors
  5 91 92
smtps3 -C      

2004-03-25: gatling -m 100 -c 5000 -z 1 -Z 1 (1KB message size)

program FS source time sink time msgs/s
smX.0.0.12 1      
  2 90 90 55
  3 24 24 208
  4 100 99 100
sm8.12.11 3 216 errors 23

gatling -m 100 -c 5000 -z 4 -Z 4 (4KB message size)

program FS source time sink time msgs/s
smX.0.0.12 1      
  2 92 92 54
  3 141 140 35
  4 168 168 29
sm8.12.11 3 226 errors 22

gatling -m 100 -c 5000 -z 16 -Z 16 (16KB message size)

program FS source time sink time msgs/s
smX.0.0.12 1      
  2      
  3 169   29
  4      
sm8.12.11 3 226 errors 22

Notes:

  1. ReiserFS seems to have some optimizations for small files, hence the results for 1KB are really good, but for 4KB they are in the normal range.
  2. Testing with sm8 usually caused several read errors on the sink side and several errors displayed by gatling.


Various FreeBSD Results

2003-11-19 sm-9.0.0.9 running on v-bsd.perf-lab (2 processors, FreeBSD 3.4)

Source on bsd.dev-lab

time ./smtp-source -d -s 100 -m 500

directly to sink: 2.16 - 2.74s (231msgs/s)

using MFS: 14.37 - 14.43s (34msgs/s) (sm8.12.10: 32s)

using FS with softupdates: 22.78 - 23.83s (21msgs/s) (sm8.12.10: 49s)

using FS without softupdates: 35.27 - 35.56s (14msgs/s)

2004-03-02 source: s-6.perf-lab; relay: mon; sink: v-bsd

time ./smtpc2 -O 10 -fa@s-6.perf-lab -Rnobody@v-bsd.perf-lab -t 100 -s 1000 -r mon.perf-lab:1234

38.26s real 1.01s user 0.88s system

2004-03-04 source: s-6.perf-lab; relay: v-bsd; sink: v-sun

options: -t 100 -s 1000

MTA source time(s) sink time
postfix 2.0.18 53 94
smX.0.0.12 69 68
without smtpc 56 -
sm8.12.11 67 67
-odq 79, 82  
-odq / 100 qd 101  
-odq / 10 qd 100  

Note: this is FreeBSD 3.4 without softupdates and directory hashes.

getrusage(2) data:

sm8.12.11 -odq

ru_utime=        15.0158488
ru_stime=        71.0104605
ru_maxrss=     1524
ru_ixrss=   5030592
ru_idrss=   4098456
ru_isrss=   1412096
ru_minflt=   127503
ru_majflt=        0
ru_nswap=         0
ru_inblock=       0
ru_oublock=   11851
ru_msgsnd=    13000
ru_msgrcv=    10000
ru_nsignals=      0
ru_nvcsw=    617469
ru_nivcsw=    18793

sm8.12.11

ru_utime=        15.0236311
ru_stime=        62.0117941
ru_maxrss=     1520
ru_ixrss=   4573224
ru_idrss=   3676784
ru_isrss=   1283712
ru_minflt=   174619
ru_majflt=        0
ru_nswap=         0
ru_inblock=       0
ru_oublock=    4001
ru_msgsnd=    12000
ru_msgrcv=    10000
ru_nsignals=   1000
ru_nvcsw=    128074
ru_nivcsw=    14771

This looks like a problem in queue-only mode: far too much data is written, almost 3 times the amount written in background delivery mode. Why does sm8 send 1000 more messages in queue-only mode?

2004-03-05 source, relay, sink: wiz (FreeBSD 4.8)

options: -t 100 -s 1000

source: 34s, sink: 32s

turn off smtpc: source: 31s, 34s

2004-03-26 source: v-6.perf-lab running smtpc -t 100 -s 5000; relay: v-bsd.perf-lab; sink: v-sun.perf-lab

sink runs smtps2 -R n with varying values for n

n source time requests served
0 108 5000
8000000 115 5060
58000000 140 5450
88000000 151 5620

put defedb on a RAM disk:

n source time requests served
0 108 5000
8000000    
58000000 111 5453
88000000 114 5693

Obviously the additional disk I/O traffic created by having to use DEFEDB is slowing down the system.


FreeBSD 4.9, Softupdates, and fsync()

2004-06-23 Upgraded v-bsd.perf-lab to FreeBSD 4.9 (2 processors), using softupdates.

source on v-sun, sink on s-6:

time ./smtpc2 -O 10 -t 100 -s 1000 -r v-bsd.perf-lab:1234

43s

turn off fsync(): (smtps -U, must be compiled with -DTESTING)

32s


Disk I/O On FreeBSD

A modified iostat(8) program is used to show the number of bytes written and read, and the number of read, write, and other disk I/O operations.

The following tests were performed: sink (smtps3) on v-bsd.perf-lab, source (smtpc) on s-6.perf-lab sending 1000 mails. All numbers for write operations are rounded; if there are numbers in parentheses then those denote the value of ru_oublock (getrusage(2)) for smtps/qmgr or sm8. If two times are given (separated by /) then the second time denotes the output (elapsed time) for the sink.

program softupdates? writes reads time
smtps3 -C yes 2200 - 14
smtps3 -C no 2900 - 30
smX.0.0.12, no sched (see 1) yes 5200 - 34
smX.0.0.12, no sched yes   -  
smX.0.0.12, no sched no   -  
smX.0.0.12 (see 2) yes 3500 (2000/1300) 4 33
  yes 3370 (2020/1270) 4 30/29
-O i=1000000 yes 2660 (1850/660) 0 25/24
smX.0.0.12 no 6300 (3000/3200) 0 52
smX.0.0.12 (see 4) yes 3500 (2200/1200) 4 25
sm8.12.11 -odq SS=m yes 1800 - 41
sm8.12.11 -odq SS=m no 12200 - 72
sm8.12.11 SS=m (see 3) yes 236 (164) 0 61
  yes 370 (218) 0 60
sm8.12.11 no 8100 (4100) 1 63
sm8.12.11 SS=t yes 7400 0 70
postfix 2.0.18 yes 2900 16 21/26

Notes:

  1. Question: why does smX.0.0.12 use so many write operations? 5200 is way too many. Answer: qmgr committed IBDB more than 1000 times; increasing the maximum time to acknowledge an SMTPS transaction from 100 µs to 10000 µs reduces the number of commits to 165.
  2. Question: why does qmgr still (after increasing the time between commits) perform so many write operations?
  3. Question: why does sm8 use so few writes? Can softupdates eliminate or cluster most writes? Why doesn't this work for smX? Solution: SuperSafe was set to m, not to true.
  4. If IBDB and CDB are on different partitions, the performance increases significantly (about 25 per cent faster).
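
Note 1 describes a time-based form of batching: the longer a commit may be delayed, the more transactions share one commit. The following is a small simulation sketch of that idea, not the qmgr implementation. The function name, the batching rule (a batch is flushed a fixed delay after its first transaction arrives), and the evenly spaced arrival times are all assumptions made for illustration, so the counts it prints are not expected to match the 165 commits measured above.

```python
def count_commits(arrivals_us, max_delay_us):
    """Count commits when each batch is flushed max_delay_us after its first arrival."""
    commits = 0
    deadline = None  # time at which the currently open batch is committed
    for t in sorted(arrivals_us):
        if deadline is None or t > deadline:
            # The previous batch (if any) is already committed;
            # this transaction opens a new batch with its own deadline.
            commits += 1
            deadline = t + max_delay_us
    return commits


# Hypothetical load: 1000 transactions, one every 1000 microseconds.
arrivals = range(0, 1000 * 1000, 1000)
for delay in (100, 10000):
    print("max delay %5d us: %d commits" % (delay, count_commits(arrivals, delay)))
```

With the short delay every transaction gets its own commit; with the longer delay several transactions are acknowledged by a single commit, at the price of a higher latency per transaction.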

2004-03-23 source: basil.ps-lab MTA: wasabi.ps-lab (FreeBSD 4.9, machine 16 in Section 5.2.1.1) sink: v-sun.perf-lab

smtpc -t 100 -s 1000

program writes reads source time sink time
smtps3 -C 2400 - 11 -
smX.0.0.12 2600 5 15 13
sm8.12.11 6000 1 35  
postfix 2.0.18 2800 15 14 20

Note: the source time for postfix is shorter than the time for smX because smX emptied the queue during the run while postfix had more than 700 entries in the mail queue after the source finished sending all mails. This can be seen by looking at the sink time, which is noticeably larger for postfix than for sendmail X.

Using gatling:

Max random envelope rcpts:  1
Connections:                100
Max msgs/conn:              Unlimited
Messages:                   Fixed size 1 Kbytes
Desired Message Rate:       Unlimited
Total messages:             5000

Total test elapsed time: 73.571 seconds (1:13.570)
Overall message rate: 67.962 msg/sec
Peak rate: 100.000 msg/sec

gatling -m 100 -c 5000 -z 1 -Z 1

program writes source time source msgs/s sink time
smtps3 0 5 980 -
smtps3 -C 11750 53 93  
smX.0.0.12   73 67 71
smX.0.0.12 11157 (8000/2700) 70 71 69
sm8.12.11   136 36  
postfix 2.0.18   60 83 78
postfix 2.0.18 12635 58 85 75

2004-03-16 results for wiz: source: time ./smtpc -s 1000 -t 100 -r localhost:1234; sink: smtps3, file system: UFS, softupdates

parameters oublock writes source time sink time
-C -i 1920 ? 17 16
-C -p 1 1860 ? 17 17
-C -p 1 1940 2700 16 15
-C -p 1 1970 2770 16 15
-C -p 2   ? 15 ?
-C -p 2 877+966 2600 15 ?
-C -p 4 455+476+432+472 2640 15 ?

New option: -f for flat, i.e., instead of using 16 subdirectories for CDB files, a single directory is used. Even though this does not cause a noticeable difference in run time, the number of I/O operations is reduced.

parameters oublock writes source time
-C -p 2 915+920 2600 14
-C -p 2 -f 600+610 2200 14

2004-03-16 source: s-6.perf-lab, time ./smtpc -s 1000 -t 100 -r localhost:1234; sink: v-bsd.perf-lab, smtps3, file system: UFS, softupdates

parameters oublock writes source time sink time
-C -i 1430 2165 12 11
  1550 2300 14 13
-C -p 1 1500 2500 14 12
-C -p 2 1100+620 2500 13 -
  800+770 2320 13 -
-C -p 4 530+350+540+470 2600 13 -

Note: some of the write operations might be from softupdates due to the previous rm command (removing the CDB files).

2004-03-17 checks/t-readwrite on v-bsd (FreeBSD 4.9, SCSI):

parameters softupdates oublock writes time
-s -f 1000 -p 1 yes 4000 4000 22
-s -f 100 -p 10 yes 2575 2579 14
-s -f 1000 -p 1 no 4050 4050 28
-s -f 100 -p 10 no 4050 4050 27

-p specifies the number of processes to start, -f specifies the number of files to write per process. The test cases above write 1000 files with either 1 or 10 processes. As can be seen, it is significantly more efficient to use 10 processes if softupdates are turned on.

2004-03-17 checks/t-readwrite on wiz (FreeBSD 4.8, IDE):

parameters softupdates oublock writes time
-s -f 1000 -p 1 yes 3000 3800 13
-s -f 100 -p 10 yes 2860 3600 13

In this case no difference can be seen, which is most likely a result of using an IDE drive with write-caching turned on (default).


Various SunOS 5 Results

2003-11-21 sm-9.0.0.9 running on v-sun.perf-lab

Source on bsd.dev-lab

time ./smtp-source -d -s 100 -m 5000 -c

using FS: 301.90 - 305.02s (16msgs/s)

using swap: 77.98 - 78.55s (64msgs/s)

Those tests ran only 32 SMTPS threads (the machine has 4 CPUs, hence the specified limit of 128 was divided by 4). Using 128 SMTPS threads (by forcing only one process, which was used anyway because SMTPS is run with the interactive option, which does not start background processes):

time ./smtp-source -d -s 100 -m 50000 -c

using swap: 727.73s (68msgs/s)

2004-03-09 sm-9.0.0.12 running on v-sun.perf-lab

time ./smtpc -O 20 -fa@s-6.perf-lab.sendmail.com -Rnobody@v-bsd.perf-lab.sendmail.com -t 100 -s 1000 -r v-sun.perf-lab.sendmail.com:1234

MTA options FS source time(s) sink time(s)
full MTS SWAPFS 16 14
without sched SWAPFS 10 -
smtpss SWAPFS 3 -
full MTS UFS 64, 65, 64 75, 70, 69
8.12.11 SWAPFS 16 19
8.12.11 UFS 141 138

Note: smX using UFS runs into connection limitations: QMGR believes there are 100 open connections even though the sink shows at most 18. This seems to be a communication latency between SMTPC and QMGR (and needs to be investigated further).

2004-03-17 checks/t-readwrite on v-sun (SunOS 5.8, SCSI):

parameters writes time
-s -f 1000 -p 1 - 39
-s -f 100 -p 10 - 37

The filesystem on SunOS 5.8 does not cause any difference whether 1 or 10 processes are used.


Various OpenBSD Results

2004-03-05 source, relay, and sink on zardoc (OpenBSD 3.2)

test with logging via smioout

zardoc$ time ./smtpc2 -O 10 -s 1000 -t 100 -r localhost:1234
   24.17s real     0.94s user     2.57s system

smtps3 stats:

elapsed                    26
Thread limits (min/max)    8/256
Waiting threads            8
Max busy threads           3
Requests served            1000

Note that there have been only 3 active threads. That means the client is not busy at all. Another test shows elapsed=23s, max busy threads=21, so the result isn't deterministic (the machine is running as a normal SMTP server etc. during the tests).

test with logging via smioerr: smtpc2: 24.53s; no difference.


Various AIX Results

2004-03-17 checks/t-readwrite on aix-3 (AIX 4.3, SCSI, jfs):

parameters writes time
-s -f 1000 -p 1 - 30
-s -f 100 -p 10 - 29

No (noticeable) difference.


Implementation of Queues and Caches


Filesystem Performance

Here are some results of a simple test program which creates and deletes a number of files and optionally renames them twice while doing so.

Notice: unless mentioned otherwise, all measurements are at most accurate to one second resolution. Repeated tests will most likely show (slightly) different results. These tests are only listed to give an idea of the magnitude of available performance.
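
To make the kind of test concrete, the following is a minimal sketch of a meta data test that creates and deletes files, optionally renames each one twice, and optionally spreads the files over hashed subdirectories. It is not the test program used for the measurements; the function name, its parameters, and the file naming scheme are assumptions made for illustration. Unlike a real benchmark it does not call fsync(2), so it mostly exercises the file system cache.

```python
import os
import tempfile
import time


def run_metadata_test(base_dir, files, rename=False, hash_dirs=0):
    """Create and delete `files` empty files; optionally rename each one twice.

    hash_dirs > 0 spreads the files over that many subdirectories (00, 01, ...).
    Returns (elapsed_seconds, operations_performed).
    """
    for d in range(hash_dirs):
        os.makedirs(os.path.join(base_dir, "%02d" % d), exist_ok=True)
    ops = 0
    start = time.time()
    for i in range(files):
        subdir = base_dir
        if hash_dirs:
            subdir = os.path.join(base_dir, "%02d" % (i % hash_dirs))
        name = os.path.join(subdir, "f%06d" % i)
        with open(name, "w"):
            pass  # create an empty file
        ops += 1
        if rename:
            first = name + ".a"
            second = name + ".b"
            os.rename(name, first)
            os.rename(first, second)
            name = second
            ops += 2
        os.remove(name)
        ops += 1
    return time.time() - start, ops


with tempfile.TemporaryDirectory() as tmp:
    elapsed, ops = run_metadata_test(tmp, 1000, rename=True, hash_dirs=16)
    print("%d operations in %.2fs" % (ops, elapsed))
```

Dividing the number of operations by the elapsed time gives a rough operations-per-second figure, subject to the same one second caveat as the measurements in this section when the runs are short.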


Test Systems

The involved systems are:

  1. PC, Pentium III, 500MHz, 256MB RAM, FreeBSD 3.2,
    wdc0: unit 0 (wd0): <FUJITSU MPD3064AT>
    wd0: 6187MB (12672450 sectors), 13410 cyls, 15 heads, 63 S/T, 512 B/S
    

  2. PC, AMD K6-2, 450MHz, 220MB RAM, OpenBSD 2.8,
    1. wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-351010>
      wd0: can use 32-bit, PIO mode 4, DMA mode 2, Ultra-DMA mode 4
      wd0: 16-sector PIO, LBA, 9671MB, 16383 cyl, 16 head, 63 sec, 19807200 sectors
      

    2. wd1 at pciide0 channel 0 drive 1: <Maxtor 98196H8>,
      wd1: can use 32-bit, PIO mode 4, DMA mode 2, Ultra-DMA mode 4,
      wd1: 16-sector PIO, LBA, 78167MB, 16383 cyl, 16 head, 63 sec, 160086528 sectors
      

  3. PC, Pentium III, 500MHz, 256MB RAM, FreeBSD 4.4-STABLE,
    ad0: 6187MB <FUJITSU MPC3064AT> [13410/15/63] at ata0-master UDMA33
    

  4. PC, AMD-K7, 500MHz, FreeBSD 4.4-STABLE, 332MB RAM,
    1. ahc0: <Adaptec 2940 Ultra2 SCSI adapter (OEM)>
      da0: <IBM DNES-309170W SA30> Fixed Direct Access SCSI-3 device
      da0: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
      da0: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C)
      
      1. SCSI with softupdates
      2. SCSI without softupdates

    2. ad0: 8063MB <FUJITSU MPD3084AT> [16383/16/63] at ata0-master UDMA66
      
      softupdates

  5. PC, Linux 2.2.12,
    hda: IBM-DJNA-370910, 8693MB w/1966kB Cache, CHS=1108/255/63
    
    ext2 FS

  6. PC, Linux 2.4.7,
    hda: 39102336 sectors (20020 MB) w/2048KiB Cache, CHS=2434/255/63, UDMA(66)
    reiserfs: using 3.5.x disk format
    ReiserFS version 3.6.25
    

  7. DEC Digital Alpha, OSF/1, SCSI disk?

  8. Sun SPARC, SunOS 5.6, SCSI disk?

  9. Sun SPARC, SunOS 5.7, SCSI disk?
    1. mount options: no logging, atime
    2. mount options: logging, atime
    3. mount options: logging, noatime

  10. Sun SPARC E450, 4 CPUs,
    1. Baydel RAID

    2. SCSI disk

  11. AIX 4.3.3, using JFS (default).

  12. PC, AMD K7, 1000MHz, 512MB RAM, SuSE 7.3, kernel 2.4.10
    WD1200BB
    hdg: 234441648 sectors (120034 MB) w/2048KiB Cache, CHS=232581/16/63, UDMA(100)
    
    1. /home jfs
    2. /opt reiserfs
    3. /work ext3

  13. HP-UX 11.00

  14. PC, Pentium II, 360MHz, 512MB RAM, FreeBSD 4.6,
    ad0: 8693MB <IBM-DJNA-370910> [17662/16/63] at ata0-master UDMA33
    acd0: CDROM <CD-ROM 40X> at ata1-master PIO4
    

  15. Intel IA64, 4 CPUs, 1GB RAM
    scsi0 : ioc0: LSI53C1030, FwRev=01000000h, Ports=1, MaxQ=255, IRQ=52
      Vendor: MAXTOR    Model: ATLASU320_18_SCA  Rev: B120
      Type:   Direct-Access                      ANSI SCSI revision: 03
    Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
    SCSI device sda: 35916548 512-byte hdwr sectors (18389 MB)
    reiserfs: found format "3.6" with standard journal
    reiserfs: using ordered data mode
    Using r5 hash to sort names
    

  16. Intel Pentium III, 650 MHz, 256MB RAM
    da0 at ahc0 bus 0 target 0 lun 0
    da0: <SEAGATE ST39175LW 0001> Fixed Direct Access SCSI-2 device 
    da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
    da0: 8683MB (17783240 512 byte sectors: 255H 63S/T 1106C)
    

  17. PC, Pentium III, 450MHz, 256MB RAM, FreeBSD 4.8, softupdates
    ad0: 6187MB <FUJITSU MPD3064AT> [13410/15/63] at ata0-master UDMA33
    

  18. PC, FreeBSD 4.10, softupdates
    da3 at ahc0 bus 0 target 4 lun 0
    da3: <IBM DNES-309170Y SA30> Fixed Direct Access SCSI-3 device 
    da3: 40.000MB/s transfers (20.000MHz, offset 31, 16bit), Tagged Queueing Enabled
    da3: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C)
    

  19. PC, VIA C3, 667MHz, 256MB RAM, OpenBSD 3.2,
    1. wd0 at pciide0 channel 0 drive 0: <IBM-DJNA-371350>
      wd0: 16-sector PIO, LBA, 12949MB, 16383 cyl, 16 head, 63 sec, 26520480 sectors
      

    2. wd1 at pciide0 channel 0 drive 1: <WDC WD1200BB-53CAA0>
      wd1: 16-sector PIO, LBA, 114473MB, 16383 cyl, 16 head, 63 sec, 234441648 sectors
      

    3. wd2 at pciide1 channel 0 drive 0: <Maxtor 6Y160P0>
      wd2: 16-sector PIO, LBA48, 156334MB, 16383 cyl, 16 head, 63 sec, 320173056 sectors
      wd2(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 6
      


Meta Data Operations

In this section, some simple test programs are used that create some files, perform (sequential) read/write operations on them and remove them afterwards.

Entries in the following table are elapsed time in seconds (except for the first column which obviously refers to the machine description above). The program that has been used to produce these results is fsperf1.c.

machine  5000 100        -c 5000 100     -c -r 5000 100
1        50              49              48
1        42              48              51
2a       3               7               10
                         about 2200 tps  about 1500 tps
2b                       11              21
3        10              34              34
         about 500 tps
4(a)i                    126             125
4(a)ii                   208             454
4b                       43              48
7        7               13              16
5        9               8               9
8        133             201             603
9a                       52              665
10a      9               9               12
11       89              139             233

Comments:

(2004-07-14) With and without fsync(2) (-S)

common parameters machine -c -c -r -S -c -S -c -r
(5000 100) 17 42 42 2 3
  10b 165 496 165 495
  18 83 83 5 8
  19a 8 7 1 3
  19b 8 9 1 3
  19c 7 9 1 2
(-s 32 5000 100) 17 109 109 8 9
  10b 250 537 207 498
  18 114 113 14 16
  19b 87 81 3 5
  19c 26 26 4 5

Comments:

Next version: allow for hashing (00 - 99, up to two levels). Use enough files to defeat the (2MB) cache of IDE disks.

machine -h 1 -c 1000 1000 -h 1 -c -r 1000 1000
1 18 18
2a 24 24
2b 7 9
3 14 14
4(a)i 23 23
4(a)ii 33 77
4b 25 49
5 3 2
7 3 4
8 58 163
9a 51 139
11 28 48

Comments:


Meta Data Operations: Existing Files

Next version fsperf1.c: allow for hashing (00 - 99, up to two levels). Use enough files to defeat the (2MB) cache of IDE disks. The parameters for the following table are 1000 operations and 1000 files, hence each file is used once. Additional parameters are listed in the heading. c: create, h 1: one level hashing, r: rename file twice, p: populate directories before test, then just reuse the files.

machine -h 1 -c -h 1 -c -r -p -h 1 -c -p -h 1 -c -r
1 32 31 18 17
2a 18 18 9 10
2b 10 10 8 10
5 2 1 2 1
6 2 2 4 4
7 2 4 2 3
8 58 165 78 178
9a 27 127 33 131
9c 13 51 37 55
11 28 48 28 48

Comments:


Writing a Logfile

Another test program (fsseq1.c) writes lines to a file and uses fsync(2) after a specified number (-C parameter).

20000 entries (10000 entries each for received/delivered, total 490000 bytes).

machine - -C 100 -C 50 -C 10 -C 5 -C 2 -f
1 1 4 6 17 32 78 150
2a 0 2 2 5 5 9 18
2b 1 0 1 3 4 10 20
3 1 2 3 9 16 37 68
5 1 1 2 6 12 27 56
7 0 4 8 39 79 198 410
8 1 7 13 60 120 299 598
9a 1 8 13 15 62 90 140
11 0 6 12 53 106 262 518

This clearly demonstrates the need for group commits. However, the program requires a lot of CPU since each line is generated by snprintf(). Hence the full I/O speed may not be reached. To confirm this, another program (fsseq2.c) is used that just writes a buffer with a fixed content to a file.
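
The effect of committing in groups can be sketched as follows: the number of fsync(2) calls, which dominates the run time in the table above, drops in proportion to the group size. This is a minimal illustration and not the test program itself; the function name, the line format, and the treatment of a final partial group are assumptions.

```python
import os
import tempfile


def write_log(path, entries, commit_every):
    """Write `entries` lines and fsync after every `commit_every` lines.

    commit_every == 0 disables fsync completely.
    Returns the number of fsync calls that were issued.
    """
    syncs = 0
    with open(path, "w") as f:
        for i in range(entries):
            f.write("entry %08d received\n" % i)
            if commit_every and (i + 1) % commit_every == 0:
                f.flush()              # push buffered data to the kernel
                os.fsync(f.fileno())   # force it to stable storage
                syncs += 1
        if commit_every and entries % commit_every != 0:
            # Commit the final, partially filled group as well.
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "logfile")
    for group in (100, 10, 2):
        print("group size %3d: %d fsync calls" % (group, write_log(log, 2000, group)))
```

A larger group size means fewer synchronous disk operations, but also more entries that can be lost if the system crashes before the next commit.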

The following table lists the results for group commits (C) together with various buffer sizes (256, 1024, 4096, 8192, and 16384). As usual the entries are execution time in seconds. The program writes 2000 records in total, e.g., for size 16384 that is 31MB data.

machine C 256 1024 4096 8192 16384  
5 1 4 5 10 20 34  
  2 2 4 6 12 22  
  5 1 2 5 7 15  
  10 1 1 3 6 12  
  50 1 0 3 5 10  
  100 0 1 3 5 10  
7 1 1 5 20 40 44  
  2 1 5 11 23 29  
  5 1 5 9 12 13  
  10 1 2 3 6 7  
  50 0 1 1 2 3  
  100 0 1 1 1 3  
8 1 3 10 45 95 109  
  2 2 11 23 52 59  
  5 3 11 19 24 32  
  10 2 5 6 15 21  
  50 1 2 3 8 13  
  100 0 1 3 6 13  
9a 1 3 12 34 35 58  
  2 3 12 18 53 53  
  5 3 6 21 23 24  
  10 3 5 6 13 14  
  50 1 2 2 5 7  
  100 1 1 2 3 6  
11 1 21 35 77 83 92  
  2 13 26 38 45 50  
  5 8 13 17 20 24  
  10 5 6 10 11 15  
  50 1 2 2 4 7  
  100 1 1 2 3 6  

Comments:

Yet another program (fsseq3.c) uses write() instead of fwrite(). This time the tests write 40000KB each, which makes it simpler to determine the throughput.

Note: as usual, these times are not very accurate (1s resolution), and hence the rate is inaccurate too. Machines:

1
C s records time KB/s
1 512 80000 1365 29
1 1024 40000 734 54
1 2048 20000 451 88
1 4096 10000 352 113
1 8192 5000 250 160
2 512 80000 736 54
2 1024 40000 453 88
2 2048 20000 354 112
2 4096 10000 382 104
2 8192 5000 225 177
5 512 80000 638 62
5 1024 40000 585 68
5 2048 20000 312 128
5 4096 10000 187 213
5 8192 5000 101 396
10 512 80000 561 71
10 1024 40000 296 135
10 2048 20000 161 248
10 4096 10000 88 454
10 8192 5000 60 666
50 512 80000 128 312
50 1024 40000 70 571
50 2048 20000 41 975
50 4096 10000 34 1176
50 8192 5000 29 1379
100 512 80000 73 547
100 1024 40000 43 930
100 2048 20000 33 1212
100 4096 10000 28 1428
100 8192 5000 27 1481
2b
C s records time KB/s
1 512 80000 165 242
1 1024 40000 90 444
1 2048 20000 54 740
1 4096 10000 28 1428
1 8192 5000 16 2500
2 512 80000 94 425
2 1024 40000 52 769
2 2048 20000 30 1333
2 4096 10000 17 2352
2 8192 5000 11 3636
5 512 80000 54 740
5 1024 40000 33 1212
5 2048 20000 19 2105
5 4096 10000 11 3636
5 8192 5000 8 5000
10 512 80000 31 1290
10 1024 40000 18 2222
10 2048 20000 11 3636
10 4096 10000 8 5000
10 8192 5000 6 6666
50 512 80000 11 3636
50 1024 40000 8 5000
50 2048 20000 6 6666
50 4096 10000 5 8000
50 8192 5000 4 10000
100 512 80000 10 4000
100 1024 40000 8 5000
100 2048 20000 5 8000
100 4096 10000 4 10000
100 8192 5000 5 8000

5
C s records time KB/s
1 512 80000 13440 2
1 1024 40000 6790 5
1 2048 20000 3451 11
1 4096 10000 1779 22
1 8192 5000 1007 39
2 512 80000 6790 5
2 1024 40000 3439 11
2 2048 20000 1763 22
2 4096 10000 909 44
2 8192 5000 471 84
5 512 80000 2763 14
5 1024 40000 1414 28
5 2048 20000 739 54
5 4096 10000 383 104
5 8192 5000 208 192
10 512 80000 1414 28
10 1024 40000 731 54
10 2048 20000 384 104
10 4096 10000 208 192
10 8192 5000 120 333
50 512 80000 312 128
50 1024 40000 174 229
50 2048 20000 101 396
50 4096 10000 64 625
50 8192 5000 46 869
100 512 80000 171 233
100 1024 40000 100 400
100 2048 20000 64 625
100 4096 10000 46 869
100 8192 5000 37 1081
6
C s records time KB/s
1 512 80000 130 307
1 1024 40000 93 430
1 2048 20000 78 512
1 4096 10000 23 1739
1 8192 5000 12 3333
2 512 80000 62 645
2 1024 40000 46 869
2 2048 20000 24 1666
2 4096 10000 13 3076
2 8192 5000 15 2666
5 512 80000 66 606
5 1024 40000 31 1290
5 2048 20000 18 2222
5 4096 10000 15 2666
5 8192 5000 10 4000
10 512 80000 28 1428
10 1024 40000 19 2105
10 2048 20000 13 3076
10 4096 10000 10 4000
10 8192 5000 10 4000
50 512 80000 14 2857
50 1024 40000 10 4000
50 2048 20000 10 4000
50 4096 10000 9 4444
50 8192 5000 7 5714
100 512 80000 11 3636
100 1024 40000 10 4000
100 2048 20000 8 5000
100 4096 10000 8 5000
100 8192 5000 8 5000

7
C s records time KB/s
1 512 80000 3347 11
1 1024 40000 1689 23
1 2048 20000 845 47
1 4096 10000 418 95
1 8192 5000 192 208
2 512 80000 1243 32
2 1024 40000 796 50
2 2048 20000 431 92
2 4096 10000 222 180
2 8192 5000 122 327
5 512 80000 655 61
5 1024 40000 268 149
5 2048 20000 161 248
5 4096 10000 108 370
5 8192 5000 58 689
10 512 80000 355 112
10 1024 40000 185 216
10 2048 20000 85 470
10 4096 10000 42 952
10 8192 5000 38 1052
50 512 80000 88 454
50 1024 40000 49 816
50 2048 20000 31 1290
50 4096 10000 18 2222
50 8192 5000 10 4000
100 512 80000 45 888
100 1024 40000 33 1212
100 2048 20000 19 2105
100 4096 10000 14 2857
100 8192 5000 14 2857
8
C s records time KB/s
1 512 80000 6302 6
1 1024 40000 3220 12
1 2048 20000 1695 23
1 4096 10000 949 42
1 8192 5000 552 72
2 512 80000 3183 12
2 1024 40000 1708 23
2 2048 20000 950 42
2 4096 10000 484 82
2 8192 5000 299 133
5 512 80000 1402 28
5 1024 40000 805 49
5 2048 20000 440 90
5 4096 10000 252 158
5 8192 5000 137 291
10 512 80000 783 51
10 1024 40000 395 101
10 2048 20000 211 189
10 4096 10000 122 327
10 8192 5000 87 459
50 512 80000 181 220
50 1024 40000 107 373
50 2048 20000 68 588
50 4096 10000 49 816
50 8192 5000 42 952
100 512 80000 111 360
100 1024 40000 70 571
100 2048 20000 50 800
100 4096 10000 40 1000
100 8192 5000 36 1111

9a
C s records time KB/s
1 512 80000 2638 15
1 1024 40000 1419 28
1 2048 20000 753 53
1 4096 10000 442 90
1 8192 5000 221 180
2 512 80000 1379 29
2 1024 40000 774 51
2 2048 20000 409 97
2 4096 10000 220 181
2 8192 5000 124 322
5 512 80000 644 62
5 1024 40000 382 104
5 2048 20000 198 202
5 4096 10000 105 380
5 8192 5000 58 689
10 512 80000 355 112
10 1024 40000 196 204
10 2048 20000 104 384
10 4096 10000 59 677
10 8192 5000 32 1250
50 512 80000 90 444
50 1024 40000 51 784
50 2048 20000 28 1428
50 4096 10000 19 2105
50 8192 5000 15 2666
100 512 80000 54 740
100 1024 40000 28 1428
100 2048 20000 20 2000
100 4096 10000 15 2666
100 8192 5000 14 2857
9b
C s records time KB/s
1 512 80000 2642 15
1 1024 40000 1312 30
1 2048 20000 723 55
1 4096 10000 376 106
1 8192 5000 185 216
2 512 80000 1363 29
2 1024 40000 699 57
2 2048 20000 359 111
2 4096 10000 185 216
2 8192 5000 104 384
5 512 80000 563 71
5 1024 40000 302 132
5 2048 20000 162 246
5 4096 10000 88 454
5 8192 5000 46 869
10 512 80000 299 133
10 1024 40000 161 248
10 2048 20000 87 459
10 4096 10000 46 869
10 8192 5000 24 1666
50 512 80000 81 493
50 1024 40000 44 909
50 2048 20000 35 1142
50 4096 10000 19 2105
50 8192 5000 13 3076
100 512 80000 51 784
100 1024 40000 35 1142
100 2048 20000 26 1538
100 4096 10000 15 2666
100 8192 5000 13 3076

9c
C s records time KB/s
1 512 80000 2576 15
1 1024 40000 1326 30
1 2048 20000 707 56
1 4096 10000 377 106
1 8192 5000 192 208
2 512 80000 1324 30
2 1024 40000 685 58
2 2048 20000 349 114
2 4096 10000 187 213
2 8192 5000 107 373
5 512 80000 578 69
5 1024 40000 313 127
5 2048 20000 163 245
5 4096 10000 89 449
5 8192 5000 46 869
10 512 80000 306 130
10 1024 40000 162 246
10 2048 20000 86 465
10 4096 10000 46 869
10 8192 5000 25 1600
50 512 80000 82 487
50 1024 40000 44 909
50 2048 20000 33 1212
50 4096 10000 19 2105
50 8192 5000 13 3076
100 512 80000 52 769
100 1024 40000 36 1111
100 2048 20000 25 1600
100 4096 10000 16 2500
100 8192 5000 13 3076
12a
C s records time KB/s
1 512 80000 65 615
1 1024 40000 61 655
1 2048 20000 59 677
1 4096 10000 5 8000
1 8192 5000 4 10000
2 512 80000 13 3076
2 1024 40000 8 5000
2 2048 20000 4 10000
2 4096 10000 4 10000
2 8192 5000 3 13333
5 512 80000 44 909
5 1024 40000 21 1904
5 2048 20000 13 3076
5 4096 10000 3 13333
5 8192 5000 3 13333
10 512 80000 12 3333
10 1024 40000 3 13333
10 2048 20000 3 13333
10 4096 10000 3 13333
10 8192 5000 5 8000
50 512 80000 11 3636
50 1024 40000 3 13333
50 2048 20000 5 8000
50 4096 10000 5 8000
50 8192 5000 4 10000
100 512 80000 5 8000
100 1024 40000 5 8000
100 2048 20000 5 8000
100 4096 10000 4 10000
100 8192 5000 3 13333

12b
C s records time KB/s
1 512 80000 124 322
1 1024 40000 87 459
1 2048 20000 72 555
1 4096 10000 20 2000
1 8192 5000 10 4000
2 512 80000 47 851
2 1024 40000 32 1250
2 2048 20000 16 2500
2 4096 10000 8 5000
2 8192 5000 5 8000
5 512 80000 56 714
5 1024 40000 27 1481
5 2048 20000 20 2000
5 4096 10000 5 8000
5 8192 5000 5 8000
10 512 80000 23 1739
10 1024 40000 17 2352
10 2048 20000 6 6666
10 4096 10000 3 13333
10 8192 5000 6 6666
50 512 80000 7 5714
50 1024 40000 4 10000
50 2048 20000 6 6666
50 4096 10000 6 6666
50 8192 5000 4 10000
100 512 80000 7 5714
100 1024 40000 6 6666
100 2048 20000 5 8000
100 4096 10000 4 10000
100 8192 5000 3 13333
12c
C s records time KB/s
1 512 80000 205 195
1 1024 40000 144 277
1 2048 20000 122 327
1 4096 10000 14 2857
1 8192 5000 7 5714
2 512 80000 34 1176
2 1024 40000 22