This chapter describes the architecture of sendmail X. It presents some possible design choices for various parts of sendmail X and explains why a particular choice has been made. Notice: several decisions haven't been made yet, there are currently a lot of open questions.
sendmail X consists of several communicating modules. A strict separation of functionality allows for a flexible, maintainable, and scalable program. It also enhances security by running only those parts with special privileges (e.g., root, which will be used as a synonym for the required privileges in this text) that really require it.
Some terms relevant for e-mail are explained in a glossary 2.17.
sendmail X consists of the following modules:
sendmail X uses persistent databases for content (CDB) and for envelope (routing) information (EDB). The content DB is written by the SMTP servers only, and read by the delivery agents. The envelope DBs are under complete control of the queue manager.
There are other components for sendmail X, e.g., a recovery program that can reconstruct an EDB after a crash if necessary, a program to show the content of the mail queue (EDB), and at least hooks for status monitoring.
Since sendmail X is designed to have a lifetime of about one decade, it must not be tuned to specific bottlenecks in common computers as they are known now. For example, even though it seems common knowledge that disk I/O is the predominant bottleneck in MTAs, this isn't true in all cases. There is hardware support (e.g., disk system with non-volatile RAM) that eliminates this bottleneck2.1. Moreover, some system tests show that sendmail 8 is CPU bound on some platforms. Therefore the sendmail X design must be well-balanced and it must be easy to tune (or replace) subsystems that become bottlenecks in certain (hardware) configurations or situations.
This section contains some general remarks about configuring sendmail X. Todo: fill this in, add a new section later on that defines the configuration.
sendmail X must be easy enough to configure such that it does not require reading lots of files or even large section of a single file (see also Section 1.4). A ``default'' configuration may not require any configuration at all, i.e., the defaults should be stored in the binary and most of the required values should be automagically be determined at startup. A small configuration file might be necessary to override those defaults in case the system cannot determine the right values. Moreover, it is usually required to tell the MTS for which domain name to accept mail - by default a computer should have a FQDN but it is not advisable to decide to accept mail for the domain name itself2.2
The configuration file must be suitable for all kinds of administrators: at one end of the spectrum are those who just want to have an MTA installed and running with minimum effort, at the other end are those who want to tweak every detail of the system and maybe even enhance it by other software.
Only a few configuration options apply globally, many have
exceptions or suboptions that apply in specific situations.
For example, sendmail 8 has timeouts for most SMTP commands
and there are separate timeouts to return queued messages
for different precedence values.
Moreover, some features can be determined by rulesets,
some options apply on a per connection basis, etc.
In many cases it is useful to group configuration options together
instead of having those options very fine grained.
For examples, there are different SMTP mailers in sendmail 8
that create configuration groups (with some preselected set of options)
which can be selected via mailertable (or rules).
Instead of having mailer options per destination host (or other criteria),
different options are grouped together and then an option set is selected.
This can reduce the amount of configuration options that need to be stored
(e.g., it's a two level mapping:
address
mailer
mailer flags,
instead of just one level in which each argument
can have different function values:
address
mailer and mailer flags).
However, it might be complicated to actually structure options in a tree like manner. For example, a rewrite configuration option may be
Question: can we organize options into a tree structure? If not, how should we specify options and how should we implement them? Take the above example: there might be rewrite options per mailer and per address type (seems to make sense). However, in which order should those rewrite options be processed? Does that require yet another option?
A simple tree structure is not sufficient. For example, some option groups may share common suboptions, e.g., rewrite rules. Instead of having to specify them separately in each group, it makes more sense to refer to them. Here is an example from sendmail 8: there are several different SMTP mailers, but most of them share the same rewrite rulesets. In a strict tree structure each mailer would have a copy of the rewrite rulesets, which is neither efficient nor simple to maintain. Hence there must be something like ``subroutines'' which can be referenced. In a sendmail 8 configuration file this means there is a list of rulesets which can be referenced from various places, e.g., the binary (builtin ruleset numbers) and the mailers.
This means internally a configuration might be represented as a graph with references to various subconfigurations. However, this structure can be unfolded such that is actually looks like a tree. Hence, the configuration can conceptually be viewed as a tree.
There should be a way to query the system about the current configuration and to change (some) options on the fly. A possible interface could be similar to sysctl(8) in BSD. Here options are structured in a tree form with names consisting of categories and subscategories separated by dots, i.e., ``Management Information Base'' (MIB) style. Such names could be daemon.MTA.port, mailer.local.path, etc. If we can structure options into a tree as mentioned in the previous section then we can use this naming scheme. Whether it will be possible to change all parts on the fly is questionable, esp. since some changes must be done as transaction (all at once or none at all).
Each section of this chapter that describes a module of sendmail X has a subsection about security considerations for that particular part. More discussion can be found in Section 2.14.
This section gives an overview over the control and data flow for a typical situation, i.e., e-mail received via SMTP. This should give an idea how the various components interact. More details can be found in the appropriate sections.
Question: can we treat a configuration file like a programming language with
Definitions do not depend on anything else, they define the basic structure (and behavior?) of the system. There are fixed attributes which cannot be changed at runtime, e.g., port number, IP address to listen on. Attributes which can change at runtime, e.g., the hostname to use for a session, fall in category 3, i.e., they are functions which can determine a value at runtime.
The distinction between definitions and functions is largely determined by the implementation and the underlying operating system as well as the application protocol to implement and the underlying transport protocol. When defining an SMTP daemon (or a DA) some of its attributes must be fixed (defined/specified) in the configuration, these are called immutable. For example, it is not possible to dynamically change the port of the SMTP daemon because that's the way the OS call bind(2)2.4 works. However, the IP address of the daemon does not need to be fixed (within the capabilities of the OS and the available hardware), i.e., it could listen on exactly one IP address or on any. Such configuration options are called variable or mutable2.5.
It seems to be useful to make a list of configuration options and their ``configurability'', i.e., whether they are fixed, or at which places they can change, i.e., on which other values they can depend.
As required, the semantics of the configuration file does not depend on its layout, i.e., spaces are only important for delimiting syntactic entities, tabs (whitespace) do not have a special meaning.
The syntax of the sendmail X configuration files is as follows:
| ::= | ||
| ::= | ||
| ::= | ||
| ::= | ||
| ::= | ||
| ::= | ||
| ::= | "{" |
This can be shortened to (remove the rule for entries):
| ::= | ||
| ::= | ||
| ::= | ||
| ::= | ||
| ::= | ||
| ::= | "{" |
Generic definition of ``list'':
| ::= |
That is, a configuration file consists of a several entries,
each of which is either a section or an option.
A section starts with a keyword, e.g., mailer, daemon, rewriterules,
and has an optional name, e.g., daemon MTA.
Each section contains a section of entries which is embedded in curly braces.
Each syntactic entity that isn't embedded in braces
is terminated with a semicolon.
An entry in a section can be an option or a (sub)configuration.
To make writing configuration files simpler,
lists can have a terminating comma and a semicolon
can follow after
values
.
That makes these symbols terminators not separators.
Examples:
mailer smtp {
Protocol = SMTP;
Connection = TCP;
Port = mtp;
flags { DSN }
MaxRecipientsPerSession = 5;
};
mailer lmtp {
Protocol = LMTP;
flags = { LocalRecipient, Aliases }
Path = "/usr/bin/lmtp";
};
Daemon MTA {
smtps-restriction = { qualified-sender, resolvable-domain }
};
Map Mailertable { type = hash; file = "/etc/smx/mailertable"; };
Rewrite {
Envelope {
sender = { Normalize, Canonify },
recipient = { Normalize, Virtual, Mailertable }
};
Header {
sender = { Normalize },
recipient = { Normalize }
};
};
Check {
DNSBL MyDNSBL { Orbd, Maps }
Envelope {
sender = { Qualified, MyDNSBL },
recipient = { Qualified, AuthorizedRelay }
};
};
The usual rules for identifiers
(list of characters, digits, and underscores) apply.
Values (
name
) that contain spaces must be quoted,
other entries can be quoted, but don't need to.
Those quotes are stripped in the internal representation.
Backslashes can be used to escape meta-symbols.
Todo: completely specify syntax.
Note: it has been proposed to make the equal sign optional for this rule:
| ::= |
However, that causes a reduce/reduce conflict when the grammar is fed into yacc(1)2.6because it conflicts with
| ::= |
That is, with a lookahead of one it can not be decided whether something
reduces to
option
or
section
.
If the parser ``knows'' whether some identifier is a keyword or
the name of an option then the equal sign can easily be optional.
However, doing so violates the layering principle
because it ``pushes'' knowledge about the actual configuration file
into the parser where it does not really belong:
the parser should only know about the grammar.
Of course if would be possible to write a more specific grammar
that includes lists of options and keywords.
However, keeping the grammar abstract (hopefully) allows for simpler tools
to handle configuration files.
Moreover, if new options or keywords are added the parser does not
need to change, it is only the upper layers that perform semantic
analysis of a configuration file.
Most configuration/programming languages provide at least one way to add comments: a special character starts a comment which extends to the end of the line. Some languages also have constructs to end comments at a different place than the end of a line, i.e., they have characters (or character sequences) that start and end a comment. To make it even more complicated, some languages allow for nested comments. Text editors make it fairly easy to replace the begin of a line with a character and hence it is simple to ``comment out'' entire sections of a (configuration) file. Therefore it seems sufficient to have just a simple comment character (``#'') which starts a comment that extends to the end of the current line. The comment character can be escaped, i.e., its special meaning disabled, by putting a backslash in front of it as usual in many languages.
For now all characters are in UTF-8 format which has ASCII has a proper subset. Hence it is possible to specify texts in a different language, which might be useful in some cases, esp. if the configuration syntax is also used in other projects than sendmail X.
Strings are embedded (as usual) in double quotes.
To escape special characters inside strings the usual C conventions
are used, probably enhanced by a way to specify unicode characters
(``
uVALUE'').
Strings can not continue past the end of a line,
to specify longer strings they can be continued by starting
the next line (after any amount of white space) with a double quote
(just like in ANSI C).
The parser should be able to to some basic semantic checks for various types. That is, it can detect whether strings are well formed (see above), and it must understand basic types like boolean, time specification, file names, etc.
There has been a wish to include configuration data via files or even databases, e.g., OpenLDAP attributes.
There are some suggestions for alternative configuration formats:
option = value
This syntax is not flexible enough to describe the configuration of an MTA, unless some hacks are employed as done by postfix which uses an artificial structuring by naming the options ``hierarchically''. For example, sendmail 8 uses a dot-notation to structure some options, e.g., timeouts (Timeout.queuereturn.urgent); postfix uses underscores for a similar purpose, e.g.,
smtpd_recipient_restrictions = smtpd_sender_restrictions = local_destination_concurrency_limit = default_destination_concurrency_limit =
An explicit hierarchical structure is easier to understand and to maintain.
SMTP defines a structure which influences how a SMTP server (and client) can be configured. The topmost element in SMTP is a session, which can contain multiple transactions, which can contain multiple recipients and one messsage. Each of these elements has certain attributes (properties). For example
This structure restricts how a SMTP server can be configured. Some things can only be selected (``configured'') at a certain point in a session, e.g., a milter can not be selected for each recipient2.7, neither can a server IP address selected per transaction, other options have explicit relations to the stage in a session, e.g., MaxRecipientsPerSession, MaxRecipientsPerTransaction (which might be better expressed as Session.MaxRecipients and Transaction.MaxRecipients or Session.Transaction.MaxRecipients). Some options do not have a clear place in a session at all, e.g., QueueLA, RefuseLA: do these apply to a session, a transaction or a recipient? It is possible to use QueueLA per recipient, but only in sendmail X because it does scheduling per recipient, in sendmail 8 scheduling is done per transaction and hence QueueLA can only be per transaction. This example shows that an actual implementation restricts the configurability, not just the protocol itself.
If a SMTP session is depicted as a tree (where the root is a session) then there is a ``maximum depth'' for each option at which it can be applied. As explained before, that depth is determined
Question: taking these restrictions into consideration, can we specify the maximum depth for each configuration option at which the setting the option is possible/makes sense? Moreover, can we specify a range of depths for options? For example: QueueLA can be a global option, an option per daemon, an option per session, etc. If such a range can be defined per option, then the configuration can be checked for violations. Moreover, it restricts the ``search'' for the value of an option that must be applied in the current stage/situation.
Question: it seems the most important restriction is the implementation (beside the structure of SMTP of course). If the implementation does not check for an option at a certain stage, then it does not make any sense to specify the option at that stage. While for some options it is not much effort to check it at a very deep level, for others that means that data structures must be replicated or be made significantly more complex. Examples:
recipient postmaster { reject-client-ip {map really-bad-abusers} }
recipient * { reject-client-ip {map all-abusers} }
This brings us back to the previous question: Question: can we specify the maximum depth for each configuration option at which the setting the option makes sense or at which it is possible without making the implementation too complex.
There are other configuration options which do not really belong to that structure, e.g., ``mailers'' (as they are called in sm8). A mailer defines a delivery agent (DA), it is selected per recipient. Hence a DA describes the behavior of an SMTP client, not an SMTP server. In turn, many options are per DA too, while others only apply to the server, e.g., milters are server side only2.9.
Problem: STARTTLS is a session atttribute, i.e., whether it is used/offered is defined per client/server (per session). However, it is useful (and possible) to require certain STARTTLS features per recipient2.10(as sm8 does via access db and ruleset). It is not possible to say: only offer STARTTLS feature X if the recipient is R, but it is possible to say: if the recipient is R then STARTTLS feature X must be in use (active). Moreover, it's not possible to say: "if the recipient is R, the milter M must be used." How do those configuration options fit into the schema explained above? What's the qualitative difference between these configuration options?
Questions: What's the qualitative difference between these examples? What is the underlying structure? How does the structure define configurability, i.e., what defines why a behavior/option can be dependent on something but not on something else?
For example: STARTTLS in client (SMTPC): this isn't really: ``use STARTTLS with feature X if recipient R will be send'', but it is: ``if recipient R will be send then STARTTLS with feature X must be active'' (similar to SMTPS). However, it is conceivable to actually do the former, i.e., make a session option based on recipient because smX can do per recipient scheduling, i.e., a DA is selected per recipient. Hence it can be specified that a session to deliver recipient R must have STARTTLS feature X. However, doing that makes connection reuse significantly more complicated (see Section 3.4.10.2). Question: doesn't this define a specific DA? Each DA has some features/options. Currently the use of STARTTLS is orthogonal to DAs (e.g., almost completely independent) hence the connection reuse problem (a connection is defined by DA and server, not DA and server and specific features because those features should be in the DA definition). Hence if different DAs are defined based on whether STARTTLS feature X should be used, then we tied a session to DA and server. This brings us to the topic of defining DAs. Question: what do we need ``per DA'' to make things like connection reuse simple? Note: if we define DAs with all features, then we may have a lot of DAs. Hence we should restrict the DA features to those which are really specific to a DA (connection/session/transaction) behavior, and cannot be defined independently. For example, it doesn't seem to be useful to have a DA for each different STARTTLS and AUTH feature, e.g., TLS version, SASL mechanism, cipher algorithm, and key length. However, can't we leave that decision up to the admin?
In addition to simple syntax checks, it would be nice to check a configuration also for consistency. Examples?
As explained in Section 2.1.3.2 there are some issues with the structuring of the configuration options. Here is a simple example that should serve as base for a discussion:
Daemon MTA {
smtps-restriction { qualified-sender, resolvable-domain }
mailer smtp { Protocol SMTP; Port smtp; flags { DSN }
MaxRecipientsPerSession 25;
};
Aliases { type hash; file /etc/smx/aliases; };
mailer lmtp { Protocol LMTP; flags { LocalRecipient, Aliases }
Path "/usr/bin/lmtp";
};
Map Mailertable { type hash; file /etc/smx/mailertable; };
Rewrite {
Envelope { sender { Normalize },
recipient { Normalize, Virtual, Mailertable }
};
Header { sender { Normalize }, recipient { Normalize } };
};
};
Daemon MSA {
mailer smtp { Protocol SMTP; Port submission; flags { DSN }
MaxRecipientsPerSession 100;
};
Aliases { type hash; file /etc/smx/aliases; };
mailer lmtp { Protocol LMTP; flags { LocalRecipient, Aliases }
Path "/usr/bin/lmtp";
};
Rewrite {
Envelope { sender { Normalize, Canonify },
recipient { Normalize, Canonify }
};
Header { sender { Normalize, Canonify },
recipient { Normalize, Canonify } };
};
};
This configuration specifies two daemons: MTA and MSA which share several subconfigurations, e.g., aliases and lmtp mailer, that are identical in both daemons. As explained in Section 2.1.3.2 it is better to not duplicate those specifications in various places. Here is the example again written in the new style:
aliases MyAliases { type hash; file /etc/smx/aliases; };
mailer lmtp { Protocol LMTP; flags { LocalRecipient, Aliases }
Path "/usr/bin/lmtp";
};
Daemon MTA {
smtps-restriction { qualified-sender, resolvable-domain }
mailer smtp { Protocol SMTP; Port smtp; flags { DSN }
MaxRecipientsPerSession 25;
};
aliases MyAliases;
mailer lmtp;
Map Mailertable { type hash; file /etc/smx/mailertable; };
Rewrite {
Envelope { sender { Normalize },
recipient { Normalize, Virtual, Mailertable }
};
Header { sender { Normalize }, recipient { Normalize } };
};
};
Daemon MSA {
mailer smtp { Protocol SMTP; Port submission; flags { DSN }
MaxRecipientsPerSession 100;
};
aliases MyAliases;
mailer lmtp;
Rewrite {
Envelope { sender { Normalize, Canonify },
recipient { Normalize, Canonify }
};
Header { sender { Normalize, Canonify },
recipient { Normalize, Canonify } };
};
};
Here the subconfigurations aliases and lmtp mailer are referenced explicitly from both daemon declarations. This is ok if there are only a few places in which a few common subconfiguration are referenced, but what if there are many subconfigurations or many places? In this case a new root of the tree would be used which declares all ``global'' options which can be overridden in subtrees. So the configuration tree would look like:
generic declarations
common root
daemon
mailer
?
Question: what is the complete structure of the configuration tree? Question: can the tree be specified by the configuration file itself, or is its structure fixed in the binary?
The next problem is how to find the correct value for an option. For example, how to determine the value for MaxRecipientsPerSession in this configuration:
MaxRecipientsPerSession 10;
Daemon MTA {
MaxRecipientsPerSession 25;
mailer smtp { ... }; };
mailer relay { MaxRecipientsPerSession 75; }; };
Daemon MSA {
MaxRecipientsPerSession 50;
mailer smtp { MaxRecipientsPerSession 100; }; };
Does this mean the system has to search in the tree for the correct value? This wouldn't be particularly efficient.
sendmail 8 also offers configuration via the access database, i.e., some tagged key is looked up to find potential changes for the configuration options that are specified in the cf file. For example, srv_features allows to set several options based on the connecting client (see also Section 2.2.6.2). This adds another ``search path'' to find the correct value for a configuration option. In this case there are even two tree structures that need to be searched which are defined by the host name of the client and its IP address, both of which are searched for in the database by removing the most significant parts of it, e.g., Tag:host.sub.tld, Tag:sub.tld, Tag:tld, Tag:.
What about a ``dynamic'' configuration, i.e., something that contains conditions etc? For example:
if client IP = A and LA < B then accept connection else if client IP in net C and LA < D and OpenConnections < E then accept connection else if OpenConnections < F then accept connection else if ConnectionRate < G then accept connection else reject connection fi
Note: it might be not too hard to specify a functional configuration language, i.e., one without side effects. However, experience with sm8 shows that temporary storage is required too2.13. As soon as assignments are introduced, the language becomes significantly more complex to implement. Moreover, having such a language introduces another barrier to the configuration: unless it is one that is established and widely used, people would have to learn it to use smX efficiently. For example, the configuration language of exim allows for runtime evaluation of macros (variables) and the syntax is hard to read (as usual for unknown languages). There are a few approaches to deal with this problem:
One proposal for the smX syntax includes conditionals in the form of
In sendmail 8 it proved to be useful to have some configuration options stored in maps. These can be as simple as reply codes to certain phases in ESMTP and for anti-relay/anti-spam checks, and as complex as the srv_features rulesets (see also Section 2.2.5).
There are several reasons to have configuration options in maps:
If not just anti-spam data is stored in maps but also more complicated options (as explained before: map entries for srv_features) then those options are usually not well structured, e.g., for the example it is just a sequence of single characters where the case (upper/lower) determines whether some features is offered/required. This does not fulfill the readability requirements of a configuration syntax for smX.
Question: how to integrate references to maps that provide configuration data into the configuration file syntax and how should map entries look like? One possibility way is to have a set of option combined into a group and reference that group from the map. For example, instead of using
SrvFeatures:10 l Vit would be
LocalSrvFeatures { RequestClientCertificate=No; AUTH=require; };
SrvFeatures:10 LocalSrvFeatures
The defaults of the configuration should be compiled into the binary instead of having a required configuration file which contains all default values.
Advantages:
Disadvantages:
It must be possible to query the various smX components to print their current configuration settings as well as their current status. The output should be formatted such that it can be used as a configuration file to reproduce the current configuration.
It must be possible to tell the various smX components to change their current configuration settings. This may not be practical for all possible options, but at least most of them should be changeable while the system is running. That minimizes downtime to make configuration changes, i.e., it must not be required to restart the system just to change some ``minor'' options. However, options like the size of various data structures may not be changeable ``on the fly''.
sendmail X.0.0.PreaAlpha9 has the following configuration parameters:
Various other definitions: postmaster address for double bounces2.14, log level and debug level could be more specific, i.e., per module in the code, but probably not per something external, configuration flags, time to wait for SMAR, SMTPC to be ready.
It doesn't seem to be very useful to make these dependent on something: minimum and ``ok'' free disk space (KB).
definitions (see Section 2.2.1, 2): log level and debug level (see above), heap check level, group id (numeric) for CDB, time to wait for QMGR to be ready.
run in interactive mode, serialize all accept() calls, perform one SMTP session over stdin/stdout,
socket over which to receive listen fd, specify thread limits per listening address,
create specified number of processes, bind to specified address - multiple addresses are permitted, maximum length of pending connections queue,
I/O timeout: could be per daemon and client IP address,
client IP addresses from which relaying is allowed, recipient addresses to which relaying is allowed.
All of these are definitions:
log level and debug level (see above), heap check level, time to wait for QMGR to be ready, run in interactive mode, create specified number of processes, specify thread limits.
These could be dependent on DA or even server address: socket location for LMTP, I/O timeout, connect to (server)port.
All of these are runtime options, i.e., they are specified when the binary is started (hence definitions in the sense of Section 2.2.1, 2):
log level and debug level (see above), IPv4 address for name server, DNS query timeout, use TCP for DNS queries instead of UDP, use connect(2) for UDP.
All of these are definitions: name: string (name of program/service); port: number or service entry (optional); socket.name: name of socket to listen on: path (optional); tcp: currently always tcp (could be udp); type: type of operation: nostartaccept, pass, wait; exchange_socket: socket over which fd should be passed to program; processes_min: minimum number of processes; processes_max: maximum number of processes; user: run as which user (user name, i.e., string); path: path to executable; args: arguments for execv(3) call.
MCP {
processes_min=1; processes_max=1; type=wait;
smtps {
port=25;
type=pass;
exchange_socket=smtps/smtpsfd;
user=smxs;
path="../smtps/smtps";
arguments="smtps -w 4 -d 4 -v 12 -g 262 -i -l . -L smtps/smtpsfd"; }
smtpc {
user=smxc;
path="../smtpc/smtpc";
arguments="smtpc -w 4 -P 25 -d 4 -v 12 -i -l ."; }
qmgr {
user=smxq;
path="../qmgr/qmgr";
arguments="qmgr -w 4 -W 4 -B 256 -A 512 -d 5 -v 12"; }
smar {
user=smxm;
path="../smar/smar";
arguments="smar -i 127.0.0.1 -d 3 -v 12"; }
lmtp {
socket_name="lmtpsock";
socket_perm="007";
socket_owner="root:smxc";
type=nostartaccept;
processes_min=0;
processes_max=8;
user=root;
path="/usr/local/bin/procmail";
arguments="procmail -z"; }
};
Note: some definitions could be functions (see Section 2.2.1), e.g., I/O timeout could be dependent on the IP address of the other side or the protocol, debug and log level could have similar dependencies. As explained in Section 2.2.4 the implementation restricts how ``flexible'' those values are.
Currently hostname is determined by the binary at runtime. If it is set by the configuration then it could be: global, per SMTP server, per IP address of client, per SMTP client, per IP address of server. This is one example of how an options can be set at various depths in the configuration file. Would this be a working configuration file?
Global { hostname = my.host.tld; }
Daemon SMTPS1 { Port=MTA; hostname=my2.host.tld;
IP-Client { IP-ClientAddr=127.*; hostname=local.host.tld;} }
DA SMTPC1 { hostname=out1.host.tld;
IP-Server { IP-ServerAddr=127.*; hostname=local.host.tld;}
IP-Server { IP-ServerAddr=10.*; hostname=net.host.tld;} }
The lines that list an IP address are intended to act as restrictions, i.e., if the IP address is as follows then apply this setting. Question: Is this the correct way to express that? What about more complicated expressions (see Section 2.2.6)?
In principle these are conditionals:
hostname = my.host.tld;
if (Port==MTA) { hostname=my2.host.tld;
if (IP-ClientAddr==127.*) hostname=local.host.tld; }
Question: what are the requirements for anti-spam configuration for a (pre-)alpha version of sendmail X?
Not yet available: allow relaying based on TLS.
This brings in all the subtleties from sm8, especially delay-checks. What's a simple way to express this?
The Control flow in sm8 is explained in Section 3.5.2.6.
Note: for the first version it seems to the best to use a simple configuration file without any conditionals etc. If an option is dependent on some data, then the access method from sm8 should be used. This allows us to put that data into a well known place and treat it in a matter that has been successfully used before. Configuration like anti-relaying should be ``hard-wired'' in the binary and their behavior should only be dependent on data in a map. This is similar to the mc configuration ``level'' in sm8; more control over the behavior is archievable in sm8 by writing rules which in smX may have some equivalent in modules.
The configuration files must be protected from tampering. They should be owned by root or a trusted user. sendmail must not read/use configuration files from untrusted sources, which not just means wrong owners, but also files in unsecure directories.
Some processes require root privileges to perform certain operations. Since sendmail X will not have any set-user-id root program for security reasons, those processes must be started by root. It is the task of the supervisor process (MCP: Master Control Process) to do this.
There are a few operations that usually require root privileges in a mail system:
The MCP will bind to port 25 and other ports if necessary before it starts the SMTP server daemons (see Section 2.5) such that those processes can use the sockets without requiring root access themselves.
The supervisor process will also be responsible for starting the various processes belonging to sendmail X and supervising them, i.e., deal with failures, e.g., crashes, by either restarting the failed processes or just reporting those crashes if they are fatal. The latter may happen if a system has a hardware or software failure or is (very) misconfigured. The MCP is also responsible for shutting down the entire sendmail X system on request.
The configuration file for the supervisor specifies which processes to start under which user/group IDs. It also controls the behavior in case of problems, i.e., whether they should be restarted, etc. This is fairly similar to inetd, except that the processes are not started on demand (incoming connection) but at startup and whenever a restart is necessary.
The supervisor process runs as root and hence must be carefully written (just like any other sendmail X program). Input from other sources must be carefully examined for possible security implications (configuration file, communication with other parts of the sendmail X system).
The queue manager is the central coordination instance in sendmail X. It controls the flow of e-mail throughout the system. It implements (almost) all policies and coordinates the receiving and sending processes. Since it controls several processes it is important that it will not slow them down. Hence the queue manager will be a multi-threaded program to allow for easy scalability and fast response to requests.
The queue manager will handle several queues (see Section 2.4.1); there will be at least queues for incoming mails, for scheduling delivery of mails, and for storing information about delayed mails.
The queue manager also maintains the state of the various other processes with which it communicates and which it controls, e.g., the open connections of the delivery agents. It should also have knowledge about the system to which it sends e-mails, i.e., whether they are accepting e-mails, probably the throughput of the connections, etc.
Todo: add a complete specification what the QMGR does; at least the parts that aren't related to incoming SMTP.
One proposal for a set of possible queues is:
Having several on-disk queues has the following advantages:
Disadvantages of several on-disk queues are:
Since the disadvantages outweigh the advantages the number of on-disk queues will be minimized. The deferred queue will become the main queue and contains also entries that are on hold or waiting for ETRN. Envelopes for bounces should go to the main queue too. This way the status for an envelope is available in one place (well, almost: the incoming queue may have the current status). Only the ``corrupt'' queue is different since nobody ever schedules entries from this queue and it probably needs a different format (no decision yet). To achieve the ``effect'' of having different queues it might be sufficient to build different indices to access parts of the queue (logical queues). For example, the ETRN queue index has only references to items in the queue that are waiting for an ETRN command.
The ``active'' and the ``incoming'' queues are resident in memory for fast access. The incoming queue is backed up on stable storage in a form that allows fast modifications, but may be slow to reconstruct in case the queue manager crashes. The active queue is not directly backed up on disk, other queues act as backup. That is, the active queue is a restricted size cache (RSC) of entries in other queues. The deferred queue contains items that have been removed from the active queue for various reasons, e.g., policy (deliver only on ETRN, certain times, quarantine due to milter feedback, etc), delays (temporary failures, load too high, etc), or as the result of delivery attempts (success/failure/delay). The active queue must contain the necessary information to schedule deliveries in efficient (and policy based) ways. This implies that there is not just one way to access the data (one key), but several to accommodate different needs. For example, it might be useful to MX-piggyback deliveries, which requires to store (valid, i.e., not-expired) MX records together with recipient domains. Another example is a list of recipients that wait for a host to be able to receive mail again, i.e., a DB which is keyed on hosts (IP addresses or names?) and the data is a list of recipients for those hosts.
Normally entries in the incoming queue are moved into the deferred queue only after a delivery attempt, i.e., via the active queue. However, the active queue itself is not backed up on persistent storage. Hence an envelope must be either in the incoming queue or in the deferred queue at any given time (unless it has been completely delivered). Moving an entry into the deferred queue must be done safely, i.e., the envelope must be safely in the deferred queue before it is removed from the incoming queue. Question: When do we move the sender data over from the incoming to the deferred queue? Do we do it as soon as one recipient has been tried or only after all have been tried? Since we are supposed to try delivery as soon as we have the mail, we probably should move the sender data after we tried all recipients. ``Trying'' means here: figure out a DA, check flags/status (whether to try delivery at all, what's the status of the system to which the mail should be delivered), if normal delivery: schedule it, otherwise move to the deferred queue.
The in-memory queues are limited in size. These sizes are specified in the configuration file. It is not yet clear which form these specifications may have: amount of memory, amount of entries, percentage of total memory that can be used by sendmail X or specific processes. The specification should include percentages at which the behavior of the queue manager changes, esp. for the incoming queue. If the incoming queue becomes almost full the acceptance of messages must be throttled. This can be done in several stages: just slow down, reduce the number of concurrent incoming connections (just accept them slower), and in the extreme the SMTP server daemons will reject connections. Similarly the mail acceptance must be slowed down if the active queue is about to overflow. Even though the queue manager will normally favor incoming mail over other (e.g., deferred) mail, it must not be possible to starve those other queues. The size of the active queue does not need to be a direct feedback mechanism to the SMTP daemon, it is sufficient if this is happening indirectly through the incoming queue (which will fill up if items can't be moved fast enough into the active queue). However, this may not be intended, maybe we want to accept messages for some limited time faster than we can send them.
It might be nice for the in-memory queues to vary in size during runtime. In high-load situation those queues may grow up to some maximum, but during lower utilization they should shrink again. Maximum and minimum sizes should be user-configurable. However, in general the OS (VM system) should solve the problem for us.
Here's the list of queues that are currently used:
One proposal for a set of possible queues is:
There's currently no decision about the queue for corrupted entries.
The incoming queue must be backed up on stable storage to ensure reliability. The most likely implementation right now is a logfile, in which entries are simply appended since this is the fastest method to store data on disk. This is similar to a log-structured filesystem and we should take a look at the appropriate code. However, our requirements are simpler, we don't need a full filesystem, but only one file type with limited access methods. There must be a cleanup task that removes entries from the backup of the incoming queue when they have been taken care of. For maximum I/O throughput, it might be useful to specify several logfiles on different disks.
The other queues require different formats. The items on hold are only released on request (e.g., for ETRN). Hence they must be organized in a way that allows easy access per domain (the ETRN argument) or other criteria, e.g., a hold message for quarantined entries.
The delayed queue contains items that could not be delivered before due to temporary errors. These are accessed in at least two ways:
The queue manager needs to keep a reference count for an envelope to decide when an entry can be removed. This may become non-trivial in case of aliases (expansion). If the LDA does alias expansion then one question is whether it injects a new mail (envelope) with a new body. Otherwise reference counting must take care of this too.
The MAIL (sender) data, which includes the reference counter, is stored in the deferred queue if necessary, i.e., as long as there are any recipients left (reference count greater than zero). Hence we must have a fast way to access the sender data by transaction id. At any time the envelope sender information must be either in the incoming queue or in the deferred queue.
Problem: mailing lists require to create new envelope sender addresses, i.e., the list owner will be the sender for those mails. An e-mail can be addressed to several mailing lists and to ``normal'' recipients, hence this may require to generate several different mail sender entries. Question: should the reference counter only be in the original sender entry and the newly created entries have references to that? Distributing the reference count is complicated. However, this may mean that a mail sender entry stays around even though all of its (primary) recipients have been taken care of.
It might be necessary to have a garbage collection task: if the system crashes during an update of a reference counter the value might become incorrect. We must assure that the value is never too small, because we could remove data that is still needed. If the stored value is bigger than it should be, the garbage collection task must deal with this (compare fsck for file systems).
Envelopes received by the SMTP servers are put into the incoming queue and backed up on disk. If an envelope has been completely received, the data is copied into the active queue unless that queue is full2.16. Entries in the active queue are scheduled for delivery. If delivery attempts are done, the results of those attempts are written to the incoming queue (mark it as delivered) or deferred queue as necessary. Entries from the deferred queue are copied into the active queue based on their ``next time to try'' time stamp.
It would be nice to define some terms for
A delivery attempt consists of:
This section explains how data is added to the various queues, what happens with it, and under which circumstances data is read from a queue if there is no queue into which the data is read, i.e., this is a consumer oriented view.
This section gives a bit more details about the data flow than the previous section. It does only deal with data that is stored by QMGR in some queue, it does not specify the complete data flow, i.e., what happens in the SMTP server or the delivery agents.
Question: should we keep entries in the incoming queue only during delivery attempts, or should we move the envelope data into the deferred queue while the attempts are going on? If we move the envelopes, we have more space available in the incoming queue and can accept more mail. However, moving envelopes costs of course performance. In the ``normal'' case we don't need the envelope data in the deferred queue, i.e., if delivery succeeds for all recipients and no SUCCESS DSNs are requested, we don't need the envelope data ever in the deferred queue. Question: do we want a flexible algorithm that moves the envelope data only under certain conditions? Those conditions could include how much space is free in the incoming queue and how long an entry is already in the queue. There should be two different modes (selectable by an option):
We need the envelope data in the deferred queue, if and only if
If no DSN must be sent and all recipients have been taken care of, the envelope does not need to be moved into the DEFEDB, and it can be removed from the INCEDB afterwards without causing additional data moving.
Note: it should be possible to remove recipient and transaction data from IQDB as soon as it has been transferred to AQ and safely committed to IBDB; at this moment the data is in persistent storage and it is available to the scheduler, hence the data is not really needed anymore in IQDB. There are some implementations issues around this2.19, hence it is not done in the current version, this is something that should be optimized in a subsequent version.
When a delivery attempt (see 4d in Section 2.4.3.2) has been made, the recipient must be taken care of in the appropriate way. Note that a delivery attempt may fail in different stages (see Section 2.4.3.1), and hence updating the status of a recipient can be done from different parts of QMGR. That is, in all cases the recipient address is removed from ACTEDB and
The data is stored in DEFEDB (persistent storage) to avoid retrying a failed delivery, see also Section 2.4.6.
Notice: it is recommended to perform the update for a delivery attempt in one (DB) transaction to minimize the amount of I/O and to maintain consistency. Furthermore, the updates to DEFEDB should be made before updates to INCEDB are made as explained in Section 2.4.1.
Note: this section does not discuss how to deal with a transaction whose recipients are spread out over INCEDB and DEFEDB. For example, consider a transaction with two recipients, all data is in INCEDB. A delivery attempt for one recipient causes a temporary failure, the other recipient is not tried yet. Now the transaction and the failed recipient are written to DEFEDB. However, the recipient counters in the transaction do not properly reflect the number of recipients in DEFEDB but in both queues together. The recovery program must be able to deal with that.
According to item 10 in Section 2.4.3.3 entries are read from the deferred queue into the active queue based on their ``next time to try'' (or whatever criteria the scheduler wants to apply). Instead of reading through the entire DB -- which is on disk and hence expensive disk I/O is involved -- each time entries should be added, an in-memory cache (EDBC) is maintained which contains references to entries in DEFEDB sorted based on the ``next time to try''. Note: it might be interesting to investigate whether an DEFEDB implementation based on Berkeley DB would make this optimization superfluous because Berkeley DB maintains a cache anyway. However, it is not clear which data the cache contains, most likely it is not ``next time to try'' but only the key (recipient/transaction identifiers).
Even though each entry in the cache is fairly small (recipient identifier, next time to try, and some management overhead), it might be impossible to hold all references in memory because of the size. Here is a simple size estimation: an entry is about 40 bytes, hence 1 million entries require 40 MB. If a machine actually has 1 million entries in its deferred queue then it has most likely more than 1 GB RAM. Hence it seems fairly unlikely to exceed the available memory with EDBC. Nevertheless, the system must be prepared to deal with such a resource shortage. This can be done by changing into a different mode in which DEFEDB is regularly scanned and entries are inserted to EDBC such that older entries will be removed if newer entries are inserted and EDBC is full2.20.
If the MTS is busy it might not be possible to read all entries from DEFEDB when their next time to try is actually reached because AQ might be full. Hence it is necessary to establish some fairness between the two producers for AQ: IQDB (SMTP servers) and DEFEDB. A very simple approach is to reserve a certain amount, e.g., half, for each of the producers. However, that does not seem to be useful:
A slightly better approach is as follows:
This approach will
sendmail 8 provides a delivery mode called interactive in which a mail is delivered before the server acknowledges the final dot. An enhanced version of this could be implemented in sendmail X, i.e., try immediate delivery but enforce a timeout after which the final dot is acknowledged. A timeout is necessary because otherwise clients run into a timeout themselves and resend the mail which will usually result in double deliveries.
This mode is useful to avoid expensive disk I/O operations; in a simple mode at least the fsync(2) call can be avoided, in a more complicated mode the message body could be shared directly between SMTP server and delivery agent to even avoid creation of file on disk (this could be accomplished by using the buffered file mode from sendmail 8 with large buffers, however, this requires some form of memory sharing2.21). Various factors can be used to decide whether to use interactive delivery, e.g., the size of the mails, the number of recipients and their destinations, e.g., local versus remote, or other information that the scheduler has about the recipient hosts, e.g., whether they are currently unavailable etc.
Cut-through delivery requires a more complicated protocol between QMGR and SMTP server. In normal mode the SMTP server calls fsync(2) before giving the information about the mail to QMGR and then waits for a reply which in turn is used to inform the SMTP client about the status of the mail, i.e., the reply to the final dot. For cut-through delivery the SMTP server does not call fsync(2) but informs QMGR about the mail. Then the following cases can happen:
For case 1b the SMTP server needs to send another message to QMGR telling it the result of fsync(2). If fsync(2) fails, the message must be rejected with a temporary error, however, QMGR may already have delivered the mail to some recipients, hence causing double deliveries.
Items from the delayed queue need to be read into the active queue based on different criteria, e.g., time in queue, time since last attempt, precedence, random walk.
The queue manager must establish a fair selection of items in the incoming queue and items in the delayed queue. This algorithm can be influenced by user settings, which includes simple options (compare QueueSortOrder in sendmail 8), table driven decisions, e.g., no more than N connections to a given host, and a priority (see Section 2.4.4.1). A simple way to achieve a fair selection is to establish a ratio (that can be configured) between the queues from which entries are read into the active queue, e.g., incoming: 5, deferred: 1. Question: do we use a fixed ratio between incoming and deferred queue or do we vary that ratio according to certain (yet to determine) conditions? These ratios are only honored if the system is under heavy load, otherwise it will try to get as many entries into the active queue as possible (to keep the delivery agents busy). However, the scheduler will usually not read entries from the deferred queue whose next time to try isn't yet reached, unless there is a specific reason to do so. Such a reason might be that a connection to the destination site became available, an ETRN command has been given, or deliver is forced by an admin via a control command. Question: does the ratio refer to the number of recipients or the number of envelopes?
The QMGR must at least ensure that mail from one envelope to the same destination site is send in one transaction (unless the number of recipients per message is exceeded). Hence there should be a simple way to access the recipients of one envelope, maybe the envelope id is a key for the access to the main queue. See also 2.4.4.4 for further discussion. Additionally MX piggybacking (as in 8.12) should be implemented to minimize the required number of transactions.
Question: how to schedule deliveries, how to manage the active queue? Scheduling: Deliveries are scheduled only from the active queue, entries are added to this queue from the incoming queue and from the deferred queue.
To reduce disk I/O the active queue has two thresholds: the maximum size and a low watermark. Only if too few entries are in the cache entries are read from the deferred queue. Problem: entries from the incoming queue should be moved as fast as possible into the active queue. To avoid starvation of deferred entries a fair selection must be made, but this must be done on a ``large'' scale to minimize disk I/O. That is, if the ratio is 2-1 (at least one entry from the deferred queue for every two from the incoming queue), then it could be that 100 entries are moved from the incoming queue, and then 50 from the deferred queue. Of course the algorithm must be able to deal with border conditions, e.g., very few incoming entries but large deferred queue, or only a few entries trickling in such that the number of entries in the active queue is always in the range of the low watermark.
Question: where/when do we ask the address resolver for the delivery tuple? That's probably a configuration option. The incoming queue must be able to store addresses in external and in ``resolved'' form. See also Section 3.12.6 for possible problems when using the resolved form.
Here's a list of scheduling options people (may) want (there are certainly many more):
Question: how to specify such scheduling options and how to do that in an efficient way? It doesn't make much sense to evaluate a complicated expression each time the QMGR looks for an item in the deferred queue to schedule for delivery. For example, if an entry should only be sent at certain times, then this should be ``immediately'' recognizable (and the item can be skipped most of the time, similar to entries on hold).
Remark: qmail-1.03/THOUGHTS [Ber98] contains this paragraph:
Mathematical amusement: The optimal retry schedule is essentially, though not exactly, independent of the actual distribution of message delay times. What really matters is how much cost you assign to retries and to particular increases in latency. qmail's current quadratic retry schedule says that an hour-long delay in a day-old message is worth the same as a ten-minute delay in an hour-old message; this doesn't seem so unreasonable.
Remark: Exim [Haz01] seems to offer a quite flexible retry time calculcation:
For example, it is possible to specify a rule such as `retry every 15 minutes for 2 hours; then increase the interval between retries by a factor of 1.5 each time until 8 hours have passed; then retry every 8 hours until 4 days have passed; then give up'. The times are measured from when the address first failed, so, for example, if a host has been down for two days, new messages will immediately go on to the 8-hour retry schedule.
Courier-MTA has four variables to specify retries:
,
,
,
![]()
These control files specify the schedule with which Courier tries to deliver each message that has a temporary, transient, delivery failure.and
contain a time interval, specified in the same way as queuetime.
and
contain small integral numbers only.
Courier will first makedelivery attempts, waiting for the time interval specified by
between each attempt. Then, Courier waits for the amount of time specified by
, then Courier will make another
delivery attempts,
amount of time apart. If still undeliverable, Courier waits
amount of time before another
delivery attempts, with
amount of time apart. The next delay will be
amount of time long, the next one
, and so on.
sets the upper limit on the exponential backoff. Eventually Courier will keep waiting
amount of time before making
delivery attempts
amount of time apart, until the queuetime interval expires.
The default values are:
This results in Courier delivering each message according to the following schedule, in minutes: 5, 5, 5, 15, 5, 5, 30, 5, 5, 60, 5, 5, then repeating 120, 5, 5, until the message expires.
There are two levels of scheduling:
We could assign each entry a priority that is dynamically computed. For example, the priority could incorporate:
However, it is questionable whether we can devise a formula
that generates the right priority.
How do we have to weight those parameters (linear functions?),
and how to combine them (
)?
It might be simpler (better) to specify the priority in some
logical formula (if-then-else) in combination with arithmetic.
Of course we could use just arithmetic (really?) if we use the
right operations.
However, we want to be able to short-cut the computation,
e.g., if one parameter specifies that the entry certainly will
not be scheduled now.
For example:
if time-next-try
now then Not-Now unless
connections-open(recipient-site)
.
On system with low mail volume the schedulers will not be busy all the time. Hence they should sleep for a certain time (in sendmail 8 that's the -q parameter). However, it must be possible to wake them up whenever necessary. For example, when a new mail comes in the first level scheduler should be notified of that event such that is can immediately put that mail into the active queue if that is possible, i.e., there is enough free space. The sleep time might be a configurable option, but it should also be possible to just say: wake up at the next retry time, which is the minimum of the retry times in the deferred queue.
The next retry time should not be computed based on the message/recipient, but on the destination site (Exim does that). It doesn't make much sense to contact a site that is down at random intervals because different messages are addressed to it. Since the status of a destination site is stored in the connection cache, we can use that to determine the next retry time. However, we have the usual problem here: a recipient specifies an e-mail address, not the actual host to contact. The latter is determined by the address resolver, and in general, it's not a single host, but a list of hosts. In theory, we could base the retry time on the first host in the list. However, what should we do if another host in the list has a different next retry time, esp. an earlier one? Should we use the minimum of all retry times? We would still have to try the hosts in order (as required by the standard), but since a lower priority host may be reachable, we can deliver the mail to it. Question: under which circumstance can a host in the list have an earlier retry time? This can only happen if the list changes and a new host is added to it (because of DNS changes or routing changes). In that case, we could set the retry time for the new host to the same time as all the other hosts in the list. However, this isn't ``fair'', it would penalize all mails to that host. So maybe it is best to use the retry time of the first host in the list as the retry time of a message.
Note: There are benefits to some randomness in the scheduling. For example, if some systematic problem knocks down a site every 3 hours, taking 15 minutes to restore itself, then delivery attempts should not accidentally synchronize with the periodic failures. Hence adding some ``fuzz'' factor might be useful.
Notice: it might be useful to have a pre-emptive scheduler. That is, even if the act