TUTORIAL FOR EMIL VERSION 2.1
Written by Martin Wendel, ITS, Uppsala university.
HOW EMIL V2 DOES MESSAGE CONVERSION.
To understand message conversion it is important to understand the
components of a message and the types of these components. A message
consists of a header and a body. The format of the header is thoroughly
defined in RFC 822 and RFC 1522. It consists of lines of text divided
into two parts, the field and the value, separated by a colon ':'.
Apart from a few details the header is very straightforward.
The body, on the other hand, is somewhat trickier to understand.
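As a minimal sketch of the field/value split described above (Python is
used here for illustration only; this is not Emil's own code, and the
function name parse_header is hypothetical):

```python
def parse_header(raw):
    """Split raw header text into (field, value) pairs.

    Continuation lines (starting with a space or tab) are folded into
    the previous value, as RFC 822 allows.
    """
    fields = []
    for line in raw.splitlines():
        if not line:
            break  # an empty line ends the header
        if line[0] in " \t" and fields:
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, sep, value = line.partition(":")
            if sep:  # lines without a colon are ignored in this sketch
                fields.append((name.strip(), value.strip()))
    return fields


example = ("Subject: Hello\n"
           "Content-Type: text/plain;\n"
           "\tcharset=ISO-8859-1\n"
           "\n"
           "body text")
print(parse_header(example))
```

The sketch stops at the first empty line, which is where the body
begins.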
The body of a message consists of text and/or encoded data, where data
may refer to text as well. Emil recognizes the encodings BinHex, Base64,
Quoted-Printable and UUencode. Text is any part of the message that is
not recognized as an encoded enclosure. Text is a raw format, whereas
encoded data follows the syntax of its encoding. This means that incorrect
or incomplete encodings are not recognized and are thus treated as raw text
(as a safety measure).
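The safety measure can be illustrated with a small sketch for UUencode.
It assumes that a valid enclosure has a "begin" line, decodable data
lines and a closing "end" line; Emil's actual checks may differ, and the
name find_uuencoded is hypothetical:

```python
import binascii
import re

# "begin <octal mode> <file name>" is the usual UUencode preamble.
BEGIN = re.compile(r"^begin [0-7]{3,4} \S+")


def find_uuencoded(lines):
    """Return (start, end) line indexes of a valid UUencoded block, or None.

    A candidate that fails the syntax check, or that never reaches an
    "end" line, is not reported and so stays raw text.
    """
    for start, line in enumerate(lines):
        if not BEGIN.match(line):
            continue
        for i in range(start + 1, len(lines)):
            if lines[i] == "end":
                return (start, i)
            try:
                binascii.a2b_uu(lines[i])
            except binascii.Error:
                break  # syntax error: give up on this candidate
    return None
```

A caller would treat every line outside the returned range as text.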
Emil works in two passes. After loading the message, it first
interprets the header and body of the message, and only then applies any
conversions. First the format is recognized. This is done by interpreting
the header of the message. Then the data of the message is structured into
a hierarchical message structure.
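Format recognition from the header could look roughly like the sketch
below. The exact rules Emil uses are not given in this tutorial; the
tests on MIME-Version and on a Content-Type of X-sun-attachment, and the
name recognize_format, are assumptions made for illustration:

```python
def recognize_format(fields):
    """Guess the message format from a list of (field, value) pairs."""
    header = {name.lower(): value for name, value in fields}
    content_type = header.get("content-type", "").lower()
    if "mime-version" in header:
        return "MIME"
    if content_type.startswith("x-sun-attachment"):  # assumed Mailtool marker
        return "Mailtool"
    return "RFC822"  # plain message, no declared enclosures
```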
UUencoded and BinHexed enclosures are recognized by their preamble, and a
syntax check is made on them to make reasonably sure they are valid. When
Emil receives a MIME or Mailtool message, encoded enclosures are defined in
the header. Emil recognizes these definitions and trusts them to be valid;
no further processing is done initially. The rest of the body parts and the
erroneous BinHex and UUencode parts are treated as text. The text is checked
to determine whether it is 7bit or 8bit.
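The 7bit/8bit check amounts to looking for any byte with the high bit
set. A minimal sketch (the function name text_encoding is hypothetical):

```python
def text_encoding(data: bytes) -> str:
    """Classify raw text as '7bit' or '8bit'."""
    return "8bit" if any(byte > 0x7F for byte in data) else "7bit"


print(text_encoding(b"plain ASCII"))                  # 7bit
print(text_encoding("smörgås".encode("iso-8859-1")))  # 8bit
```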
Emil now has a hierarchical representation of the message, and each part
is marked with the type of its encoding. It may be that the trusted
encodings of a MIME or Mailtool message are erroneous as well. However, the
definitions in the header are strong evidence that they should be correct.
If an error is found in spite of that, the message part is left untouched,
but it is not treated as text.
Conversion is applied as specified by the target format. Emil has a
clear view of the incoming message and can work in a straightforward way,
traversing the hierarchical structure. Each object in the structure contains
headers, data, and pointers to other objects. When applying conversion, first
the data of each object is converted to what is specified in the target
format, then the headers of each object are updated based on the resulting
type and encoding of the object's data.
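The structure and the two conversion steps can be sketched as follows.
The class layout and all names (Part, walk, convert) are hypothetical
and only mirror the description above: each object holds headers, data
and references to other objects, and all data is converted before any
header is updated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Part:
    """One object in the message structure (hypothetical layout)."""
    headers: Dict[str, str] = field(default_factory=dict)
    data: Optional[bytes] = None   # None for a pure multipart container
    encoding: str = "7bit"         # e.g. "base64", "uuencode", "binary"
    children: List["Part"] = field(default_factory=list)


def walk(part: Part) -> Iterator[Part]:
    """Yield the part and all of its descendants, parents first."""
    yield part
    for child in part.children:
        yield from walk(child)


def convert(root: Part,
            convert_data: Callable[[Part], None],
            fix_headers: Callable[[Part], None]) -> None:
    """Convert all data first, then update headers from the results."""
    for part in walk(root):
        if part.data is not None:
            convert_data(part)
    for part in walk(root):
        fix_headers(part)
```

Keeping the header step separate means it can rely on the final type
and encoding of every part.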
An example: The target format specifies Sun Mailtool, UUencoded attachments
and ISO-646-SE text. A MIME message containing a Quoted-Printable encoded
text/plain using ISO-8859-1 and a Base64 encoded Image/GIF will be converted
as follows (this is a fairly long and detailed description):
- The entire message is loaded into memory.
- The header of the message is examined and loaded into a structure.
- This yields format MIME and type Multipart/Mixed. The boundary
- The start of the body of the message is marked up at the end of
- Find the end boundary, the end of the message is marked just
before the end boundary. The root object in the message structure
- Start off a child object.
- We've got boundaries, go find the first boundary.
- Examine the header of the first body part.
- This yields type=Text/plain, charset=ISO-8859-1 and
encoding=Quoted-Printable.
- Find the second boundary, terminating the first body part and
initiating the second. The first child object is completed.
- Start off a sister object to the previous child object.
- Examine the header of the second body part.
- This yields type=Image/GIF, encoding=Base64.
- There are no more boundaries (the end boundary was detected
earlier, and the end of data as seen by the second bodypart is marked
just before the end boundary), terminating the second bodypart.
- End of data is reached, message parsing is completed. We've got
a hierarchical message structure describing the incoming message.
- Apply conversion.
- Parse the message structure, converting the data.
- First object (root object) is multipart, no data to be converted.
- Second object (first child) is a text with charset=ISO-8859-1
and encoding quoted-printable.
- Target format does not want quoted-printable and does not want
charset=ISO-8859-1. Thus, decode quoted-printable.
- This yields an 8bit text with charset=ISO-8859-1.
- Target format does not want charset=ISO-8859-1. Convert charset
to ISO-646-SE.
- This yields a 7bit text with charset=ISO-646-SE.
- Target format wants charset=ISO-646-SE, conversion on this data
is completed.
- Third object (second child) is a GIF encoded in Base64.
- Target format does not want Base64, decode Base64.
- This yields a GIF with encoding=binary.
- Target format does not want binary, encode UUencode.
- This yields a GIF with encoding=UUencode.
- Target format wants UUencode, conversion on this data is completed.
- All data is now converted according to the target format. Start
- All the MIME-specific headers in the root header are marked as such
in the first parse of the header. This includes
MIME-Version and Content-Type. These will not be part of the output.
- The root object is a multipart, add header Content-Type
X-Sun-Encoding. Also add the Mailtool boundary to the boundary
- The first bodypart is a text. Add the Mailtool headers. Also
add header Content-Lines: number of lines.
- The second bodypart is a GIF. Add the Mailtool headers. Also add
header Content-Lines: number of lines.
- The message conversion is completed. Output the message.
- Print the root header.
- Print the Mailtool boundary.
- Print the header of the first child.
- Print the body of the first child.
- Print the Mailtool boundary.
- Print the header of the second child.
- Print the body of the second child.
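The data conversion steps of the example above can be sketched with the
Python standard library. This is an illustration under stated
assumptions, not Emil's implementation: the function names are
hypothetical, and the character mapping is simplified to the six Swedish
letters that ISO-646-SE places on the bracket and brace positions (any
other non-ASCII character becomes "?").

```python
import base64
import binascii
import quopri

# Simplified ISO-8859-1 -> ISO-646-SE mapping (assumption: six letters only).
ISO_646_SE = str.maketrans("ÄÖÅäöå", "[\\]{|}")


def qp_latin1_to_iso646se(data: bytes) -> bytes:
    """Decode Quoted-Printable, then convert the charset to 7bit text."""
    text = quopri.decodestring(data).decode("iso-8859-1")   # 8bit text
    return text.translate(ISO_646_SE).encode("ascii", "replace")


def base64_to_uuencode(data: bytes, name: str, mode: int = 0o644) -> bytes:
    """Decode Base64 to binary, then encode the binary data as UUencode."""
    raw = base64.b64decode(data)                            # encoding=binary
    lines = [f"begin {mode:o} {name}\n".encode("ascii")]
    for i in range(0, len(raw), 45):                        # 45 bytes per line
        lines.append(binascii.b2a_uu(raw[i:i + 45], backtick=True))
    lines.append(b"`\nend\n")
    return b"".join(lines)


print(qp_latin1_to_iso646se(b"R=E4ksm=F6rg=E5s"))  # b'R{ksm|rg}s'
```

Each function goes through the same intermediate state as the
walkthrough: 8bit ISO-8859-1 text in the first case, binary data in the
second.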
ITS Uppsala university
751 08 Uppsala