Written by Martin Wendel, ITS, Uppsala university. Martin.Wendel@its.uu.se


To understand message conversion, it is important to understand the components of a message and the types of these components. A message consists of a header and a body. The format of the header is thoroughly defined in RFC 822 and RFC 1522. It consists of lines of text, each divided into two parts, the field and the value, separated by a colon ':'. Apart from a few details the header is very straightforward. The body, on the other hand, is somewhat trickier to understand.
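The field/value split of a header can be illustrated with a short Python sketch. The function name parse_header is hypothetical and is not taken from Emil. The sketch joins folded continuation lines (lines starting with whitespace), but it does not decode RFC 1522 encoded words.

```python
def parse_header(raw):
    """Split a raw header block into (field, value) pairs.

    Hypothetical helper for illustration only; not Emil's own code.
    """
    fields = []
    for line in raw.splitlines():
        if not line:
            break  # an empty line ends the header
        if line[0] in " \t" and fields:
            # Continuation line: append it to the previous field's value.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, sep, value = line.partition(":")
            if sep:  # lines without a colon are skipped in this sketch
                fields.append((name.strip(), value.strip()))
    return fields
```

Everything after the first empty line belongs to the body and is ignored here.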

The body of a message consists of text and/or encoded data, where the encoded data may itself be text. Emil recognizes the encodings BinHex, Base64, Quoted-Printable and UUencode. Text is any part of the message that is not recognized as an encoded enclosure. Text is a raw format, whereas encoded data follows the syntax of its encoding. This means that incorrect or incomplete encodings are not recognized and are thus treated as raw text (as a safety measure).
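The safety measure can be sketched in Python as follows, using Base64 as the example encoding. The function classify_base64 is a hypothetical name, and the strict validation shown is an assumption about what counts as "correct", not a description of Emil's actual checks.

```python
import base64
import binascii


def classify_base64(block):
    """Return ("base64", decoded_bytes) if block is valid Base64,
    otherwise ("text", block) so the data passes through untouched."""
    compact = "".join(block.split())  # line breaks are allowed in an enclosure
    try:
        decoded = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        # Incorrect or incomplete encoding: fall back to raw text.
        return ("text", block)
    return ("base64", decoded)
```

For example, classify_base64("SGVsbG8=") yields the decoded bytes, while the same string with its padding removed is returned unchanged as text.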

Emil works in two passes: after loading the message, it first interprets the header and body, and only then applies any conversions. In the first pass, the format is recognized by interpreting the header of the message. Then the data of the message is structured into a hierarchical message structure.

UUencoded and BinHexed enclosures are recognized by their preamble, and a syntax check is made on them to be reasonably sure they are valid. When Emil receives a MIME or Mailtool message, encoded enclosures are defined in the header. Emil recognizes these definitions and trusts them to be valid; no further processing is done initially. The remaining body parts, along with any erroneous BinHex and UUencode parts, are treated as text. The text is checked to determine whether it is 7bit or 8bit.
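A minimal Python sketch of the two checks described above: recognizing a UUencoded enclosure by its preamble plus a per-line syntax check, and classifying text as 7bit or 8bit. Both function names are hypothetical, and the exact rules Emil applies may differ; this version only checks the "begin" line, the "end" line, the character range, and that each line is long enough for its declared length.

```python
import re

# "begin <mode> <filename>", e.g. "begin 644 picture.gif"
BEGIN_RE = re.compile(r"^begin [0-7]{3,4} \S")


def looks_like_uuencode(lines):
    """Return True if lines (without line terminators) form a
    plausible UUencoded enclosure. Illustrative check only."""
    if len(lines) < 3 or not BEGIN_RE.match(lines[0]) or lines[-1] != "end":
        return False
    for line in lines[1:-1]:
        if not line:
            return False
        if any(not (32 <= ord(c) <= 96) for c in line):
            return False  # character outside the UUencode alphabet
        n = (ord(line[0]) - 32) & 0x3F   # decoded byte count of this line
        expected = 4 * ((n + 2) // 3)    # encoded characters needed for n bytes
        if len(line) - 1 < expected:
            return False  # line is truncated
    return True


def text_bits(data):
    """Classify raw text bytes as "7bit" or "8bit"."""
    return "8bit" if any(b > 0x7F for b in data) else "7bit"
```

A block that fails looks_like_uuencode would simply be kept as text, in line with the safety measure described earlier.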

Emil now has a hierarchical representation of the message, and each part is marked with the type of its encoding. It may be that the trusted encodings of a MIME or Mailtool message are erroneous as well. However, the definitions in the header are strong evidence that they are correct. If an error occurs in spite of this, the message part is left untouched, but it is not treated as text.

Conversion is applied as specified by the target format. Emil has a clear view of the incoming message and can work in a straightforward way, traversing the hierarchical structure. Each object in the structure contains headers, data, and pointers to other objects. When applying conversion, the data of each object is first converted to what the target format specifies; then the headers of each object are updated based on the resulting type and encoding of that object's data.
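The structure and the data-then-headers ordering can be sketched in Python as below. The class Part, its attribute names, and the functions walk and convert are hypothetical; Emil's own objects are not shown in this document, so this is only one way such a structure could look.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One object in the message structure (hypothetical layout)."""
    headers: dict = field(default_factory=dict)
    content_type: str = "text/plain"
    encoding: str = "7bit"
    data: bytes = b""
    children: list = field(default_factory=list)  # pointers to other objects


def walk(part):
    """Yield every object in the structure, parents before children."""
    yield part
    for child in part.children:
        yield from walk(child)


def convert(root, convert_data, convert_headers):
    """Convert all data first, then all headers."""
    parts = list(walk(root))
    for part in parts:
        if not part.children:      # a multipart object has no data of its own
            convert_data(part)
    for part in parts:
        convert_headers(part)      # sees the final type and encoding of the data
```

The callbacks convert_data and convert_headers stand in for whatever the target format prescribes.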

An example: The target format specifies Sun Mailtool, UUencoded attachments and ISO-646-SE text. A MIME message containing a Quoted-Printable encoded text/plain using ISO-8859-1 and a Base64 encoded Image/GIF will be converted as follows (this is a fairly long and detailed description):

  1. The entire message is loaded into memory.
  2. The header of the message is examined and loaded into a structure.
  3. This yields format MIME and type Multipart/Mixed. The boundary is saved.
  4. The start of the body of the message is marked up at the end of the header.
  5. Find the end boundary; the end of the message is marked just before it. The root object in the message structure is completed.
  6. Start off a child object.
  7. We've got boundaries, go find the first boundary.
  8. Examine the header of the first body part.
  9. This yields type=Text/plain, charset=ISO-8859-1 and encoding=Quoted-Printable.
  10. Find the second boundary, terminating the first body part and initiating the second. The first child object is completed.
  11. Start off a sister object to the previous child object.
  12. Examine the header of the second body part.
  13. This yields type=Image/GIF, encoding=Base64.
  14. There are no more boundaries (the end boundary was detected in step 5, and the end of data as seen by the second body part is marked just before the end boundary), terminating the second body part.
  15. End of data is reached, message parsing is completed. We've got a hierarchical message structure describing the incoming message.
  16. Apply conversion.
  17. Parse the message structure, converting the data.
  18. First object (root object) is multipart, no data to be converted.
  19. Second object (first child) is a text with charset=ISO-8859-1 and encoding quoted-printable.
  20. Target format does not want quoted-printable and does not want charset=ISO-8859-1. Thus, decode quoted-printable.
  21. This yields an 8bit text with charset=ISO-8859-1.
  22. Target format does not want charset=ISO-8859-1. Convert charset to ISO-646-SE.
  23. This yields a 7bit text with charset=ISO-646-SE.
  24. Target format wants charset=ISO-646-SE, conversion on this data is completed.
  25. Third object (second child) is a GIF encoded in Base64.
  26. Target format does not want Base64, decode Base64.
  27. This yields a GIF with encoding=binary.
  28. Target format does not want binary, encode UUencode.
  29. This yields a GIF with encoding=UUencode.
  30. Target format wants UUencode, conversion on this data is completed.
  31. All data is now converted according to the target format. Start converting headers.
  32. All the MIME-specific headers in the root header are marked as such in the first parse of the header. This includes MIME-Version and Content-Type. These will not be part of the output.
  33. The root object is a multipart, add header Content-Type X-Sun-Encoding. Also add the Mailtool boundary to the boundary string.
  34. The second object (first child) is a text. Add the Mailtool headers. Also add header Content-Lines: number of lines.
  35. The third object (second child) is a GIF. Add the Mailtool headers. Also add header Content-Lines: number of lines.
  36. The message conversion is completed. Output the message.
  37. Print the root header.
  38. Print the Mailtool boundary.
  39. Print the header of the first child.
  40. Print the body of the first child.
  41. Print the Mailtool boundary.
  42. Print the header of the second child.
  43. Print the body of the second child.
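
The data conversions in steps 19 to 30 can be sketched in Python using only standard library modules. The function names are hypothetical, and the character table is an assumption: it maps only the six Swedish national letters to the positions they commonly occupy in ISO-646-SE. Any other non-ASCII character raises an error in this sketch rather than being converted.

```python
import base64
import binascii
import quopri

# Assumed ISO-646-SE positions for the Swedish national letters only.
ISO_646_SE = str.maketrans("ÄÖÅäöå", "[\\]{|}")


def qp_latin1_to_iso646se(qp_bytes):
    """Steps 19-24: decode Quoted-Printable, then convert the charset."""
    raw = quopri.decodestring(qp_bytes)     # 8bit text, ISO-8859-1
    text = raw.decode("iso-8859-1")
    # Characters without a mapping raise UnicodeEncodeError here.
    return text.translate(ISO_646_SE).encode("ascii")  # 7bit text


def base64_to_uuencode(b64_bytes, filename, mode="644"):
    """Steps 25-30: decode Base64 to binary, then encode as UUencode."""
    binary = base64.b64decode(b64_bytes)
    lines = [f"begin {mode} {filename}"]
    for i in range(0, len(binary), 45):     # at most 45 bytes per line
        chunk = binascii.b2a_uu(binary[i:i + 45], backtick=True)
        lines.append(chunk.decode("ascii").rstrip("\n"))
    lines.append("`")                       # zero-length line ends the data
    lines.append("end")
    return "\n".join(lines) + "\n"
```

Header conversion (steps 31 to 35) is not covered here, since it depends on the Mailtool header details.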

March 1996

ITS Uppsala university
Box 887
751 08 Uppsala

Martin Wendel E-Mail: