Mailwarrior Mail Converter

This page describes the conversion of Mailwarrior 3.55 and 3.61 mail folders into the widely used mbox format.

Note: Another converter, written in Perl, is available here: http://caravela.homelinux.net/~higuita/kmx2mbox/

Introduction

Mailwarrior is, in my opinion, one of the best mail programs for the Windows operating system. It is free, handles several mail servers very well (all mails go into one folder; one column in the folder overview shows the server name; within the mail editor, the mail server used to send the mail can be changed easily with a combo box), and it does not support any active content at all (no risk of virus infection). Highly recommended!

Now I am slowly switching over to Unix-like operating systems and I would like to feed my new mail client (probably kmail) with the mails stored in Mailwarrior. Unfortunately the latter does not have any mail export functionality. So I quickly coded a program that converts these mails into the widely used mbox format.

Internals: structure of an mbox file

An mbox file is a plain text file containing all mails one after the other. Each mail starts with a line of the form "From email_address date". The mail ends at the end of the mbox file or at the next line of this kind. Therefore a mail must not contain such a line in its body (such lines have to be changed, e.g. by putting a '>' in front of them).
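
As a rough illustration of this layout, the following Java sketch appends one mail to an mbox text. The class and method names are made up for this example and are not taken from the converter; the date string is simply passed in already formatted.

```java
import java.util.List;

final class MboxWriter {
    /** Appends one mail to the mbox text, escaping body lines that look like a separator. */
    static void appendMessage(StringBuilder mbox, String sender, String date,
                              List<String> headerLines, List<String> bodyLines) {
        // Separator line: "From email_address date"
        mbox.append("From ").append(sender).append(' ').append(date).append('\n');
        for (String header : headerLines) {
            mbox.append(header).append('\n');
        }
        mbox.append('\n'); // empty line between header and body
        for (String line : bodyLines) {
            if (line.startsWith("From ")) {
                mbox.append('>'); // would otherwise be read as the start of a new mail
            }
            mbox.append(line).append('\n');
        }
        mbox.append('\n'); // empty line between mails
    }
}
```

Calling appendMessage() once per mail on the same StringBuilder yields one mbox text with the mails one after the other.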

Each mail has a header and a body. The header has some fixed fields like 'To:' or 'Subject:' and several free fields usually set by the sender and the routing servers. An important field is the mime type, which defines how the body is to be interpreted.
The body usually consists of plain text or html text (depending on the mime type). Often the mime type is multipart/mixed or multipart/alternative. In this case the mail body consists of several parts, each delimited by a special marker, which is defined in the mime type declaration. multipart/alternative mails are coded in more than one way, e.g. as plain text and as html, so that the receiver can display them in its favourite way. multipart/mixed mails have attachments, e.g. attached pictures. To make it even more complicated, these multipart parts can be nested, i.e. a multipart/alternative mail can have multipart/mixed attachments in one or both of its alternative codings.

Binary attachments are base 64 encoded and are an ordinary part of the message. The header of the part defines its mime type and file name.

Internals: mail handling by mailwarrior

Mailwarrior apparently handles mails as follows: The header is processed. Special fields like 'From:', 'To:' or the first mime type given are extracted and stored in special locations. The remaining header is stored as header. The body is processed. All binary attachments are stored with the provided file name in the attachments directory. If the file name exists, a number is added to prevent overwriting of an existing attachment. The base 64 encoded text is removed. Finally all multipart separators with all their information (like the mime type of the next part) are removed. The remaining body is stored as body.

This procedure has some drawbacks:

On multipart messages the mime type is multipart/xxxx. The real mime type of the body is lost.
The mime types of the attachments are lost.
On multipart/alternative messages both parts are stored as one body. You can see it in mailwarrior: the text of a multipart/alternative message appears twice; the second time decorated with html code.

This information is lost and cannot be recovered by converting the mails!

Internals: mail storage by mailwarrior

Mailwarrior stores all mails of a folder in one binary file with the extension .mwm. Each file consists of two parts. The format is described here as it was found by analyzing these files:

Part 1: the mails (actual and already deleted)

Header: 4 Bytes "QDB\0" followed by 1 integer (integers are always 4 bytes little endian): The size of the QDB part.
The mails (mailwarrior 3.55):

1 integer (unknown purpose)
subject (all strings are terminated by \0d\0a)
3 from lines in different formats
unknown string
to, cc and bcc line
mail id (not on sent mails)
mail account
mime type
1 unknown string
3 unknown strings containing a 0 or a 1
8 bytes forming the mail date (see below)
a unique string terminating the following fields of variable length
list of attachments. These are \0d\0a terminated strings which are the file names of the attachments (absolute path), terminated with the unique string above. All files found here are checked for existence. The existing attachment files are stored in a separate list, which is used later to convert the attachments.
list of header lines terminated with the unique string above
list of mail body lines terminated with the unique string above
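
The fixed-size start of Part 1 (the "QDB\0" marker and the little-endian size integer) could be read roughly like this. This is only a sketch under the format description above; the class and method names are invented for illustration and are not those used in Qdb.

```java
import java.io.DataInputStream;
import java.io.IOException;

final class QdbHeader {
    /** Reads a 4-byte little-endian integer. */
    static int readIntLE(DataInputStream in) throws IOException {
        // DataInputStream.readInt() is big-endian, so the byte order is swapped.
        return Integer.reverseBytes(in.readInt());
    }

    /** Checks the "QDB\0" marker and returns the size of the QDB part. */
    static int readQdbHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'Q' || magic[1] != 'D' || magic[2] != 'B' || magic[3] != 0) {
            throw new IOException("not a QDB part");
        }
        return readIntLE(in);
    }
}
```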

The mails (mailwarrior 3.61):

1 integer (unknown purpose)
subject (all strings are terminated by \0d\0a)
3 from lines in different formats
unknown string
to, cc and bcc line
unknown string
mail account
mime type
6 unknown strings containing a 0 or a 1
2 unknown strings
8 bytes forming the mail date (see below)
8 unknown bytes
1 integer: pointing to the end of the body
a unique string terminating the following header line list of variable length
list of header lines terminated with the unique string above
list of mail body lines terminated by the pointer above
list of attachments. These are \0d\0a terminated strings which are the file names of the attachments (absolute path). All files found here are checked for existence. The existing attachment files are stored in a separate list, which is used later to convert the attachments. The main problem here is that the attachment list ends at the end of the mail, which is unknown. The end of the mail is derived from the index part below.

From the unique string a boundary string is calculated by removing all characters which are not a letter, a digit, '.' or '_'. This boundary is required later if the mail has attachments. A mime type is guessed: text/html if the mail body contains '<html>', otherwise text/plain.
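
A small Java sketch of these two steps, under the assumption that "letter" and "digit" mean plain ASCII characters and that the '<html>' check is case sensitive (the text above does not say either way). The names are hypothetical.

```java
final class BoundaryGuess {
    /** Keeps only ASCII letters, digits, '.' and '_' of the unique string. */
    static String boundaryFrom(String uniqueString) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < uniqueString.length(); i++) {
            char c = uniqueString.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            if (letter || digit || c == '.' || c == '_') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Guesses text/html if the body contains "<html>", otherwise text/plain. */
    static String guessMimeType(String body) {
        return body.contains("<html>") ? "text/html" : "text/plain";
    }
}
```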

Part 2: the mail index

Mailwarrior 3.55: references the mails that are not deleted.
Mailwarrior 3.61: Index and length of the mails.
Header: 4 Bytes "QIX\0" followed by 2 integers: The size of the QIX part and the number of entries (== the number of not deleted mails).
The index entries:

an unknown string terminated by \0
1 integer giving the position of the mail in the first part this entry is pointing to
1 unknown integer (mailwarrior 3.61: length of the mail)
1 unknown integer

That's it. The index part is not ordered and has to be sorted on mailwarrior 3.61 conversions to get a list of mail lengths required to read the mails.
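
Sorting the index entries by their position in the mail part could be sketched as follows. The Entry class and its field names are invented for this example; only the position and (for 3.61) the length are modelled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class IndexSort {
    /** One QIX entry, reduced to the two fields needed here. */
    static final class Entry {
        final int position; // position of the mail in the first part
        final int length;   // mailwarrior 3.61: length of the mail

        Entry(int position, int length) {
            this.position = position;
            this.length = length;
        }
    }

    /** Returns a copy of the entries ordered by position; the input list is not modified. */
    static List<Entry> sortedByPosition(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt(e -> e.position));
        return sorted;
    }
}
```

After sorting, the entries come in the same order as the mails in the first part, so the lengths can be consumed one by one while reading.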

Internals: the conversion process.

The converter first skips the mail part and reads the index part, forming an internal list of indices of those mails which are not deleted (mailwarrior 3.61: to get the lengths of the mails). The second pass reads the mail part as described above. Mails which are not deleted (mailwarrior 3.61: all mails) are converted into the mbox format immediately. Afterwards the mail is removed from memory in order to avoid running out of memory.

For each mail first the from line is created from the third 'from' field (see above) and the mail date.
Next the header lines are written. All 'to', 'cc' and 'bcc' lines are removed. If a line starting with 'Content-Type: ' is encountered, the 'to', 'cc' and 'bcc' lines are written with the data read directly from the mwm file. All lines belonging to the content type declaration are removed. Then the content type is written: if the mail has attached files which are found on disk, the mime type multipart/mixed is written with the boundary string generated above. Otherwise the guessed mime type is taken.

If the 'to', 'cc' and 'bcc' lines have not been written so far, they are written now. If the mail has attachments which are found on disk, a content type header with the guessed mime type is written. In any case the mail body is written now. If a line starts with 'From ' and contains an '@', a '>' is put at the beginning of the line.

If the mail has attachments, which are found on disk, all attachments are base 64 encoded and written. The mime type of each part is guessed from the file extension, see contentTypeFromName(). If the extension could not be identified, 'application/octet-stream' is taken.
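
The attachment step might look roughly like the sketch below. Note the assumptions: java.util.Base64 (Java 8 and later) is swapped in for the Base64 class used by the converter, the extension table is a tiny illustrative subset and not the real one from contentTypeFromName(), and the exact part headers written by Qdb may differ.

```java
import java.util.Base64;
import java.util.Locale;

final class AttachmentPart {
    /** Guesses a mime type from the file extension; the table here is illustrative only. */
    static String contentTypeFromName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".txt")) return "text/plain";
        if (lower.endsWith(".htm") || lower.endsWith(".html")) return "text/html";
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
        if (lower.endsWith(".gif")) return "image/gif";
        return "application/octet-stream"; // extension not identified
    }

    /** Builds one part of a multipart/mixed body for an attachment. */
    static String encodePart(String boundary, String fileName, byte[] content) {
        // 76 characters per line, '\n' as line separator to match the mbox text
        String base64 = Base64.getMimeEncoder(76, new byte[] { '\n' }).encodeToString(content);
        return "--" + boundary + "\n"
            + "Content-Type: " + contentTypeFromName(fileName) + "; name=\"" + fileName + "\"\n"
            + "Content-Transfer-Encoding: base64\n"
            + "Content-Disposition: attachment; filename=\"" + fileName + "\"\n"
            + "\n"
            + base64 + "\n";
    }
}
```

After the last part, a closing line consisting of "--", the boundary and "--" ends the multipart body.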

Internals: the conversion program

The conversion program is written in Java. It consists of 2 classes with several internal classes, i.e. exactly the way a program should not be written. It was a quick hack.

It starts with InputStream, which adds a readln to DataInputStream. BufferedReader could not be used because it converts bytes into Unicode!
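
A minimal sketch of such a readln on top of DataInputStream is shown below. It is not the code from Qdb: the class name is made up, and mapping each byte to one character via ISO-8859-1 is an assumption chosen so that no byte is altered on the way.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class LineDataInputStream extends DataInputStream {
    LineDataInputStream(InputStream in) {
        super(in);
    }

    /** Reads bytes up to 0x0d 0x0a; returns null if the stream is already at its end. */
    String readln() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int prev = -1;
        int b;
        while ((b = read()) >= 0) {
            if (prev == 0x0d && b == 0x0a) {
                return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
            }
            if (prev >= 0) {
                buf.write(prev); // prev was not part of a terminator
            }
            prev = b;
        }
        if (prev < 0) {
            return null; // nothing left to read
        }
        buf.write(prev); // last line without terminator
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Because nothing is buffered ahead, calls to readln() can be mixed freely with readInt() or readLong() on the same stream.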

Next comes the class Date. It was stolen from GregorianDate in order to convert the number of milliseconds that have passed since 1970-01-01 into a date. Date and GregorianDate (which Date uses internally) were not able to do that, because these classes fiddle around with the time zone, giving results which differ by some hours from the time seen in mailwarrior! toAscTime() returns the time in the strange format required in the 'From' line of mbox.

The class MailIndex just contains one mail index (QIX part of the mwm file).

The class Mail contains one mail. readMail() reads the contents from the input stream. toMbox() converts its contents into the mbox format. encodeFile() converts a file into base 64 ascii text. contentTypeFromName() guesses the mime type from the file extension. getDate() creates a date. It is read as a long (8 byte) number. Double.longBitsToDouble() converts this long into a double. This is possible because the .mwm file stores the date as an IEEE 754 double counting the days since 1900-01-01 (due to a calculation error it is actually the number of days since 1899-12-31). Therefore getDate() converts this double into the number of milliseconds that have passed since 1970-01-01. Date.setTime() converts this into a date without changing it according to the time zone. Not nice, but it works.
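
The date arithmetic can be sketched like this. The epoch of 1899-12-31 is taken from the description above and not verified independently; java.time is used here only to compute the day offset and is not what the converter itself uses. Assembling the 8 bytes into a long is left to the caller.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class MwmDate {
    // Epoch as stated in the text above (assumption of this sketch).
    static final LocalDate MWM_EPOCH = LocalDate.of(1899, 12, 31);
    static final long DAYS_TO_UNIX_EPOCH =
        ChronoUnit.DAYS.between(MWM_EPOCH, LocalDate.of(1970, 1, 1));
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    /** Converts the raw bits of the stored double into milliseconds since 1970-01-01. */
    static long toUnixMillis(long rawBits) {
        double days = Double.longBitsToDouble(rawBits);
        return Math.round((days - DAYS_TO_UNIX_EPOCH) * MILLIS_PER_DAY);
    }
}
```

The fractional part of the double carries the time of day, so a value of x.5 corresponds to noon.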

Base64 was taken from http://www.ruffboy.com/download.htm and modified slightly (the modifications turned out not to be necessary in this version).

The main program scans the current directory for *.mwm files. These files are converted. If the conversion of one file fails, the process continues with the next. For each .mwm file one .mbox file is created. If there is no .mwm file, the 3.61 mode is activated. Then all directories are scanned for .kmx files, which are converted in a similar manner.
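
The directory scan could be written along these lines. This is a sketch with hypothetical names; matching the extension case-insensitively and sorting the result are choices made for the example and not necessarily what Qdb does.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class FolderScan {
    /** Lists the files in dir whose name ends with the given extension, e.g. ".mwm". */
    static List<File> findByExtension(File dir, String extension) {
        String wanted = extension.toLowerCase(Locale.ROOT);
        File[] found = dir.listFiles((d, name) -> name.toLowerCase(Locale.ROOT).endsWith(wanted));
        List<File> result = new ArrayList<>();
        if (found != null) { // null if dir is not a readable directory
            Arrays.sort(found);
            result.addAll(Arrays.asList(found));
        }
        return result;
    }
}
```

A caller would loop over the result, wrap each conversion in its own try/catch so that one broken file does not stop the rest, and fall back to scanning for ".kmx" files when the list for ".mwm" is empty.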

Revision History

Version 2

change: print dots during conversion
fix: multiple attachments
change: attachments marked as attachment (not as inline)
change: also reads MW 3.61 folders (experimental)

Version 2.1

bugfix: crash if folder index not ordered

Version 2.2

bugfix: crash if file attachment too long
sent-items: "From:", "Subject:" and "Date:" line are written in any case
empty line between mails
bugfix: sent-items: no "Content-Type:" line even if attachments available

Version 2.3

bugfix: hangs if mail has no body

Usage

The converter is written in Java. It has been tested with mailwarrior 3.55 and 3.61 as source and kmail as target mail client. There is no guarantee that it works with other versions and/or clients, or that it works with mail folders other than mine at all!

Unzip Qdb.zip into the folder containing mailwarrior's .mwm files. On mailwarrior 3.61 use the mail folder, i.e. the folder above the .kmx files.
open console (Start -> Run -> cmd or command <enter>)
type: path_to_java_exe\java Qdb

alternatively:

Copy Qdb.zip into the folder described above
open console (Start -> Run -> cmd or command <enter>)
type: path_to_java_exe\java -cp Qdb.zip Qdb

This creates a set of files (one for each mail container) in the mbox format. The original .mwm files are not touched. If you don't trust Qdb just copy the .mwm files you would like to convert into a directory and start the converter there. On mailwarrior 3.61: copy all directories containing the .kmx files into one directory and start the converter there.

Note that all binary attachments are included in the conversion as well. To avoid this, rename the attachment directory during the conversion process.

Important

This program can not convert compacted folders!
In case your mailbox is compacted, qdb might:

do nothing at all
crash
do some strange things
freeze

If any of the above happens, please try the following:

Start MW 3.61, create a new mailbox by right-clicking the tree-root and selecting "new folder", naming it "dummyfolder"
right-click the "dummyfolder" and select "new mailbox", naming it "dummy"
now copy all mails you wish to convert into this new mailbox.
close mail warrior
create a new directory somewhere (NOT below "Mails"!) using the explorer
Use the explorer to copy the directory "dummyfolder" below "Mails" (in the MW installation directory) to the newly created directory
copy Qdb.zip into the same directory
in this directory run "PathToJava\java -cp Qdb.zip Qdb" in a dos shell

Now the new folder should contain a file called dummyfolder.FLD.dummy.mbox

Now you can restart MW and either delete the whole dummy-folder or restart the process by deleting the converted mails from it and adding new ones. Don't forget to save the .mbox file from the first run so that it doesn't get replaced.

It's important to make sure that the "dummy" folder is never compacted. Compacting it even once while using MW results in the errors mentioned above!

OK, I want it (22k)!

Last modification: 2003-05-30