Mailwarrior Mail Converter
This page describes the conversion of Mailwarrior 3.55 and 3.61 mail folders into
the widely used mbox format.
Note: Another converter, written in Perl, is available here:
http://caravela.homelinux.net/~higuita/kmx2mbox/
Introduction
Mailwarrior is imho one of the best mail programs for the Windows operating
system. It is free, handles several mail servers very well (all
mails go into one folder; one column in the folder overview shows the server
name; within the mail editor, the mail server used to send the mail can be
changed easily with a combo box), and it does not support any active content
at all (no virus infection risk). Highly recommended!
Now I am slowly switching over to Unix-like operating systems and I would
like to feed my new mail client (probably kmail) with the mails stored in
Mailwarrior. Unfortunately the latter does not have any mail export functionality.
So I quickly coded a program that converts these mails into the widely used
mbox format.
Internals: structure of an mbox file
An mbox file is a plain text file containing all mails one after the other.
Each mail starts with the line "From email_address date". The mail ends at
the end of the mbox file or at the next line of this kind. Therefore a
mail must not contain such a line in its body (such lines have to be changed,
e.g. by putting a '>' in front of them).
Each mail has a header and a body. The header has some fixed fields like
'To:' or 'Subject:' and several free fields usually set by the sender and
the routing servers. The most important one is the mime type, which defines how the body is
to be interpreted.
The body usually consists of plain text or html text (depending on the mime
type). Often the mime type is multipart/mixed or multipart/alternative.
In this case the mail body consists of several parts, each delimited by
a special marker, which is defined in the mime type declaration. multipart/alternative
mails are mails which are coded in more than one way, e.g. as plain text and
html, so that the receiver can display them in its favourite way. multipart/mixed
mails have attachments, e.g. attached pictures. To make it even more complicated,
these multipart parts can be nested, i.e. a multipart/alternative mail can have
multipart/mixed attachments in one or both of its alternative codings.
Binary attachments are base 64 encoded and are a regular part of the message. The
header of the part defines its mime type and file name.
Internals: mail handling by mailwarrior
Mailwarrior apparently handles mails as follows: The header is processed.
Special fields like 'From:', 'To:' or the first mime type given are extracted
and stored in special locations. The remaining header is stored as header.
The body is processed. All binary attachments are stored under the provided
file name in the attachments directory. If the file name exists, a number
is added to prevent overwriting of an existing attachment. The base 64 encoded
text is removed. Finally all multipart separators with all their information
(like the mime type of the next part) are removed. The remaining body is stored
as body.
This procedure has some drawbacks:
- On multipart messages the mime type is multipart/xxxx. The real mime
type of the body is lost.
- The mime types of the attachments are lost.
- On multipart/alternative messages both parts are stored as one body.
You can see it in Mailwarrior: the text of a multipart/alternative message
appears twice; the second time decorated with html code.
This information is lost and cannot be recovered by converting the mails!
Internals: mail storage by mailwarrior
Mailwarrior stores all mails of a folder in one binary file with the extension
.mwm. Each file consists of two parts. The format is described here as it
was found by analyzing these files:
Part 1: the mails (current and already deleted)
Header: 4 bytes "QDB\0" followed by 1 integer (integers are always 4 bytes,
little endian): the size of the QDB part.
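The header layout above can be sketched in a few lines. This is only an illustration, not the converter's actual code: the class and method names are made up, and it assumes the QDB part starts at the very beginning of the byte array passed in.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the original converter.
final class QdbHeader {
    /** Checks the "QDB\0" magic and returns the little-endian size field after it. */
    static int readQdbSize(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        byte[] magic = new byte[4];
        buf.get(magic);
        if (magic[0] != 'Q' || magic[1] != 'D' || magic[2] != 'B' || magic[3] != 0) {
            throw new IllegalArgumentException("not a QDB part");
        }
        // All integers in the file are 4 bytes, little endian.
        return buf.getInt();
    }
}
```

Java reads integers big endian by default (for example with DataInputStream.readInt()), so the byte order has to be set explicitly as shown.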
The mails (Mailwarrior 3.55):
- 1 integer (unknown purpose)
- subject (all strings are terminated by \0d\0a, i.e. CR LF)
- 3 from lines in different formats
- unknown string
- to, cc and bcc line
- mail id (not on sent mails)
- mail account
- mime type
- 1 unknown string
- 3 unknown strings containing a 0 or a 1
- 8 bytes forming the mail date (see below)
- a unique string terminating the following fields of variable length
- list of attachments. These are \0d\0a terminated strings which are
the file names of the attachments (absolute paths); the list is terminated with the unique
string above. All files found here are checked for existence. The
existing attachment files are stored in a separate list, which is used later
to convert the attachments.
- list of header lines terminated with the unique string above
- list of mail body lines terminated with the unique string above
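Reading the three variable-length lists of a 3.55 mail could look roughly like this. It is a sketch under assumptions: the lines are taken to be already split at \0d\0a, the unique string is taken to sit on a line of its own, and the name readList is invented for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of the original converter.
final class TerminatedLists {
    /** Collects lines up to (but not including) the next terminator line. */
    static List<String> readList(Iterator<String> lines, String terminator) {
        List<String> result = new ArrayList<>();
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.equals(terminator)) {
                break; // the unique string closes the current list
            }
            result.add(line);
        }
        return result;
    }
}
```

Calling readList three times on the same iterator yields the attachment names, the header lines and the body lines, in that order.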
The mails (Mailwarrior 3.61):
- 1 integer (unknown purpose)
- subject (all strings are terminated by \0d\0a, i.e. CR LF)
- 3 from lines in different formats
- unknown string
- to, cc and bcc line
- unknown string
- mail account
- mime type
- 6 unknown strings containing a 0 or a 1
- 2 unknown strings
- 8 bytes forming the mail date (see below)
- 8 unknown bytes
- 1 integer: pointing to the end of the body
- a unique string terminating the following header line list of variable length
- list of header lines terminated with the unique string above
- list of mail body lines terminated by the pointer above
- list of attachments. These are \0d\0a terminated strings which are
the file names of the attachments (absolute paths).
All files found here are checked for existence. The
existing attachment files are stored in a separate list, which is used later
to convert the attachments. The main problem here is that the attachment list ends at
the end of the mail, which is not known at this point. The end of the mail is derived from the index part below.
From the unique string a boundary string is calculated by removing all characters
which are not a letter, a digit, '.' or '_'. This boundary is required later
if the mail has attachments. A mime type is guessed: text/html
if the mail body contains a '<html>', text/plain otherwise.
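Both rules fit into two small functions. The sketch below assumes that "letter" means the ASCII letters A-Z and a-z and that the '<html>' check is case sensitive; neither detail is confirmed by the analysis above, and the names are invented.

```java
// Hypothetical helper, not part of the original converter.
final class BoundaryGuess {
    /** Keeps only letters, digits, '.' and '_' of the unique string. */
    static String boundaryFrom(String uniqueString) {
        return uniqueString.replaceAll("[^A-Za-z0-9._]", "");
    }

    /** Guesses text/html if the body contains "<html>", text/plain otherwise. */
    static String guessMimeType(String body) {
        return body.contains("<html>") ? "text/html" : "text/plain";
    }
}
```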
Part 2: the mail index
Mailwarrior 3.55: references the mails that are not deleted.
Mailwarrior 3.61: index and length of the mails.
Header: 4 bytes "QIX\0" followed by 2 integers: the size of the QIX part
and the number of entries (== the number of mails that are not deleted).
The index entries:
- an unknown string terminated by \0
- 1 integer giving the position, in the first part, of the mail this entry
is pointing to
- 1 unknown integer (mailwarrior 3.61: length of the mail)
- 1 unknown integer
That's it. The index part is not ordered; for Mailwarrior 3.61 conversions it has to be sorted
to get the list of mail lengths required to read the mails.
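As an illustration, the index part could be read and sorted like this. The sketch assumes that the byte array starts at the "QIX\0" marker and that sorting by the position field is the order needed; the class and field names are invented, and the unknown fields are simply skipped.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical reader, not part of the original converter.
final class QixIndex {
    static final class Entry {
        final int position; // offset of the mail in the QDB part
        final int length;   // 3.61: length of the mail; unknown meaning in 3.55

        Entry(int position, int length) {
            this.position = position;
            this.length = length;
        }
    }

    /** Parses the QIX part and returns its entries sorted by mail position. */
    static List<Entry> read(byte[] qixPart) {
        ByteBuffer buf = ByteBuffer.wrap(qixPart).order(ByteOrder.LITTLE_ENDIAN);
        byte[] magic = new byte[4];
        buf.get(magic);
        if (magic[0] != 'Q' || magic[1] != 'I' || magic[2] != 'X' || magic[3] != 0) {
            throw new IllegalArgumentException("not a QIX part");
        }
        buf.getInt();             // size of the QIX part, not needed here
        int count = buf.getInt(); // number of entries
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            while (buf.get() != 0) {
                // skip the unknown \0 terminated string
            }
            int position = buf.getInt();
            int length = buf.getInt();
            buf.getInt();         // third integer, purpose unknown
            entries.add(new Entry(position, length));
        }
        entries.sort(Comparator.comparingInt(e -> e.position));
        return entries;
    }
}
```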
Internals: the conversion process
The converter first skips the mail part and reads the index part, building
an internal list of the indices of those mails which are not deleted
(Mailwarrior 3.61: to get the lengths of the mails). The second
pass reads the mail part as described above. Mails which are not deleted
(Mailwarrior 3.61: all mails)
are converted into the mbox format immediately. Afterwards the mail is removed
from memory in order to avoid running out of memory.
For each mail, the from line is created first, from the third 'from' field
(see above) and the mail date.
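A from line in the usual mbox layout can be built as follows. This is a sketch, not the converter's code: it assumes the asctime-style date layout ("Thu Jan  1 00:00:00 1970") and formats the time as UTC, so that no local time zone shifts the result. The class and method names are invented.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper, not part of the original converter.
final class FromLine {
    private static final String[] DAYS = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"};
    private static final String[] MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};

    /** Builds "From sender Www Mmm dd hh:mm:ss yyyy" from milliseconds since 1970-01-01. */
    static String fromLine(String sender, long millisSince1970) {
        LocalDateTime t = LocalDateTime.ofInstant(
                Instant.ofEpochMilli(millisSince1970), ZoneOffset.UTC);
        // asctime pads the day of month with a space, hence %2d.
        return String.format("From %s %s %s %2d %02d:%02d:%02d %d",
                sender,
                DAYS[t.getDayOfWeek().getValue() - 1],
                MONTHS[t.getMonthValue() - 1],
                t.getDayOfMonth(), t.getHour(), t.getMinute(), t.getSecond(),
                t.getYear());
    }
}
```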
Next the header lines are written. All 'to', 'cc' and 'bcc' lines are removed.
If a line starting with 'Content-Type: ' is encountered, the 'to', 'cc' and
'bcc' lines are written with the data read directly from the mwm file. All lines
belonging to the content type declaration are removed. Then the content type
is written: if the mail has attached files which are found on disk, the
mime type multipart/mixed is written with the boundary string generated above.
Otherwise the guessed mime type is taken.
If the 'to', 'cc' and 'bcc' lines have not been written so far, they are written
now. If the mail has attachments which are found on disk, a content type
header with the guessed mime type is written. In any case the mail body is
written now. If a line starts with 'From ' and contains an '@', a '>'
is put at the beginning of the line.
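The escaping rule for body lines is small enough to show directly. The function name is invented for this sketch.

```java
// Hypothetical helper, not part of the original converter.
final class FromEscape {
    /** Prefixes '>' to body lines that could be mistaken for an mbox from line. */
    static String escapeBodyLine(String line) {
        if (line.startsWith("From ") && line.contains("@")) {
            return ">" + line;
        }
        return line;
    }
}
```

Many mbox writers escape every body line starting with 'From ', with or without an '@'; the check above follows the narrower rule described here.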
If the mail has attachments which are found on disk, all attachments are
base 64 encoded and written. The mime type of each part is guessed from the
file extension, see contentTypeFromName(). If the extension cannot be
identified, 'application/octet-stream' is taken.
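Writing one attachment part might look like the sketch below. It is not the converter's code: it uses java.util.Base64 (Java 8 and later) instead of the Base64 class the converter ships with, the small extension table is only an example and not the table in contentTypeFromName(), and all names are invented.

```java
import java.util.Base64;
import java.util.Locale;

// Hypothetical helper, not part of the original converter.
final class AttachmentPart {
    /** Guesses a mime type from the file extension (example table only). */
    static String guessContentType(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".txt")) return "text/plain";
        if (lower.endsWith(".htm") || lower.endsWith(".html")) return "text/html";
        if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) return "image/jpeg";
        if (lower.endsWith(".gif")) return "image/gif";
        if (lower.endsWith(".pdf")) return "application/pdf";
        return "application/octet-stream"; // fallback for unknown extensions
    }

    /** Returns one multipart/mixed part holding the base 64 encoded file content. */
    static String encodePart(String boundary, String fileName, byte[] content) {
        // The MIME encoder wraps its output into lines of at most 76 characters.
        String base64 = Base64.getMimeEncoder().encodeToString(content);
        return "--" + boundary + "\r\n"
                + "Content-Type: " + guessContentType(fileName)
                + "; name=\"" + fileName + "\"\r\n"
                + "Content-Transfer-Encoding: base64\r\n"
                + "Content-Disposition: attachment; filename=\"" + fileName + "\"\r\n"
                + "\r\n"
                + base64 + "\r\n";
    }
}
```

After the last part the message still needs the closing marker, i.e. the boundary with "--" both in front and behind it.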
Internals: the conversion program
The conversion program is written in Java. It consists of 2 classes with
several inner classes, i.e. exactly the way a program should not be written.
It was a quick hack.
It starts with InputStream, which adds a readln to DataInputStream. BufferedReader
could not be used because it converts bytes into Unicode!
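A byte-oriented readln can be sketched as below. This is not the converter's class: the name ByteLineInputStream is invented, and decoding with ISO-8859-1 is my choice, taken because it maps every byte to exactly one character with the same value, so nothing gets lost or reinterpreted.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class, not part of the original converter.
final class ByteLineInputStream extends DataInputStream {
    ByteLineInputStream(InputStream in) {
        super(in);
    }

    /** Reads up to the next CR LF pair; returns null at the end of the stream. */
    String readln() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int previous = -1;
        int current;
        while ((current = read()) != -1) {
            if (previous == '\r' && current == '\n') {
                byte[] bytes = line.toByteArray();
                // The CR is already buffered; cut it off again.
                return new String(bytes, 0, bytes.length - 1, StandardCharsets.ISO_8859_1);
            }
            line.write(current);
            previous = current;
        }
        if (line.size() == 0) {
            return null;
        }
        // Last line without a terminating CR LF.
        return new String(line.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Since this reads byte by byte, the underlying stream should be wrapped in a BufferedInputStream when reading from a file.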
Next is the class Date. It was stolen from GregorianDate in order to convert
the number of milliseconds that have passed since 1970-01-01. GregorianDate
(which Date uses internally) was not able to do that, because these
classes fiddle around with the time zone, giving results which differ by some
hours from the time seen in Mailwarrior! toAscTime() returns the time in
the strange format required in the 'From' line of mbox.
The class MailIndex just contains one mail index entry (QIX part of the mwm file).
The class Mail contains one mail. readMail() reads the contents from the
input stream. toMbox() converts its contents into the mbox format. encodeFile()
converts a file into base 64 ASCII text. contentTypeFromName() guesses the
mime type from the file extension. getDate() creates a date. It is read as
a long (8 byte) number. Double.longBitsToDouble() converts this long into
a double. This is possible because the .mwm file stores the date as an IEEE
754 double counting the days since 1900-01-01 (due to a calculation
error it is actually the number of days since 1899-12-31). Therefore getDate()
converts this double into the number of milliseconds that have passed
since 1970-01-01. Date.setTime() converts this into a date without changing
it according to the time zone. Not nice but works.
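The date arithmetic can be written down as follows. The sketch takes the effective start date 1899-12-31 given above at face value, which puts 1970-01-01 at day 25568. It also assumes that the 8 bytes have already been assembled into the long in the correct byte order. The names are invented.

```java
// Hypothetical helper, not part of the original converter.
final class MwmDate {
    // Days from 1899-12-31 (the effective start date given above) to 1970-01-01.
    static final long DAYS_TO_1970 = 25568L;
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    /** Converts the raw 8-byte date field into milliseconds since 1970-01-01. */
    static long toUnixMillis(long rawBits) {
        // Reinterpret the bit pattern as an IEEE 754 double (days, with fraction).
        double days = Double.longBitsToDouble(rawBits);
        return Math.round((days - DAYS_TO_1970) * MILLIS_PER_DAY);
    }
}
```

The fractional part of the double carries the time of day, so 25569.5 stands for noon of 1970-01-02.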
Base64 was taken from http://www.ruffboy.com/download.htm and modified slightly
(the modifications turned out not to be necessary in this version).
The main program scans the current directory for *.mwm files. These
files are converted. If the conversion of one file fails, the process
continues with the next. For each .mwm file one .mbox file is created. If there is no .mwm
file, the 3.61 mode is activated. Then all directories are scanned for .kmx files,
which are converted in a similar manner.
Revision History
Version 2
- change: print dots during conversion
- fix: multiple attachments
- change: attachments marked as attachment (not as inline)
- change: also reads MW 3.61 folders (experimental)
Version 2.1
- bugfix: crash if folder index not ordered
Version 2.2
- bugfix: crash if file attachment too long
- sent-items: "From:", "Subject:" and "Date:" lines are written in any case
- empty line between mails
- bugfix: sent-items: no "Content-Type:" line even if attachments available
Version 2.3
- bugfix: hangs if mail has no body
Usage
The converter is written in Java. It is tested with Mailwarrior 3.55 and 3.61 as source
and kmail as target mail client. No guarantee that it works with other versions
and/or clients or that it works with other mail folders than mine at all!
- Unzip the Qdb.zip into the folder containing Mailwarrior's .mwm files.
On Mailwarrior 3.61 use the mail folder, i.e. the folder above the .kmx files.
- open console (Start -> Run -> cmd or command <enter>)
- type: path_to_java_exe\java Qdb
alternatively:
- Copy Qdb.zip into the folder described above
- open console (Start -> Run -> cmd or command <enter>)
- type: path_to_java_exe\java -cp Qdb.zip Qdb
This creates a set of files (one for each mail container) in the mbox format.
The original .mwm files are not touched. If you don't trust Qdb just copy
the .mwm files you would like to convert into a directory and start the converter
there. On mailwarrior 3.61: copy all directories containing the .kmx files into
one directory and start the converter there.
Note that all binary attachments are included in the conversion as well.
To avoid this, rename the attachment directory during the conversion process.
Important
This program cannot convert compacted folders!
In case your mailbox is compacted, Qdb might:
- do nothing at all
- crash
- do some strange things
- freeze
If one of the above happens, please try the following:
- Start MW 3.61, create a new folder by right-clicking the tree-root and selecting
"new folder", naming it "dummyfolder"
- right-click the "dummyfolder" and select "new mailbox", naming it "dummy"
- now copy all mails you wish to convert into this new mailbox.
- close Mailwarrior
- create a new directory somewhere (NOT below "Mails"!) using the explorer
- Use the explorer to copy the directory "dummyfolder" below "Mails" (in the MW
installation directory) to the newly created directory
- copy Qdb.zip into the same directory
- in this directory run "PathToJava\java -cp Qdb.zip Qdb" in a dos shell
Now the new folder should contain a file called dummyfolder.FLD.dummy.mbox
Now you can restart MW and either delete the whole dummy-folder or restart the
process by deleting the converted mails from it and adding new ones. Don't forget to
save the .mbox file from the first run so that it doesn't get replaced.
It's important to make sure that the "dummy" folder is never compacted. Compacting it even once
while using MW results in the errors mentioned above!
OK, I want it (22k)!
Last modification: 2003-05-30