Using Microsoft Word to write website content - Text Based clean up
Written by Marco Conti
Wednesday, 19 November 2008 14:29
Page 2 of 4
The simplest way to clean an MS Word file is to copy the text from the Word document, paste it into a plain text editor, then copy it again from there and paste it into your HTML editor of choice. The catch: all formatting will be lost, and you'll most likely need to reproduce the original document's formatting by hand.
An alternative is to save the document from MS Word as Plain Text. Both methods retain the paragraph breaks when the text is pasted into the visual (WYSIWYG) view of an HTML editor, but not when it is pasted into the HTML code view, so be sure to paste into the visual edit view. Here is a screenshot of the results in Adobe Dreamweaver:

As you can see, all the formatting (bold, italic, etc.) is gone, but that's usually fairly easy to reproduce, and starting from clean text is often preferable anyway.
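If you do need to paste into the HTML code view, where the paragraph breaks would otherwise be lost, the plain text can be wrapped in paragraph tags first. Here is a minimal Python sketch of that idea. The function name `text_to_paragraphs` is made up for this example, and it assumes each paragraph of the plain text sits on its own line, which you should check against your own file:

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Wrap each paragraph of plain text in <p> tags.

    Assumption: every paragraph is on its own line, so any run of
    line breaks is treated as a single paragraph break.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split on one or more newlines and drop empty or whitespace-only pieces.
    blocks = [block.strip() for block in re.split(r"\n+", text)]
    blocks = [block for block in blocks if block]

    # Escape &, < and > so the text cannot be mistaken for markup.
    return "\n".join(f"<p>{html.escape(block, quote=False)}</p>" for block in blocks)


if __name__ == "__main__":
    sample = "First paragraph.\r\n\r\nSecond paragraph, with an & in it.\r\n"
    print(text_to_paragraphs(sample))
```

The output can then be pasted straight into the HTML view. Bold, italic and other formatting still have to be added back by hand, exactly as with the visual view.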
I have tried Dreamweaver's built-in "Clean Up Word HTML" command, and while it works to an extent, it still leaves about half of the unnecessary code untouched. It's just not good enough.