If you have ever copied text from Microsoft Word into your HTML files or your CMS, you have probably encountered a huge amount of excess HTML code. One Word document I had to work with contained 245KB of junk out of a total of 284KB of HTML (about 86%). In HTML from Word it’s not unusual to see a couple of font and span elements wrapped around each letter of the text. This is where Word HTML Cleaner comes into play.
After I had to process a couple of Word documents that a client provided for their website, I decided to build my own Word HTML Cleaner. It is an online tool that strips Microsoft’s proprietary tags and other excess HTML (duplicate tags, multiple opened bold tags, etc.) from Word-generated HTML documents, leaving all the important HTML intact. In some cases the HTML size is reduced by up to 90%. You still need to check the appearance of the output to make sure everything is in order.
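To give a rough idea of what this kind of cleanup involves, here is a minimal sketch in Python. It is not the tool's actual code (the tool runs in the browser); the function name clean_word_html and the specific regular expressions are assumptions chosen for illustration, and they cover only a few typical Word leftovers.

```python
import re

def clean_word_html(html: str) -> str:
    """Hypothetical helper: strip a few common Word leftovers with regexes."""
    # Remove HTML comments, including conditional ones like <!--[if gte mso 9]>...<![endif]-->
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Remove bare conditional markers such as <![if !supportLists]> and <![endif]>
    html = re.sub(r"<!\[[^\]]*\]>", "", html)
    # Remove namespaced Office tags such as <o:p>, </o:p> or <st1:place>
    html = re.sub(r"</?[a-zA-Z][\w-]*:[^>]*>", "", html)
    # Drop font and span wrappers but keep the text inside them
    html = re.sub(r"</?(?:font|span)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop class, style and lang attributes (e.g. class="MsoNormal").
    # Simplification: this is applied to the whole string, not only inside tags.
    html = re.sub(
        r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Collapse nested duplicate bold tags: <b><b>text</b></b> becomes <b>text</b>
    html = re.sub(r"<b>(?:\s*<b>)+", "<b>", html)
    html = re.sub(r"(?:</b>\s*)+</b>", "</b>", html)
    # Merge adjacent bold runs, keeping any whitespace between them
    html = re.sub(r"</b>(\s*)<b>", r"\1", html)
    return html


sample = (
    '<p class="MsoNormal"><span style="font-size:12pt">'
    "<b>Hello</b><b> world</b></span><o:p></o:p></p>"
)
cleaned = clean_word_html(sample)
print(cleaned)
print(f"{len(sample)} bytes -> {len(cleaned)} bytes")
```

Regexes are a blunt instrument for HTML, so a sketch like this can misfire on unusual markup, which is one more reason to check the output by eye.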
How To Use
Open your Word document, open Word HTML Cleaner and use one of the two modes:
Visual mode allows you to copy text directly from Word. After you paste text from Word, the tool automatically processes and cleans the HTML code, which is output to the bottom text area marked “Output”.
Copy and paste the HTML code from an already built Word HTML page. As in the previous mode, the tool automatically cleans the HTML code.
After Word HTML Cleaner finishes processing your HTML, it outputs the clean HTML along with stats on the HTML tags that were removed (marked in blue) and the HTML tags that were found and should be checked before you put the result on your website (marked in red).
Some HTML tags need more attention from the user: anchors (links) and images. The reason is that they contain URLs, which could point to your local hard drive instead of a proper web URL.
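The check for local URLs can be sketched roughly as follows. This is an illustrative Python example, not the tool's own code; the names LOCAL_URL, LocalLinkFinder and find_local_urls are hypothetical, and the pattern is only a heuristic for file: URLs, Windows drive paths and UNC paths.

```python
import re
from html.parser import HTMLParser

# Heuristic (an assumption): file: URLs, drive letters like C:\ or C:/, and \\server paths
LOCAL_URL = re.compile(r"^(?:file:|[a-zA-Z]:[\\/]|\\\\)", re.IGNORECASE)


class LocalLinkFinder(HTMLParser):
    """Collects (tag, url) pairs for anchors and images that look local."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        # Only anchors (href) and images (src) are inspected
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and LOCAL_URL.match(value.strip()):
                self.suspicious.append((tag, value))


def find_local_urls(html: str):
    """Return a list of (tag, url) pairs that should be checked by hand."""
    finder = LocalLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious


page = (
    '<a href="file:///C:/Users/me/doc.htm">draft</a>'
    '<img src="C:\\pics\\logo.png">'
    '<a href="https://example.com/">ok</a>'
)
for tag, url in find_local_urls(page):
    print(f"check <{tag}>: {url}")
```

Relative paths that only exist on your machine (for example images saved next to the Word file) are not caught by this pattern, so a manual look at links and images is still worthwhile.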
Problematic browsers are IE (IE6, IE7, IE8) and every version of Opera. They have poor RegEx support, so the output will not be as optimized as in other browsers. Opera also doesn’t support listening for paste events, so I had to add hacks for that… hopefully this will be available in Opera 12.50.
Where is this free online tool?
Go to this webpage: Word HTML Cleaner