Software that processes textual data frequently cannot handle HTML files directly, even though these are the most common files found on the web. The programs below convert HTML into plain text; many of them support batch conversion. To get an idea of what these programs can do, click on the "Conversion Result" links to see the plain-text version of the current HTML page as created by each program. NB: These conversions are usually much better than those done with Microsoft™ Word™ (Conversion Result of Word™).
Web2Text is a bare-bones program with which you can batch convert HTML files into TXT. It lets you configure the most important options (such as line length) and yields decent results.
Detagger offers a few more customization options than Web2Text, so check whether you need them (for example, restricting the output file to ASCII characters). You can test a fully functional version of this shareware, which currently costs $20 (US).
If you do not need paragraph marks in the HTML to be reproduced in the ASCII file, try the freeware HTMLtoTXT.
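The options mentioned for these tools (line length, paragraph marks, ASCII-only output) can be illustrated with a minimal Python sketch. It does not show how Web2Text, Detagger, or HTMLtoTXT work internally. It uses only Python's standard `html.parser` and `textwrap` modules, and the names `TextExtractor`, `html_to_text`, `line_length`, `keep_paragraphs`, and `ascii_only` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
import textwrap


class TextExtractor(HTMLParser):
    """Collects the text of an HTML document, split at block-level tags."""

    # Tags treated as paragraph boundaries (an illustrative, incomplete set).
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose content is never shown to the reader.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &ouml;
        self.paragraphs = [[]]
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.paragraphs.append([])  # start a new paragraph

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.paragraphs.append([])

    def handle_data(self, data):
        if not self._skip_depth:
            self.paragraphs[-1].append(data)


def html_to_text(html, line_length=72, keep_paragraphs=True, ascii_only=False):
    """Strip tags from `html` and wrap the remaining text at `line_length`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()

    # Join the text chunks of each paragraph and collapse runs of whitespace.
    paragraphs = [" ".join("".join(chunks).split()) for chunks in parser.paragraphs]
    paragraphs = [p for p in paragraphs if p]

    if ascii_only:
        # Replace every non-ASCII character with "?".
        paragraphs = [p.encode("ascii", "replace").decode("ascii") for p in paragraphs]

    if keep_paragraphs:
        # Separate paragraphs with a blank line.
        return "\n\n".join(textwrap.fill(p, width=line_length) for p in paragraphs)
    # Otherwise run all text together into one wrapped block.
    return textwrap.fill(" ".join(paragraphs), width=line_length)
```

For example, `html_to_text(page, line_length=60, ascii_only=True)` wraps the text of the string `page` at 60 columns and replaces every non-ASCII character with `?`.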
- Markup Remover
This Windows 3.11-style tag remover was shareware and had some useful customization facilities; most importantly, it could convert to ASCII, ISO 8859-1, and ANSI (for UNIX). Unfortunately, the program is no longer available.
- Microblast HTML to TEXT
Microblast's HTML to TEXT (shareware @ US-$ 10) features the most intuitive interface, but yields mediocre results at best. It is not customizable: not even the line breaks are configurable. Its "Open" and "Save" menus do not follow the Windows™ standard (there are no standard file type filters), and batch conversion is not implemented. (Conversion Result)
NoteTab is not a stand-alone detagger, but a full-fledged ASCII/HTML editor. The shareware fee of US-$ 19.95 pays for itself quickly: NoteTab transforms HTML into plain text very effectively, its results are very clean, and you can batch convert many files.
- HTML Markdown
HTML Markdown was written for the PowerMac.
- more HTML converters
- verypdf PDF2TXT
This shareware conversion tool batch converts PDF documents into ASCII.
- Advanced PDF Manager
Shareware for managing PDF files, which contains a batch conversion facility that turns PDF files into plain text.
Command-line interface freeware, part of the XPDF package.
- ABC Amber Textconverter
This shareware conversion tool performs conversions between many major file formats, namely:
- ANSI (.txt)
- Unicode (.txt)
- Rich Text Format (.rtf)
- Microsoft™ Word™ (.doc)
- Corel™ WordPerfect™ (.doc)
- Lotus™ AmiPro™ (.ami)
- Microsoft™ Excel™ (.xls)
- Lotus™ 1-2-3™
- Adobe™ Portable Document Format™ (.pdf).
- Microsoft Word 2002 and later versions contain a batch conversion wizard.