Word Conversion with CT

Word remains the most common document format we see, but it causes a lot of headaches for publishers and anyone else who needs clean, consistent content for later processing.

Tagged data is the basis of all high-quality typeset documents, because correction cycles are more easily managed outside of Word. Many customers nevertheless require a Word proof as a deliverable, alongside structured data such as XML or HTML.

CT is our Word conversion tool, developed in house to convert Word documents to the absolute minimum of XML needed, and it’s fast. It extracts embedded content such as spreadsheets, presentations and high-resolution images, automatically fixes common table errors and strips out any unnecessary internal document data, all in seconds.
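
To make the idea of embedded content concrete: a .docx file is a ZIP package, and images and embedded Office objects conventionally sit under the word/media/ and word/embeddings/ folders inside it. The sketch below lists those parts using only the Python standard library. It is an illustration of the file format, not CT's own code or interface, and the function name list_embedded_parts is an assumption made up for this example.

```python
import zipfile


def list_embedded_parts(source):
    """Group the embedded parts of a .docx package by kind.

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration only; not part of CT.
    """
    parts = {"images": [], "objects": []}
    with zipfile.ZipFile(source) as package:
        for name in package.namelist():
            if name.endswith("/"):
                continue  # skip directory entries, keep real files only
            if name.startswith("word/media/"):
                parts["images"].append(name)
            elif name.startswith("word/embeddings/"):
                parts["objects"].append(name)
    return parts
```

Each listed name can then be read with ZipFile.read(name) to save the original image or spreadsheet at full resolution.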

CT has extensive options to control what is extracted, so the XML/*SML output is always clean and consistent, no matter how badly authored the source document is. Manual edits to the resulting output are verified by the integrated validation module, so you can be confident it will go straight into your composition tool or XSLT pipeline without error.

Going back to Word from XML is a click away, with full control of styles, providing a true ‘round trip’ solution.


  • Very fast Word (docx) conversion
  • Output formats: XML, typesetter-friendly *SML, JSON or text
  • Extensive options to control output
  • Extraction of embedded MS Office objects
  • Extraction of high-resolution images
  • Correct common table alignment errors
  • Purge unnecessary internal properties
  • Heading style detection based on content
  • Remove empty table columns/rows
  • Intelligent financial table content merging
  • Header row auto-detection
  • Create Word documents directly from XML
  • Inline regex operations
  • Call external programs for further processing, e.g. XSLT, Perl
  • Simple drag and drop GUI or command line operation
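
As an illustration of one of the table clean-ups listed above, removing empty rows and columns can be sketched as follows. This is a minimal example under stated assumptions, not CT's implementation: the table is taken to be a plain list of rows of cell text with no merged cells, and the function name drop_empty_rows_and_columns is hypothetical.

```python
def drop_empty_rows_and_columns(table):
    """Return a copy of `table` without rows or columns that hold only blank cells.

    Assumes a simple grid (list of lists of cell text) with no merged cells.
    Hypothetical helper for illustration only; not part of CT.
    """
    def is_blank(cell):
        return cell is None or str(cell).strip() == ""

    # Drop rows in which every cell is blank.
    rows = [row for row in table if not all(is_blank(c) for c in row)]
    if not rows:
        return []

    # Pad short rows so every row has the same number of cells.
    width = max(len(row) for row in rows)
    padded = [list(row) + [""] * (width - len(row)) for row in rows]

    # Keep only the columns that have at least one non-blank cell.
    keep = [i for i in range(width) if not all(is_blank(row[i]) for row in padded)]
    return [[row[i] for i in keep] for row in padded]
```

For example, a three-column table whose middle column and middle row are entirely blank comes back as a two-column, two-row table with the remaining cells in their original order.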

Free Trial

Request a no-obligation, fully functional 14-day free trial: Evaluations


For more information about our other CT product options and pricing, please visit our Products page.

*SML is a simple tag format designed for fast manual typesetting. It is fully validated to ensure it is correct before it is imported into a typesetting platform such as Arbortext APP or Adobe InDesign; sample screenshots of both are shown below.