But here is a working snippet I extracted from tika-app:
ByteArrayOutputStream out = new ByteArrayOutputStream();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
// Serialize the incoming SAX events as indented UTF-8 HTML.
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
handler.setResult(new StreamResult(out));
// Tika's decorator that makes an empty title render as <title></title> rather than <title/>.
ExpandedTitleContentHandler handler1 = new ExpandedTitleContentHandler(handler);
// 'file' holds the raw document bytes.
tikaParser.parse(new ByteArrayInputStream(file), handler1, new Metadata());
return new String(out.toByteArray(), "UTF-8");

It works pretty nicely. Here is an example of an original MS Office document:
And here is how the above looks in my webapp as an HTML preview:
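If you want to try the serialization half of the snippet without Tika on the classpath, here is a minimal sketch using only the JDK. It sets up the same TransformerHandler with the same output properties, but the class name SaxToHtmlSketch and the helpers newHtmlHandler, emitSampleDocument, and render are my own hypothetical names, and emitSampleDocument is only a stand-in for tikaParser.parse(...): it fires a few SAX events by hand instead of parsing a real document.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

public class SaxToHtmlSketch {

    // Builds a TransformerHandler that writes incoming SAX events to 'out' as UTF-8 HTML.
    static TransformerHandler newHtmlHandler(ByteArrayOutputStream out) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Hypothetical stand-in for the Tika parser: emits a tiny document as SAX events.
    static void emitSampleDocument(ContentHandler h, String text) throws Exception {
        AttributesImpl noAttrs = new AttributesImpl();
        h.startDocument();
        h.startElement("", "html", "html", noAttrs);
        h.startElement("", "body", "body", noAttrs);
        h.startElement("", "p", "p", noAttrs);
        h.characters(text.toCharArray(), 0, text.length());
        h.endElement("", "p", "p");
        h.endElement("", "body", "body");
        h.endElement("", "html", "html");
        h.endDocument();
    }

    // Runs the sample events through the handler and returns the serialized HTML.
    static String render(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            emitSampleDocument(newHtmlHandler(out), text);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("SAX to HTML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("Hello from SAX"));
    }
}
```

In the real code, the handler returned by newHtmlHandler would be wrapped in ExpandedTitleContentHandler and passed to the Tika parser, exactly as in the snippet above.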