Previewing Word (doc/docx) files in the browser with Apache POI - 灰信网 (software development blog aggregator)

I. Environment

1. JDK: 1.8

2. Maven: 3.6

3. Spring Boot: 2.2.2

II. Main Maven dependencies

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>2.0.2</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.4</version>
</dependency>

III. Implementation

1. docToHtml

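import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.springframework.web.bind.annotation.RequestMapping;
import org.w3c.dom.Document;

// Both methods below belong to a Spring MVC controller class.
@RequestMapping("/wordToHtml")
public void wordToHtml(HttpServletResponse response) {
    final String path = "C:\\usr\\local\\";
    final String file = "5页.doc";
    // try-with-resources closes the input stream even if the conversion fails
    try (InputStream input = new FileInputStream(path + file)) {
        docToHtml(input, response);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void docToHtml(InputStream input, HttpServletResponse response) throws Exception {
    // Load the binary .doc file and convert it into an HTML DOM
    HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(input);
    WordToHtmlConverter wordToHtmlConverter = new ImageConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    wordToHtmlConverter.processDocument(wordDocument);
    Document htmlDocument = wordToHtmlConverter.getDocument();

    // Serialize the DOM to UTF-8 HTML in memory
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(new DOMSource(htmlDocument), new StreamResult(outStream));

    // Reset the response, set its headers, then send the HTML to the browser
    response.reset();
    response.setContentType("text/html");
    response.setCharacterEncoding("UTF-8");
    OutputStream toClient = new BufferedOutputStream(response.getOutputStream());
    toClient.write(outStream.toByteArray());
    toClient.flush();
    toClient.close();
}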
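The serialization step in docToHtml can be tried on its own, without POI or a servlet container. The sketch below builds a tiny DOM by hand as a stand-in for the document that WordToHtmlConverter produces, then serializes it with the same Transformer settings. The class name DomToHtmlDemo and its method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of the original project.
final class DomToHtmlDemo {

    /** Serializes a DOM document to UTF-8 HTML bytes with the JDK Transformer. */
    static byte[] toHtmlBytes(Document doc) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            // "html" output method: no XML declaration, HTML-style serialization
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("HTML serialization failed", e);
        }
    }

    /** Builds a minimal html/body/p document as a stand-in for converter output. */
    static Document sampleDocument(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent(text);
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toHtmlBytes(sampleDocument("Hello from the DOM"));
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Buffering into a byte array first, as docToHtml does, means a failed conversion never leaves a half-written response behind.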
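import java.util.Base64;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.Picture;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Image handling: embeds each picture in the HTML as a Base64 data URI
public class ImageConverter extends WordToHtmlConverter {

    public ImageConverter(Document document) {
        super(document);
    }

    @Override
    protected void processImageWithoutPicturesManager(Element currentBlock, boolean inlined, Picture picture) {
        Element imgNode = currentBlock.getOwnerDocument().createElement("img");
        // getEncoder() yields one unbroken line; getMimeEncoder() would insert CRLF every 76 characters
        String base64 = Base64.getEncoder().encodeToString(picture.getRawContent());
        imgNode.setAttribute("src", "data:" + picture.getMimeType() + ";base64," + base64);
        currentBlock.appendChild(imgNode);
    }
}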
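ImageConverter embeds each picture as a Base64 data URI. One detail worth checking when building such a URI is which JDK encoder is used: Base64.getMimeEncoder() wraps its output with CRLF every 76 characters, while Base64.getEncoder() emits a single line. The sketch below compares the two using only the JDK. The class DataUriDemo and the placeholder bytes are assumptions for illustration, not a real image.

```java
import java.util.Base64;

// Hypothetical demo class, not part of the original project.
final class DataUriDemo {

    /** Builds a data URI suitable for an img element's src attribute. */
    static String toDataUri(String mimeType, byte[] content) {
        // The basic encoder never inserts line separators.
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        byte[] placeholder = new byte[100]; // 100 zero bytes encode to more than 76 Base64 characters
        String basic = toDataUri("image/png", placeholder);
        String mime = "data:image/png;base64," + Base64.getMimeEncoder().encodeToString(placeholder);
        System.out.println("basic encoder contains CRLF: " + basic.contains("\r\n")); // false
        System.out.println("MIME encoder contains CRLF:  " + mime.contains("\r\n"));  // true
    }
}
```

Browsers often tolerate whitespace inside a data URI, but a single-line value avoids depending on that.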

Preview result:

2. docxToHtml

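import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServletResponse;
import fr.opensagres.poi.xwpf.converter.xhtml.Base64EmbedImgManager;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.web.bind.annotation.RequestMapping;

// Note: this uses the same path as the .doc endpoint above; if both live in
// one application, give them distinct paths to avoid an ambiguous mapping.
@RequestMapping("/wordToHtml")
public void wordToHtml(HttpServletResponse response) {
    final String path = "C:\\usr\\local\\";
    final String file = "3.docx";
    // try-with-resources closes the input stream even if the conversion fails
    try (InputStream input = new FileInputStream(path + file)) {
        docxToHtml(input, response);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void docxToHtml(InputStream inputStream, HttpServletResponse response) throws IOException {
    ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
    try (XWPFDocument docxDocument = new XWPFDocument(inputStream)) {
        XHTMLOptions options = XHTMLOptions.create();
        // Embed images as Base64 data URIs
        options.setImageManager(new Base64EmbedImgManager());
        // Convert to HTML
        XHTMLConverter.getInstance().convert(docxDocument, htmlStream, options);
    }

    // Reset the response, set its headers, then send the HTML to the browser
    response.reset();
    response.setContentType("text/html");
    response.setCharacterEncoding("UTF-8");
    OutputStream toClient = new BufferedOutputStream(response.getOutputStream());
    toClient.write(htmlStream.toByteArray());
    toClient.flush();
    toClient.close();
}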

Preview result:

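IV. Summary

1. The versions of the main Maven dependencies need to be consistent with each other.

2. The file must be a genuine Word document. For example, a résumé downloaded from BOSS直聘 cannot be previewed because its content is actually HTML, and the conversion fails with an exception:

Docment is really HTML File

The file has to be re-saved in a standard Word format first.

3. Renaming the file extension is not enough. Office may still open such a file, but its contents are not in the format the extension claims, so it has to be re-saved in the format you want (doc or docx). Otherwise the conversion throws java.lang.IllegalArgumentException: The document is really a OOXML file

4. I also tried spire.doc. With the free edition, documents longer than three pages cannot be previewed, a limit the vendor's website explains, so I settled on the POI approach.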
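Points 2 and 3 both come down to the file's real content not matching its extension. A rough way to catch this before converting is to look at the first bytes of the file: binary .doc files are OLE2 containers with a fixed 8-byte signature, .docx files are ZIP archives starting with "PK", and the HTML résumé case starts with markup. The sketch below uses only the JDK; the class WordFormatSniffer and its Kind enum are hypothetical names, and the HTML check is a simple heuristic. POI also ships org.apache.poi.poifs.filesystem.FileMagic, which may be a better fit in a real project.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of the original project.
final class WordFormatSniffer {

    enum Kind { OLE2, ZIP, HTML, UNKNOWN }

    // Signature of an OLE2 compound file (used by binary .doc, but also .xls and .ppt)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header of a ZIP archive (used by .docx, but also any other ZIP)
    private static final byte[] ZIP_MAGIC = {'P', 'K', 0x03, 0x04};

    /** Classifies a file by its leading bytes; pass at least the first few hundred bytes. */
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Kind.ZIP;
        }
        // Heuristic: an <html tag near the start suggests an HTML file with a Word extension
        String text = new String(head, StandardCharsets.ISO_8859_1).toLowerCase(Locale.ROOT);
        if (text.contains("<html")) {
            return Kind.HTML;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] fakeResume = "<html><body>resume</body></html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(fakeResume));
        System.out.println(sniff(new byte[] {'P', 'K', 0x03, 0x04, 0, 0}));
    }
}
```

A controller could use such a check to pick docToHtml for OLE2 input and docxToHtml for ZIP input, and to return a clear error message for HTML or unknown content instead of a stack trace.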


Created: 2023-06-05 10:06:54