Allow OpenXML templates to be used with docx.
The `--reference-doc` option allows customization of styles in docx
output, but it does not allow one to adjust the content of the output
(e.g., changing the order in which metadata, the table of contents,
and the body of the document are displayed) or to add boilerplate
text before or after the document body.  For these changes, one can
now use `--template` with an OpenXML template.  (See the default
`openxml` template for a sample.)
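
As a rough illustration of the kind of template this enables, the sketch below writes a pared-down OpenXML template that keeps only the included files, `$body$`, and the section properties, so the title, author, date, abstract, and table of contents blocks are omitted. It is adapted from the default `openxml` template added in this commit. The file names `notitle.openxml`, `input.md`, and `output.docx` are placeholders, and the `pandoc` call is guarded so the script does nothing further when pandoc or the input file is missing.

```shell
#!/bin/sh
# Write a minimal OpenXML template (hypothetical file name).
# The quoted heredoc delimiter keeps $body$ and friends literal.
cat > notitle.openxml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
<w:body>
$for(include-before)$
$include-before$
$endfor$
$body$
$for(include-after)$
$include-after$
$endfor$
$if(sectpr)$
$sectpr$
$else$
<w:sectPr />
$endif$
</w:body>
</w:document>
EOF

# Convert only when pandoc and the placeholder input file are available.
if command -v pandoc >/dev/null 2>&1 && [ -f input.md ]; then
  pandoc input.md --template=notitle.openxml -o output.docx
fi
```

The namespace declarations on `w:document` are copied from the default template because the rendered body may use any of those prefixes.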

This patch also allows `--include-before-body` and
`--include-after-body` to be used with `docx` output.
The included files must be OpenXML fragments suitable for
inclusion in the document body.
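
For illustration, a fragment of this kind might be a single paragraph element. The sketch below writes one to a file and passes it with `--include-before-body`. The file names and the banner text are made up for the example, and the `w:` prefix is assumed to resolve against the namespace declared on the template's root `w:document` element.

```shell
#!/bin/sh
# Write a one-paragraph OpenXML fragment (hypothetical file name and text).
cat > before.xml <<'EOF'
<w:p>
  <w:r>
    <w:t xml:space="preserve">Draft: not for distribution</w:t>
  </w:r>
</w:p>
EOF

# Convert only when pandoc and the placeholder input file are available.
if command -v pandoc >/dev/null 2>&1 && [ -f input.md ]; then
  pandoc input.md --include-before-body=before.xml -o output.docx
fi
```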

Closes #8338 (`--include-before-body`, `--include-after-body`).

Closes #9069 (a custom template can be used to omit the
title page).

Closes #7256.

Closes #2928.
jgm committed May 19, 2024
1 parent e8f44a8 commit db559e1
Showing 42 changed files with 170 additions and 91 deletions.
12 changes: 10 additions & 2 deletions MANUAL.txt
@@ -928,15 +928,23 @@ header when requesting a document from a URL:
`\begin{document}` command in LaTeX). This can be used to include
navigation bars or banners in HTML documents. This option can be
used repeatedly to include multiple files. They will be included in
the order specified. Implies `--standalone`.
the order specified. Implies `--standalone`. Note that if the
output format is `odt`, this file must be in OpenDocument XML format
suitable for insertion into the body of the document, and if
the output is `docx`, this file must be in appropriate
OpenXML format.

`-A` *FILE*, `--include-after-body=`*FILE*|*URL*

: Include contents of *FILE*, verbatim, at the end of the document
body (before the `</body>` tag in HTML, or the
`\end{document}` command in LaTeX). This option can be used
repeatedly to include multiple files. They will be included in the
order specified. Implies `--standalone`.
order specified. Implies `--standalone`. Note that if the
output format is `odt`, this file must be in OpenDocument XML format
suitable for insertion into the body of the document, and if
the output is `docx`, this file must be in appropriate
OpenXML format.

`--resource-path=`*SEARCHPATH*

69 changes: 69 additions & 0 deletions data/templates/default.openxml
@@ -0,0 +1,69 @@
<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
<w:body>
$if(title)$
<w:p>
<w:pPr>
<w:pStyle w:val="Title" />
</w:pPr>
$title$
</w:p>
$endif$
$if(subtitle)$
<w:p>
<w:pPr>
<w:pStyle w:val="Subtitle" />
</w:pPr>
$subtitle$
</w:p>
$endif$
$for(author)$
<w:p>
<w:pPr>
<w:pStyle w:val="Author" />
</w:pPr>
$author$
</w:p>
$endfor$
$if(date)$
<w:p>
<w:pPr>
<w:pStyle w:val="Date" />
</w:pPr>
$date$
</w:p>
$endif$
$if(abstract)$
<w:p>
<w:pPr>
<w:pStyle w:val="AbstractTitle" />
</w:pPr>
$if(abstract-title)$
$abstract-title$
$else$
<w:r>
<w:t xml:space="preserve">Abstract
</w:t>
</w:r>
$endif$
</w:p>
$abstract$
$endif$
$for(include-before)$
$include-before$
$endfor$
$if(toc)$
$toc$
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$
$-- sectpr will be set to the last sectpr in a reference.docx, if present
$if(sectpr)$
$sectpr$
$else$
<w:sectPr />
$endif$
</w:body>
</w:document>
1 change: 1 addition & 0 deletions pandoc.cabal
@@ -64,6 +64,7 @@ data-files:
data/templates/default.jats_publishing
data/templates/default.tei
data/templates/default.opendocument
data/templates/default.openxml
data/templates/default.icml
data/templates/default.opml
data/templates/default.latex
2 changes: 1 addition & 1 deletion src/Text/Pandoc/Templates.hs
@@ -102,12 +102,12 @@ getDefaultTemplate format = do
"native" -> return ""
"csljson" -> return ""
"json" -> return ""
"docx" -> return ""
"fb2" -> return ""
"pptx" -> return ""
"ipynb" -> return ""
"asciidoctor" -> getDefaultTemplate "asciidoc"
"asciidoc_legacy" -> getDefaultTemplate "asciidoc"
"docx" -> getDefaultTemplate "openxml"
"odt" -> getDefaultTemplate "opendocument"
"html" -> getDefaultTemplate "html5"
"docbook" -> getDefaultTemplate "docbook5"
117 changes: 59 additions & 58 deletions src/Text/Pandoc/Writers/Docx.hs
@@ -47,6 +47,7 @@ import Text.Pandoc.Class (PandocMonad, toLang)
import qualified Text.Pandoc.Class.PandocMonad as P
import Text.Pandoc.Data (readDataFile, readDefaultDataFile)
import Data.Time
import qualified Text.Pandoc.UTF8 as UTF8
import Text.Pandoc.Definition
import Text.Pandoc.Error
import Text.Pandoc.MIME (getMimeTypeDef)
@@ -192,10 +193,64 @@ writeDocx opts doc = do
, envPrintWidth = maybe 420 (`quot` 20) pgContentWidth
}

parsedRels <- parseXml refArchive distArchive "word/_rels/document.xml.rels"
let isHeaderNode e = findAttr (QName "Type" Nothing Nothing) e == Just "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header"
let isFooterNode e = findAttr (QName "Type" Nothing Nothing) e == Just "http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer"
let headers = filterElements isHeaderNode parsedRels
let footers = filterElements isFooterNode parsedRels
-- word/_rels/document.xml.rels
let toBaseRel (url', id', target') = mknode "Relationship"
[("Type",url')
,("Id",id')
,("Target",target')] ()
let baserels' = map toBaseRel
[("http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering",
"rId1",
"numbering.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles",
"rId2",
"styles.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings",
"rId3",
"settings.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings",
"rId4",
"webSettings.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable",
"rId5",
"fontTable.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme",
"rId6",
"theme/theme1.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/footnotes",
"rId7",
"footnotes.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments",
"rId8",
"comments.xml")
]

let idMap = renumIdMap (length baserels' + 1) (headers ++ footers)

-- adjust contents to add sectPr from reference.docx
let sectpr = case mbsectpr of
Just sectpr' -> let cs = renumIds
(\q -> qName q == "id" && qPrefix q == Just "r")
idMap
(elChildren sectpr')
in Just . ppElement $
add_attrs (elAttribs sectpr') $ mknode "w:sectPr" [] cs
Nothing -> Nothing


((contents, footnotes, comments), st) <- runStateT
(runReaderT
(writeOpenXML opts{writerWrapText = WrapNone} doc')
(writeOpenXML opts{ writerWrapText = WrapNone
, writerVariables =
(maybe id (setField "sectpr") sectpr)
(writerVariables opts)
}
doc')
env)
initialSt
let epochtime = floor $ utcTimeToPOSIXSeconds utctime
@@ -217,13 +272,7 @@ writeDocx opts doc = do
,("xmlns:wp","http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing")]


parsedRels <- parseXml refArchive distArchive "word/_rels/document.xml.rels"
let isHeaderNode e = findAttr (QName "Type" Nothing Nothing) e == Just "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header"
let isFooterNode e = findAttr (QName "Type" Nothing Nothing) e == Just "http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer"
let headers = filterElements isHeaderNode parsedRels
let footers = filterElements isFooterNode parsedRels

-- we create [Content_Types].xml and word/_rels/document.xml.rels
-- we create [Content_Types].xml and word/_rels/document.xml.rels
-- from scratch rather than reading from reference.docx,
-- because Word sometimes changes these files when a reference.docx is modified,
-- e.g. deleting the reference to footnotes.xml or removing default entries
@@ -284,39 +333,7 @@ writeDocx opts doc = do
let contentTypesEntry = toEntry "[Content_Types].xml" epochtime
$ renderXml contentTypesDoc

-- word/_rels/document.xml.rels
let toBaseRel (url', id', target') = mknode "Relationship"
[("Type",url')
,("Id",id')
,("Target",target')] ()
let baserels' = map toBaseRel
[("http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering",
"rId1",
"numbering.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles",
"rId2",
"styles.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings",
"rId3",
"settings.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings",
"rId4",
"webSettings.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable",
"rId5",
"fontTable.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme",
"rId6",
"theme/theme1.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/footnotes",
"rId7",
"footnotes.xml")
,("http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments",
"rId8",
"comments.xml")
]

let idMap = renumIdMap (length baserels' + 1) (headers ++ footers)
let renumHeaders = renumIds (\q -> qName q == "Id") idMap headers
let renumFooters = renumIds (\q -> qName q == "Id") idMap footers
let baserels = baserels' ++ renumHeaders ++ renumFooters
@@ -328,27 +345,11 @@ writeDocx opts doc = do
let relEntry = toEntry "word/_rels/document.xml.rels" epochtime
$ renderXml reldoc


-- adjust contents to add sectPr from reference.docx
let sectpr = case mbsectpr of
Just sectpr' -> let cs = renumIds
(\q -> qName q == "id" && qPrefix q == Just "r")
idMap
(elChildren sectpr')
in
add_attrs (elAttribs sectpr') $ mknode "w:sectPr" [] cs
Nothing -> mknode "w:sectPr" [] ()

-- let sectpr = fromMaybe (mknode "w:sectPr" [] ()) mbsectpr'
let contents' = contents ++ [Elem sectpr]
let docContents = mknode "w:document" stdAttributes
$ mknode "w:body" [] contents'


let contents' = BL.fromStrict $ UTF8.fromText contents

-- word/document.xml
let contentEntry = toEntry "word/document.xml" epochtime
$ renderXml docContents
let contentEntry = toEntry "word/document.xml" epochtime contents'

-- footnotes
let notes = mknode "w:footnotes" stdAttributes footnotes
60 changes: 30 additions & 30 deletions src/Text/Pandoc/Writers/Docx/OpenXML.hs
@@ -36,6 +36,7 @@ import Data.Text (Text)
import qualified Data.Text.Lazy as TL
import Data.Digest.Pure.SHA (sha1, showDigest)
import Skylighting
import Text.DocLayout (hcat, vcat, literal, render)
import Text.Pandoc.Class (PandocMonad, report, getMediaBag)
import Text.Pandoc.Translations (Term(Abstract), translateTerm)
import Text.Pandoc.MediaBag (lookupMedia, MediaItem(..))
@@ -45,6 +46,7 @@ import Text.Pandoc.UTF8 (fromTextLazy)
import Text.Pandoc.Definition
import Text.Pandoc.Generic
import Text.Pandoc.Highlighting (highlight)
import Text.Pandoc.Templates (compileDefaultTemplate, renderTemplate)
import Text.Pandoc.ImageSize
import Text.Pandoc.Logging
import Text.Pandoc.MIME (extensionFromMimeType, getMimeType)
@@ -167,43 +169,29 @@ makeTOC opts = do
])
]] -- w:sdt

-- | Convert Pandoc document to two lists of
-- OpenXML elements (the main document and footnotes).
writeOpenXML :: (PandocMonad m)
-- | Convert Pandoc document to rendered document contents plus two lists of
-- OpenXML elements (footnotes and comments).
writeOpenXML :: PandocMonad m
=> WriterOptions -> Pandoc
-> WS m ([Content], [Element], [Element])
-> WS m (Text, [Element], [Element])
writeOpenXML opts (Pandoc meta blocks) = do
let tit = docTitle meta
let auths = docAuthors meta
let dat = docDate meta
let abstract' = lookupMetaBlocks "abstract" meta
let subtitle' = lookupMetaInlines "subtitle" meta
setupTranslations meta
let includeTOC = writerTableOfContents opts || lookupMetaBool "toc" meta
title <- withParaPropM (pStyleM "Title") $ blocksToOpenXML opts [Para tit | not (null tit)]
subtitle <- withParaPropM (pStyleM "Subtitle") $ blocksToOpenXML opts [Para subtitle' | not (null subtitle')]
authors <- withParaPropM (pStyleM "Author") $ blocksToOpenXML opts $
map Para auths
date <- withParaPropM (pStyleM "Date") $ blocksToOpenXML opts [Para dat | not (null dat)]
abstract <- if null abstract'
then return []
else do
abstractTitle <- case lookupMeta "abstract-title" meta of
Just (MetaBlocks bs) -> pure $ stringify bs
Just (MetaInlines ils) -> pure $ stringify ils
Just (MetaString s) -> pure s
_ -> translateTerm Abstract
abstractTit <- withParaPropM (pStyleM "AbstractTitle") $
blocksToOpenXML opts
[Para [Str abstractTitle]]
abstractContents <- withParaPropM (pStyleM "Abstract") $
blocksToOpenXML opts abstract'
return $ abstractTit <> abstractContents
abstractTitle <- case lookupMeta "abstract-title" meta of
Just (MetaBlocks bs) -> pure $ stringify bs
Just (MetaInlines ils) -> pure $ stringify ils
Just (MetaString s) -> pure s
_ -> translateTerm Abstract
abstract <- case lookupMetaBlocks "abstract" meta of
[] -> return []
xs -> withParaPropM (pStyleM "Abstract") $ blocksToOpenXML opts xs

let convertSpace (Str x : Space : Str y : xs) = Str (x <> " " <> y) : xs
convertSpace (Str x : Str y : xs) = Str (x <> y) : xs
convertSpace xs = xs
let blocks' = bottomUp convertSpace blocks
doc' <- setFirstPara >> blocksToOpenXML opts blocks'
let body = vcat $ map (literal . showContent) doc'
notes' <- gets (reverse . stFootnotes)
comments <- gets (reverse . stComments)
let toComment (kvs, ils) = do
@@ -226,8 +214,20 @@ writeOpenXML opts (Pandoc meta blocks) = do
toc <- if includeTOC
then makeTOC opts
else return []
let meta' = title ++ subtitle ++ authors ++ date ++ abstract ++ map Elem toc
return (meta' ++ doc', notes', comments')
metadata <- metaToContext opts
(fmap (vcat . map (literal . showContent)) . blocksToOpenXML opts)
(fmap (hcat . map (literal . showContent)) . inlinesToOpenXML opts)
meta
let context = defField "body" body
. defField "toc"
(vcat (map (literal . showElement) toc))
. defField "abstract"
(vcat (map (literal . showContent) abstract))
. defField "abstract-title" abstractTitle
$ metadata
tpl <- maybe (lift $ compileDefaultTemplate "openxml") pure $ writerTemplate opts
let rendered = render Nothing $ renderTemplate tpl context
return (rendered, notes', comments')

-- | Convert a list of Pandoc blocks to OpenXML.
blocksToOpenXML :: (PandocMonad m) => WriterOptions -> [Block] -> WS m [Content]
Binary file modified test/docx/golden/block_quotes.docx
Binary file not shown.
Binary file modified test/docx/golden/codeblock.docx
Binary file not shown.
Binary file modified test/docx/golden/comments.docx
Binary file not shown.
Binary file modified test/docx/golden/custom_style_no_reference.docx
Binary file not shown.
Binary file modified test/docx/golden/custom_style_preserve.docx
Binary file not shown.
Binary file modified test/docx/golden/custom_style_reference.docx
Binary file not shown.
Binary file modified test/docx/golden/definition_list.docx
Binary file not shown.
Binary file modified test/docx/golden/document-properties-short-desc.docx
Binary file not shown.
Binary file modified test/docx/golden/document-properties.docx
Binary file not shown.
Binary file modified test/docx/golden/headers.docx
Binary file not shown.
Binary file modified test/docx/golden/image.docx
Binary file not shown.
Binary file modified test/docx/golden/inline_code.docx
Binary file not shown.
Binary file modified test/docx/golden/inline_formatting.docx
Binary file not shown.
Binary file modified test/docx/golden/inline_images.docx
Binary file not shown.
Binary file modified test/docx/golden/link_in_notes.docx
Binary file not shown.
Binary file modified test/docx/golden/links.docx
Binary file not shown.
Binary file modified test/docx/golden/lists.docx
Binary file not shown.
Binary file modified test/docx/golden/lists_continuing.docx
Binary file not shown.
Binary file modified test/docx/golden/lists_div_bullets.docx
Binary file not shown.
Binary file modified test/docx/golden/lists_multiple_initial.docx
Binary file not shown.
Binary file modified test/docx/golden/lists_restarting.docx
Binary file not shown.
Binary file modified test/docx/golden/nested_anchors_in_header.docx
Binary file not shown.
Binary file modified test/docx/golden/notes.docx
Binary file not shown.
Binary file modified test/docx/golden/raw-blocks.docx
Binary file not shown.
Binary file modified test/docx/golden/raw-bookmarks.docx
Binary file not shown.
Binary file modified test/docx/golden/table_one_row.docx
Binary file not shown.
Binary file modified test/docx/golden/table_with_list_cell.docx
Binary file not shown.
Binary file modified test/docx/golden/tables-default-widths.docx
Binary file not shown.
Binary file modified test/docx/golden/tables.docx
Binary file not shown.
Binary file modified test/docx/golden/tables_separated_with_rawblock.docx
Binary file not shown.
Binary file modified test/docx/golden/track_changes_deletion.docx
Binary file not shown.
Binary file modified test/docx/golden/track_changes_insertion.docx
Binary file not shown.
Binary file modified test/docx/golden/track_changes_move.docx
Binary file not shown.
Binary file modified test/docx/golden/track_changes_scrubbed_metadata.docx
Binary file not shown.
Binary file modified test/docx/golden/unicode.docx
Binary file not shown.
Binary file modified test/docx/golden/verbatim_subsuper.docx
Binary file not shown.
