This is a guide on how to generate a well-formatted Microsoft Word (docx) file from a markdown file in Obsidian. Specifically, everything here is based on APA 7 Student Paper formatting. However, most of it is broadly applicable (at least for academic formatting), and I will cover how you can develop your own configuration.
This is how the components work together in my system:
graph TB
subgraph obsidian
A[markdown sections] --> K[embed sections into report.md] --> V[Easy Bake] --> J[report.baked.md] --> B[Enhancing Export]
J <--> G[Image Assets]
end
subgraph pandoc
B --> C[pagebreak.lua] --> D[abstract-section.lua] --> E[pandoc]
end
subgraph Reference Manager
Zotero --> X[Better BibTeX]
end
X --> Z[References.bib] --> E
W[style.csl] --> E
F[template.docx] --> E
Pre-requisites
Required
- Pandoc
- For macOS: `brew install pandoc`, otherwise see above.
- BibTeX citation file
- Generated by your reference manager.
- For example, Zotero with the Better BibTeX plugin.
- Enhancing Export (for Obsidian)
- Calls pandoc from within Obsidian.
- template.docx
- A Word file containing all the styles used in the output
- I cover this file in more depth below.
- For generating APA 7 student papers, you can use my docx template.
- This is heavily modified from the sample student paper provided by the APA, in order to properly embed formatting into the styles.
- reference-style.csl
- pagebreak (Lua filter for Pandoc)
- Allows you to manually insert page breaks into markdown with `\pagebreak`
- You can avoid using this by instead editing the pandoc template to add page breaks.
Optional
- Pandoc Reference List (for Obsidian)
- Resolves and renders your pandoc citations live from a BibTeX file
- Easy Bake (for Obsidian)
- Compiles all embeds/links in a markdown file inline, producing a single document without embeds. Only needed if your markdown uses embeds, in which case it is required to get them to output correctly.
- abstract-section (Lua filter for Pandoc)
- Lets you write your abstract under a heading, instead of in the front matter. Nice!
Setup
Pandoc Options
- I’ve split these options into the “arguments” and “extra arguments” fields used by the Enhancing Export plugin. This split is arbitrary.
- You will need to update these paths for your system. I’ve kept my paths in to illustrate a fairly typical setup. On Windows you would use Windows paths, I assume.
Option Breakdown/Guide
- Basic Setup:
-f "${fromFormat}+link_attributes"
- Read in a markdown file (Enhancing Export resolves fromFormat) with the pandoc extension link_attributes
- `+link_attributes` enables setting image size with a pipe, as Obsidian supports (`![[MyImage|300]]`)
--resource-path="${currentDir}" --resource-path="${currentDir}/90-assets" --resource-path="${attachmentFolderPath}"
- Look for embedded assets such as images in this location
- You may want to change this for your local paths
-o "${outputPath}" -t docx
- Output the file as docx to a path specified by Enhancing Export
- Citations
--wrap=preserve
- Maintain non-semantic newlines, to improve APA formatting around figures and tables.
--csl="${vaultDir}/95-automation/other-templates/apa7.csl"
- Use APA 7 Citations and References
- You WILL need to change this path to match your local paths
--bibliography="/Users/vi/Documents/Zotero-Library.bib"
- Use this BibTeX file to generate references.
- You WILL need to change this path to match your local paths
- If you are a Zotero user, you will need to set up Better BibTeX in order to generate this file automatically.
--citeproc -M reference-section-title=References
- Run citation processing and give the generated reference list the title “References”
- Filters
--lua-filter="${vaultDir}/95-automation/other-templates/pagebreak.lua"
- Enable pagebreak
- You WILL need to change this path to match your local paths
--lua-filter="${vaultDir}/95-automation/other-templates/abstract-section.lua"
- Enable abstract-section
- Any section with a level 1 heading titled “Abstract” is considered the abstract
- You WILL need to change this path to match your local paths
- APA Styling
--reference-doc="${vaultDir}/95-automation/other-templates/student-paper-apa7.docx"
- See DOCX “Template”
- You WILL need to change this path to match your local paths
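To see how the options above fit together, here is a minimal Python sketch that assembles them into one pandoc argument list. It is only an illustration, not what Enhancing Export does internally: the function name `build_pandoc_args`, the `markdown` default for the input format, and the `/path/to/...` values are all assumptions, and the file names simply mirror my own layout from the breakdown.

```python
import shlex


def build_pandoc_args(source, output, vault_dir, bibliography,
                      resource_paths=(), from_format="markdown"):
    """Assemble the pandoc arguments from the breakdown as a list.

    from_format stands in for the ${fromFormat} value that Enhancing
    Export fills in; "markdown" is an assumed default.
    """
    templates = f"{vault_dir}/95-automation/other-templates"
    # Basic setup: input file and format with the link_attributes extension.
    args = ["pandoc", source, "-f", f"{from_format}+link_attributes"]
    # One --resource-path per folder that may hold embedded assets.
    args += [f"--resource-path={path}" for path in resource_paths]
    args += [
        "-o", output, "-t", "docx",
        # Citations.
        "--wrap=preserve",
        f"--csl={templates}/apa7.csl",
        f"--bibliography={bibliography}",
        "--citeproc",  # long option, two dashes
        "-M", "reference-section-title=References",
        # Lua filters, in the same order as the breakdown.
        f"--lua-filter={templates}/pagebreak.lua",
        f"--lua-filter={templates}/abstract-section.lua",
        # APA styling.
        f"--reference-doc={templates}/student-paper-apa7.docx",
    ]
    return args


# Hypothetical paths, for illustration only.
example = build_pandoc_args(
    source="report.baked.md",
    output="report.docx",
    vault_dir="/path/to/vault",
    bibliography="/path/to/Zotero-Library.bib",
    resource_paths=["/path/to/vault/reports", "/path/to/vault/reports/90-assets"],
)
print(shlex.join(example))
```

Passing the list to `subprocess.run(example, check=True)` would run the export outside Obsidian without any shell quoting problems. Pandoc applies filters and citeproc in the order they appear on the command line, so the order of the list matters.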
DOCX “Template”
- This is a “template” file, but is not a “word template” in the same way a .dotx is
- Pandoc by default will look for certain styles to match to different sections of your input markdown.
- The actual content of the file beyond the styles is totally irrelevant
- The DOCX provided has the following styles that pandoc uses:
- Title
- (Title, plus padding so it auto formats to a good spot without linebreaks)
- Author
- (All subtitle cover page info is entered in author field)
- AbstractTitle
- (Heading 1 + page break before)
- Abstract
- (Normal, minus the indent)
- Normal
- (Standard APA first line indent paragraph)
- Heading 1 through Heading 5
- Block quote
- (Fully indented)
- Hyperlink formatting (optional under APA7)
- You do not have to provide styling for the Reference List, as pandoc citeproc uses the CSL file to figure this out.
- You can see all the non-body styles used by default below.
Editing Styles
- Open “styles pane”
- Select “new style” or “modify style”
- Access additional options in the bottom right
Pandoc Template
- Pandoc additionally uses its own template format for formatting output.
- I don’t use a custom template in my own setup.
- We can specify a custom template with `--template`.
- You can view the source of the default templates here
- Alternatively, you can view the actual defaults for your pandoc install with `pandoc -D <FORMAT>`
Output of `pandoc -D docx`:
<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">
  <w:body>
$if(title)$
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Title" />
      </w:pPr>
      $title$
    </w:p>
$endif$
$if(subtitle)$
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Subtitle" />
      </w:pPr>
      $subtitle$
    </w:p>
$endif$
$for(author)$
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Author" />
      </w:pPr>
      $author$
    </w:p>
$endfor$
$if(date)$
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Date" />
      </w:pPr>
      $date$
    </w:p>
$endif$
$if(abstract)$
$if(abstract-title)$
    <w:p>
      <w:pPr>
        <w:pStyle w:val="AbstractTitle" />
      </w:pPr>
      <w:r><w:t xml:space="preserve">$abstract-title$</w:t></w:r>
    </w:p>
$endif$
    $abstract$
$endif$
$for(include-before)$
    $include-before$
$endfor$
$if(toc)$
    $toc$
$endif$
    $body$
$for(include-after)$
    $include-after$
$endfor$
$-- sectpr will be set to the last sectpr in a reference.docx, if present
$if(sectpr)$
    $sectpr$
$else$
    <w:sectPr />
$endif$
  </w:body>
</w:document>
- You can view the template format reference for pandoc here.
- Anything you see above that isn’t in this format is DOCX XML.
- This is the markup language DOCX files contain.
- This is directly output into the resulting output file.
- The pandoc docs claim there is no template for `docx`; however, `pandoc -D docx` outputs a pandoc template that aligns with the output of `pandoc -t docx`.
- By looking at this template file, we can discover what variables pandoc accepts.
- These align to YAML front matter values in Obsidian, except for `$body$`, which corresponds to the rest of the markdown file
- We can also see what style names are pulled from our template.docx file, if we look for the `<w:pStyle w:val="<STYLE_NAME>" />` tags. Each tag styles the pandoc variable that sits in the same paragraph (`<w:p>`).
- You could make a copy and edit this file to add additional sections, reorder sections, or add XML formatting between sections (e.g. we could bake our page breaks into our template, instead of using a Lua filter).
- You should search out another resource if you want to make substantial edits to this file, or you can add the desired output to a DOCX file and inspect it.
- A DOCX is just a zip file. To inspect the underlying XML, you can open the DOCX file in nearly any file compression/archival tool. If your OS or archiver doesn’t recognise it, just change the extension to `.zip`.
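Since the template names its styles in `<w:pStyle>` tags and a DOCX is a zip file, you can cross-check the two with a short script. The following is a rough sketch using only the Python standard library, under a few assumptions: `TEMPLATE_EXCERPT` is a hand-shortened piece of the `pandoc -D docx` output, all function names are hypothetical, and `fake_reference_doc` builds a stand-in archive containing only `word/styles.xml` (it is not a valid Word file) so the example runs without a real document.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Shortened excerpt of the default docx template, for illustration.
TEMPLATE_EXCERPT = """
$if(title)$
<w:p><w:pPr><w:pStyle w:val="Title" /></w:pPr>$title$</w:p>
$endif$
$for(author)$
<w:p><w:pPr><w:pStyle w:val="Author" /></w:pPr>$author$</w:p>
$endfor$
$if(abstract)$
$if(abstract-title)$
<w:p><w:pPr><w:pStyle w:val="AbstractTitle" /></w:pPr></w:p>
$endif$
$abstract$
$endif$
$body$
"""


def template_styles(template_text):
    """Return the style IDs referenced by <w:pStyle w:val="..."> tags."""
    return set(re.findall(r'<w:pStyle\s+w:val="([^"]+)"', template_text))


def template_variables(template_text):
    """Return the variable names used in $...$ template tags."""
    names = set()
    for token in re.findall(r"\$([^$\s]+)\$", template_text):
        control = re.fullmatch(r"(?:if|for)\(([\w.-]+)\)", token)
        if control:
            names.add(control.group(1))
        elif re.fullmatch(r"[\w.-]+", token) and token not in {"endif", "endfor", "else"}:
            names.add(token)
    return names


def docx_style_ids(docx_file):
    """Return the style IDs defined in word/styles.xml of a DOCX archive."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    return {style.get(f"{{{W_NS}}}styleId") for style in root.iter(f"{{{W_NS}}}style")}


def fake_reference_doc(style_ids):
    """Build an in-memory zip holding only word/styles.xml (stand-in, not a real DOCX)."""
    styles = "".join(
        f'<w:style w:type="paragraph" w:styleId="{sid}"><w:name w:val="{sid}"/></w:style>'
        for sid in style_ids
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/styles.xml", f'<w:styles xmlns:w="{W_NS}">{styles}</w:styles>')
    buffer.seek(0)
    return buffer


reference = fake_reference_doc(["Title", "Author", "Normal"])
missing = template_styles(TEMPLATE_EXCERPT) - docx_style_ids(reference)
print("variables:", sorted(template_variables(TEMPLATE_EXCERPT)))
print("styles missing from reference doc:", sorted(missing))
```

With a real file you would pass its path instead, for example `docx_style_ids("student-paper-apa7.docx")`. Note that this only covers the styles named in the template. Body styles such as headings and block quotes are written by pandoc’s docx writer itself, so they will not show up in the template text.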
Cover Page and Author Abuse
- In an APA 7 Student Paper, the cover page(s) are fairly simple: a title, author, some course information, and optionally an abstract.
- Luckily the pandoc template includes Title, Author and Abstract variables already.
- However, it does not provide a page break before/after the Abstract.
- I’ve set the AbstractTitle style to always have a page break before any occurrences of it.
- I also use `\pagebreak` in my markdown to insert page breaks before/after the body.
- As for other cover page information, the only styles/positions available in the default pandoc template (see above) are Title, Subtitle, Author, Date (output in that order)
- For additional information you can use subtitle and date fields, and set a style in your DOCX “Template” to position and style it
- Word styles include settings for padding, spacing, borders, page breaks, etc. You can do a surprising amount of layout for your cover page purely in style definitions!
- For my purposes the “course information” is styled as if they were additional authors, so I can just add them to the “author” front matter list.
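To make the “author abuse” concrete, here is a small Python sketch that renders a markdown skeleton with the cover page information in the front matter. Treat it as one possible layout under stated assumptions: the function names are hypothetical, every name, course and date in the example call is a placeholder, and the placement of `\pagebreak` (after the abstract and again after the body, ahead of the generated reference list) is my reading of the setup rather than a requirement.

```python
def yaml_quote(text):
    """Wrap text in double quotes, escaping backslashes and quotes for YAML."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_skeleton(title, author_lines, abstract, body):
    """Render front matter plus an Abstract section and a body.

    Every entry in author_lines is emitted as one item of the "author"
    list, so course information rides along as extra "authors".
    """
    lines = ["---", f"title: {yaml_quote(title)}", "author:"]
    lines += [f"  - {yaml_quote(entry)}" for entry in author_lines]
    lines += [
        "---",
        "",
        "# Abstract",  # picked up by the abstract-section filter
        "",
        abstract,
        "",
        "\\pagebreak",  # handled by the pagebreak filter
        "",
        body,
        "",
        "\\pagebreak",  # assumed: starts the reference list on a new page
        "",
    ]
    return "\n".join(lines)


# Placeholder values, for illustration only.
print(render_skeleton(
    title="Example Paper Title",
    author_lines=[
        "Jane Student",
        "Department of Psychology, Example University",
        "PSY 101: Introduction to Psychology",
        "Dr. Example Instructor",
        "January 1, 2024",
    ],
    abstract="A short summary of the paper.",
    body="# Example Paper Title\n\nBody text with a citation [@citekey].",
))
```

Because the default template loops over `author` and gives each entry the Author style, every line in the list comes out as its own centered cover page line, with no custom template needed.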
Extras
Workflow
- See Obsidian and Zotero for details on how to use Obsidian to take notes and do literature reviews
- Point Pandoc Reference List at your BibTeX file. Entering `@` will show you live citation results from your reference manager. These will render in reading/live preview mode based on a CSL file. By default, hovering over them will display a full reference. Neat!
Extra packages/Alternatives
- I don’t use these but they may be useful avenues for you to investigate for your needs
- pandoc-crossref
- Reference internal sections and figures using citekey style syntax.
- I don’t have a big use case for this due to relatively low amount of cross-referencing/figures and it kept breaking for me.
- Useful if you have a lot of figures, so that adding a new figure not at the end of the document doesn’t mess up numbering.
- pandoc-plots
- Write code blocks in just about any programmatic plotting toolkit (e.g. `ggplot`, `matlab`), and they will render in the output.
- Cool idea, but wouldn’t work for me. Besides, this seems inferior to just making your plot in R, where you can fiddle with it until it’s right. Here, you have to run the full pandoc export every time you want to see what it looks like, which isn’t super quick and puts Obsidian in a blocked state.
- Writage
- Markdown <-> DOCX plugin for MS Word. Paid & proprietary software.
- Seems cool — maybe good if you need to do a lot of collaboration/commenting/editing in DOCX/Word.