16 KiB
translating by dianbanjiu How to use Pandoc to produce a research paper
Learn how to manage section references, figures, tables, and more in Markdown.
This article takes a deep dive into how to produce a research paper using (mostly) Markdown syntax. We'll cover how to create and reference sections, figures (in Markdown and LaTeX) and bibliographies. We'll also discuss troublesome cases and why writing them in LaTeX is the right approach.
Research
Research papers usually contain references to sections, figures, tables, and a bibliography. Pandoc by itself cannot easily cross-reference these, but it can leverage the pandoc-crossref filter to do the automatic numbering and cross-referencing of sections, figures, and tables.
Let’s start by rewriting an example of an educational research paper originally written in LaTeX and rewrites it in Markdown (and some LaTeX) with Pandoc and pandoc-crossref.
Adding and referencing sections
Sections are automatically numbered and must be written using the Markdown heading H1. Subsections are written with subheadings H2-H4 (it is uncommon to need more than that). For example, to write a section titled “Implementation”, write # Implementation {#sec:implementation}
, and Pandoc produces 3. Implementation
(or the corresponding numbered section). The title “Implementation” uses heading H1 and declares a label {#sec:implementation}
that authors can use to refer to that section. To reference a section, type the @
symbol followed by the label of the section and enclose it in square brackets: [@sec:implementation]
.
In this paper, we find the following example:
we lack experience (consistency between TAs, [@sec:implementation]).
Pandoc produces:
we lack experience (consistency between TAs, Section 4).
Sections are numbered automatically (this is covered in the Makefile
at the end of the article). To create unnumbered sections, type the title of the section, followed by {-}
. For example, ### Designing a game for maintainability {-}
creates an unnumbered subsection with the title “Designing a game for maintainability”.
Adding and referencing figures
Adding and referencing a figure is similar to referencing a section and adding a Markdown image:
![Scatterplot matrix](data/scatterplots/RScatterplotMatrix2.png){#fig:scatter-matrix}
The line above tells Pandoc that there is a figure with the caption Scatterplot matrix and the path to the image is data/scatterplots/RScatterplotMatrix2.png
. {#fig:scatter-matrix}
declares the name that should be used to reference the figure.
Here is an example of a figure reference from the example paper:
The boxes "Enjoy", "Grade" and "Motivation" ([@fig:scatter-matrix]) ...
Pandoc produces the following output:
The boxes "Enjoy", "Grade" and "Motivation" (Fig. 1) ...
Adding and referencing a bibliography
Most research papers keep references in a BibTeX database file. In this example, this file is named biblio.bib and it contains all the references of the paper. Here is what this file looks like:
@inproceedings{wrigstad2017mastery,
Author = {Wrigstad, Tobias and Castegren, Elias},
Booktitle = {SPLASH-E},
Title = {Mastery Learning-Like Teaching with Achievements},
Year = 2017
}
@inproceedings{review-gamification-framework,
Author = {A. Mora and D. Riera and C. Gonzalez and J. Arnedo-Moreno},
Publisher = {IEEE},
Booktitle = {2015 7th International Conference on Games and Virtual Worlds
for Serious Applications (VS-Games)},
Doi = {10.1109/VS-GAMES.2015.7295760},
Keywords = {formal specification;serious games (computing);design
framework;formal design process;game components;game design
elements;gamification design frameworks;gamification-based
solutions;Bibliographies;Context;Design
methodology;Ethics;Games;Proposals},
Month = {Sept},
Pages = {1-8},
Title = {A Literature Review of Gamification Design Frameworks},
Year = 2015,
Bdsk-Url-1 = {http://dx.doi.org/10.1109/VS-GAMES.2015.7295760}
}
...
The first line, @inproceedings{wrigstad2017mastery,
, declares the type of publication (inproceedings
) and the label used to refer to that paper (wrigstad2017mastery
).
To cite the paper with its title, Mastery Learning-Like Teaching with Achievements, type:
the achievement-driven learning methodology [@wrigstad2017mastery]
Pandoc will output:
the achievement- driven learning methodology [30]
The paper we will produce includes a bibliography section with numbered references like these:
Citing a collection of articles is easy: Simply cite each article, separating the labeled references using a semi-colon: ;
. If there are two labeled references—i.e., SEABORN201514
and gamification-leaderboard-benefits
—cite them together, like this:
Thus, the most important benefit is its potential to increase students' motivation
and engagement [@SEABORN201514;@gamification-leaderboard-benefits].
Pandoc will produce:
Thus, the most important benefit is its potential to increase students’ motivation
and engagement [26, 28]
Problematic cases
A common problem involves objects that do not fit in the page. They then float to wherever they fit best, even if that position is not where the reader expects to see it. Since papers are easier to read when figures or tables appear close to where they are mentioned, we need to have some control over where these elements are placed. For this reason, I recommend the use of the figure
LaTeX environment, which enables users to control the positioning of figures.
Let’s take the figure example shown above:
![Scatterplot matrix](data/scatterplots/RScatterplotMatrix2.png){#fig:scatter-matrix}
And rewrite it in LaTeX:
\begin{figure}[t]
\includegraphics{data/scatterplots/RScatterplotMatrix2.png}
\caption{\label{fig:matrix}Scatterplot matrix}
\end{figure}
In LaTeX, the [t]
option in the figure
environment declares that the image should be placed at the top of the page. For more options, refer to the Wikibooks article LaTex/Floats, Figures, and Captions.
Producing the paper
So far, we've covered how to add and reference (sub-)sections and figures and cite the bibliography—now let's review how to produce the research paper in PDF format. To generate the PDF, we will use Pandoc to generate a LaTeX file that can be compiled to the final PDF. We will also discuss how to generate the research paper in LaTeX using a customized template and a meta-information file, and how to compile the LaTeX document into its final PDF form.
Most conferences provide a .cls file or a template that specifies how papers should look; for example, whether they should use a two-column format and other design treatments. In our example, the conference provided a file named acmart.cls.
Authors are generally expected to include the institution to which they belong in their papers. However, this option was not included in the default Pandoc’s LaTeX template (note that the Pandoc template can be inspected by typing pandoc -D latex
). To include the affiliation, take the default Pandoc’s LaTeX template and add a new field. The Pandoc template was copied into a file named mytemplate.tex
as follows:
pandoc -D latex > mytemplate.tex
The default template contains the following code:
$if(author)$
\author{$for(author)$$author$$sep$ \and $endfor$}
$endif$
$if(institute)$
\providecommand{\institute}[1]{}
\institute{$for(institute)$$institute$$sep$ \and $endfor$}
$endif$
Because the template should include the author’s affiliation and email address, among other things, we updated it to include these fields (we made other changes as well but did not include them here due to the file length):
latex
$for(author)$
$if(author.name)$
\author{$author.name$}
$if(author.affiliation)$
\affiliation{\institution{$author.affiliation$}}
$endif$
$if(author.email)$
\email{$author.email$}
$endif$
$else$
$author$
$endif$
$endfor$
With these changes in place, we should have the following files:
main.md
contains the research paperbiblio.bib
contains the bibliographic databaseacmart.cls
is the class of the document that we should usemytemplate.tex
is the template file to use (instead of the default)
Let’s add the meta-information of the paper in a meta.yaml
file:
---
template: 'mytemplate.tex'
documentclass: acmart
classoption: sigconf
title: The impact of opt-in gamification on `\\`{=latex} students' grades in a software design course
author:
- name: Kiko Fernandez-Reyes
affiliation: Uppsala University
email: kiko.fernandez@it.uu.se
- name: Dave Clarke
affiliation: Uppsala University
email: dave.clarke@it.uu.se
- name: Janina Hornbach
affiliation: Uppsala University
email: janina.hornbach@fek.uu.se
bibliography: biblio.bib
abstract: |
An achievement-driven methodology strives to give students more control over their learning with enough flexibility to engage them in deeper learning. (more stuff continues)
include-before: |
\```{=latex}
\copyrightyear{2018}
\acmYear{2018}
\setcopyright{acmlicensed}
\acmConference[MODELS '18 Companion]{ACM/IEEE 21th International Conference on Model Driven Engineering Languages and Systems}{October 14--19, 2018}{Copenhagen, Denmark}
\acmBooktitle{ACM/IEEE 21th International Conference on Model Driven Engineering Languages and Systems (MODELS '18 Companion), October 14--19, 2018, Copenhagen, Denmark}
\acmPrice{XX.XX}
\acmDOI{10.1145/3270112.3270118}
\acmISBN{978-1-4503-5965-8/18/10}
\begin{CCSXML}
<ccs2012>
<concept>
<concept_id>10010405.10010489</concept_id>
<concept_desc>Applied computing~Education</concept_desc>
<concept_significance>500</concept_significance>
</concept>
</ccs2012>
\end{CCSXML}
\ccsdesc[500]{Applied computing~Education}
\keywords{gamification, education, software design, UML}
\```
figPrefix:
- "Fig."
- "Figs."
secPrefix:
- "Section"
- "Sections"
...
This meta-information file sets the following variables in LaTeX:
template
refers to the template to use (‘mytemplate.tex’)documentclass
refers to the LaTeX document class to use (acmart
)classoption
refers to the options of the class, in this casesigconf
title
specifies the title of the paperauthor
is an object that contains other fields, such asname
,affiliation
, andemail
.bibliography
refers to the file that contains the bibliography (biblio.bib)abstract
contains the abstract of the paperinclude-before
is information that should be included before the actual content of the paper; this is known as the preamble in LaTeX. I have included it here to show how to generate a computer science paper, but you may choose to skip itfigPrefix
specifies how to refer to figures in the document, i.e., what should be displayed when one refers to the figure[@fig:scatter-matrix]
. For example, the currentfigPrefix
produces in the exampleThe boxes "Enjoy", "Grade" and "Motivation" ([@fig:scatter-matrix])
this output:The boxes "Enjoy", "Grade" and "Motivation" (Fig. 3)
. If there are multiple figures, the current setup declares that it should instead displayFigs.
next to the figure numbers.secPrefix
specifies how to refer to sections mentioned elsewhere in the document (similar to figures, described above)
Now that the meta-information is set, let’s create a Makefile
that produces the desired output. This Makefile
uses Pandoc to produce the LaTeX file, pandoc-crossref
to produce the cross-references, pdflatex
to compile the LaTeX to PDF, and bibtex
to process the references.
The Makefile
is shown below:
all: paper
paper:
@pandoc -s -F pandoc-crossref --natbib meta.yaml --template=mytemplate.tex -N \
-f markdown -t latex+raw_tex+tex_math_dollars+citations -o main.tex main.md
@pdflatex main.tex &> /dev/null
@bibtex main &> /dev/null
@pdflatex main.tex &> /dev/null
@pdflatex main.tex &> /dev/null
clean:
rm main.aux main.tex main.log main.bbl main.blg main.out
.PHONY: all clean paper
Pandoc uses the following flags:
-s
to create a standalone LaTeX document-F pandoc-crossref
to make use of the filterpandoc-crossref
--natbib
to render the bibliography withnatbib
(you can also choose--biblatex
)--template
sets the template file to use-N
to number the section headings-f
and-t
specify the conversion from and to which format.-t
usually contains the format and is followed by the Pandoc extensions used. In the example, we declaredraw_tex+tex_math_dollars+citations
to allow use ofraw_tex
LaTeX in the middle of the Markdown file.tex_math_dollars
enables us to type math formulas as in LaTeX, andcitations
enables us to use this extension.
To generate a PDF from LaTeX, follow the guidelines from bibtex to process the bibliography:
@pdflatex main.tex &> /dev/null
@bibtex main &> /dev/null
@pdflatex main.tex &> /dev/null
@pdflatex main.tex &> /dev/null
The script contains @
to ignore the output, and we redirect the file handle of the standard output and error to /dev/null
so that we don’t see the output generated from the execution of these commands.
The final result is shown below. The repository for the article can be found on GitHub:
Conclusion
In my opinion, research is all about collaboration, dissemination of ideas, and improving the state of the art in whatever field one happens to be in. Most computer scientists and engineers write papers using the LaTeX document system, which provides excellent support for math. Researchers from the social sciences seem to stick to DOCX documents.
When researchers from different communities write papers together, they should first discuss which format they will use. While DOCX may not be convenient for engineers if there is math involved, LaTeX may be troublesome for researchers who lack a programming background. As this article shows, Markdown is an easy-to-use language that can be used by both engineers and social scientists.
via: https://opensource.com/article/18/9/pandoc-research-paper
作者:Kiko Fernandez-Reyes 选题:lujun9972 译者:译者ID 校对:校对者ID