-rw-r--r-- README.md 578
-rw-r--r-- util/weaver.lua 34
2 files changed, 612 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..28a9eed
--- /dev/null
+++ b/README.md
@@ -0,0 +1,578 @@
1---
2title: "Pangler: literate programming in Pandoc"
3author: Federico Igne
4date: \today
5---
6
This document describes the logic and design of `pangler`, a minimal tangler for literate programming using the [Pandoc Markdown syntax](https://pandoc.org/MANUAL.html#pandocs-markdown).
8
[Literate Programming](https://en.wikipedia.org/wiki/Literate_programming) (LP) is a programming paradigm that emphasizes the natural flow of thoughts that the programmer experiences when writing software.
10The paradigm can be seen as "documentation first" and the focus is on "human-to-human" communication.
11The produced document is a text-based prose document describing the logic and the design of the program, interspersed with snippets of code that form the final software.
12
13Given an LP document, one can either extract the *tangled* code (with a "tangler") or generate its documentation, "woven" from the literate input source (with a "weaver").
14
15In this case, [Pandoc](https://pandoc.org) is a very good weaver, supporting the generation of different document formats from a Markdown source.
16This document is an attempt at providing a tangler working alongside Pandoc.
17`pangler` is itself written in Pandoc Markdown format and can be generated from this document using itself.
18
19# Literate programming with `pangler`
20
21`pangler` uses two main features provided by the Pandoc Markdown syntax, which are not necessarily present in other Markdown flavours:
22
1. [`backtick_code_blocks`](https://pandoc.org/MANUAL.html#extension-backtick_code_blocks) for writing fenced snippets of code, and
2. [`fenced_code_attributes`](https://pandoc.org/MANUAL.html#extension-fenced_code_attributes) for adding arbitrary HTML attributes, classes and an ID to a snippet of code.
25
26## Writing programs
27
In the following, we use the term *literate program* for a Markdown file written in Pandoc Markdown syntax.
29
30A minimal block of code, recognized by `pangler`, with ID `identifier` is
31
32~~~
33```{#identifier}
34[code snippet]
35```
36~~~
37
38Code blocks can contain **code macros** of the form `<<identifier>>` where `identifier` is a valid code block ID.
39Code macros will be recursively substituted by the corresponding code snippet during [code generation][Tangling: generating the source files].
A code macro must be placed on its own line; any leading whitespace is reused during code generation to indent the substituted code snippet.
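
As a rough illustration of this rule, the sketch below recognizes a macro line and splits it into its indentation and its identifier. It relies only on the standard library rather than on the regex that `pangler` uses, and `parse_macro` is a hypothetical helper name that does not appear in the program.

```rust
/// Hypothetical helper (not part of `pangler`): recognize a line made of
/// optional blanks followed by `<<identifier>>`.
/// Returns the indentation and the identifier.
fn parse_macro(line: &str) -> Option<(&str, &str)> {
    // Leading spaces and tabs form the indentation.
    let body = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
    let indent = &line[..line.len() - body.len()];
    // The macro itself is `<<`, a non-empty identifier, then `>>`.
    let rest = body.strip_prefix("<<")?;
    let end = rest.find(">>")?;
    let id = &rest[..end];
    // Identifiers may not contain `>` or whitespace.
    if id.is_empty() || id.chars().any(|c| c == '>' || c.is_whitespace()) {
        return None;
    }
    Some((indent, id))
}
```

For instance, a line holding four spaces followed by `<<config_depth>>` is split into a four-space indentation and the identifier `config_depth`, while an ordinary line of code yields `None`.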
41
42Additional attributes and classes can be added to a code block, as well;
43the language of the code snippet can be provided and is useful to enable correct syntax highlighting.
44
45~~~
46```{#identifier .python}
47[python code snippet]
48```
49~~~
50
51An identifier can also be a file name matching the following regex
52
53```{#regex_path .rust}
54static ref PATH: Regex =
55 Regex::new(
56 r"^(?:[[:word:]\.-]+/)*[[:word:]\.-]+\.[[:alpha:]]+$"
57 ).unwrap();
58```
59
60In that case the code block is considered a valid **entry point** for the generation of a file with that name.
61The code block defines the content of the new file.
62
63~~~
64```{#file.py .python}
65[python main file]
66```
67~~~
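
To make the entry-point rule more tangible, here is a minimal sketch that checks an identifier by hand instead of through the regex. It is meant to accept the same identifiers as the `PATH` regex above, but it is only an approximation written with the standard library; `is_entry_point` is a hypothetical name.

```rust
/// Hypothetical std-only check mirroring the `PATH` regex:
/// optional `dir/` components followed by a file name that ends in a
/// dot and an alphabetic extension.
fn is_entry_point(id: &str) -> bool {
    // Non-empty run of characters accepted by `[[:word:]\.-]` (ASCII only).
    fn ok(s: &str) -> bool {
        !s.is_empty()
            && s.chars()
                .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '.' || c == '-')
    }
    let mut parts: Vec<&str> = id.split('/').collect();
    // `split` always yields at least one item, so `pop` cannot fail here.
    let file = parts.pop().unwrap_or("");
    // The extension is whatever follows the last dot of the file name.
    let (stem, ext) = match file.rsplit_once('.') {
        Some(pair) => pair,
        None => return false,
    };
    parts.iter().all(|dir| ok(dir))
        && ok(stem)
        && !ext.is_empty()
        && ext.chars().all(|c| c.is_ascii_alphabetic())
}
```

Under this reading, `file.py`, `src/main.rs` and `Cargo.toml` are entry points, whereas `identifier` or an extension-less name such as `Makefile` is treated as an ordinary block ID.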
68
69File names can be generated in subfolders using the `path` attribute.
70The following code block determines the content of file `path/to/file.py`.
71
72~~~
73```{#file.py .python path="path/to/"}
74[python main file]
75```
76~~~
77
78This path is relative to the current working directory, unless [the `-o`/`--output` flag is used][Command Line Interface].
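
The way a block ID, the `path` attribute and the output directory combine can be sketched as follows. The two function names are hypothetical and only summarize logic that is spread across the program; treat this as an illustration under that assumption.

```rust
use std::path::{Path, PathBuf};

/// Key under which a block is stored: the optional `path` attribute
/// is simply prepended to the block ID (plain string concatenation).
fn block_key(id: &str, path_attr: Option<&str>) -> String {
    match path_attr {
        Some(dir) => format!("{}{}", dir, id),
        None => id.to_string(),
    }
}

/// Final location of a generated file: the key joined onto the base
/// directory, which defaults to the current working directory.
fn output_file(base: Option<&Path>, key: &str) -> PathBuf {
    base.unwrap_or(Path::new("./")).join(key)
}
```

Because the attribute is concatenated rather than joined, `path="path/to"` without the trailing slash would produce the key `path/tofile.py`.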
79
80Code blocks without an ID are ignored.
81
```{#code_block_gathering .rust}
if !id.is_empty() {
    let key = {
        let path = attrs.iter().find(|(k,_)| k == "path");
        if let Some(path) = path {
            format!("{}{}", path.1, id)
        } else {
            id.to_string()
        }
    };
    <<code_block>>
} else {
    eprintln!("Ignoring code block without ID:");
    eprintln!("{}", indent(Cow::from(code),4));
}
```
98
99Code blocks are processed in order.
100By default, if an identifier is already defined, the code block is appended to the current corresponding value.
101
Use the `override` class in the code block definition to replace any previous entry with the same key instead of appending to it.
103
104~~~
105```{#identifier .python .override}
106[Python code snippet]
107```
108~~~
109
110This is handled in code as follows
111
```{#code_block .rust}
if clss.iter().any(|c| c == "override") {
    blocks.insert(key, Cow::from(code));
} else {
    blocks.entry(key)
        .and_modify(|s| {
            *s += "\n";
            *s += Cow::from(code)
        })
        .or_insert(Cow::from(code));
}
```
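
The append-or-override rule can also be tried in isolation. The following is a simplified sketch that stores plain `String` values instead of `Cow`; `add_block` is a hypothetical name that does not appear in `pangler`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-alone version of the collection step.
/// Blocks sharing a key are joined with a newline, unless the new
/// block carries the `override` class.
fn add_block(
    blocks: &mut HashMap<String, String>,
    key: &str,
    code: &str,
    is_override: bool,
) {
    if is_override {
        // Discard whatever was collected so far under this key.
        blocks.insert(key.to_string(), code.to_string());
    } else {
        blocks
            .entry(key.to_string())
            .and_modify(|s| {
                s.push('\n');
                s.push_str(code);
            })
            .or_insert_with(|| code.to_string());
    }
}
```

Adding `use a;` and then `use b;` under the key `uses` leaves the two lines joined by a newline; a later block with the `override` class replaces both.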
124
125## Tangling: generating the source files
126
127To bootstrap the tangling process, an early version of `pangler` is provided under `bin/` in this repository.
128
129You can generate the code for the current version of the program, in the current working directory, with
130
131```sh
132./bin/pangler-v0.1.0 README.md
133```
134
135and compile it with
136
137```sh
138cargo build --release
139```
140
141From now on you can make changes to the `README.md` file and use the latest version of `pangler` to tangle and compile it.
142
143## Weaving: generating the documentation
144
As explained above, we use [`pandoc`](https://pandoc.org/) as a weaver.
146Run the following command to generate a PDF file for this document
147
148```sh
149pandoc --to latex \
150 --listings \
151 --number-sections \
152 --lua-filter=util/weaver.lua \
153 --output pangler.pdf \
154 README.md
155```
156
157The Lua filter `util/weaver.lua` is provided to handle custom `pangler` attributes during the PDF generation via the \LaTeX\ engine.
158
159## Integration with (Neo)Vim
160
161(Neo)Vim supports code highlighting inside Markdown blocks, when the programming language is provided among its attributes.
162Add the following to your config file to enable code highlighting for a specific set of languages
163
164```vimscript
165let g:markdown_fenced_languages =
166 ['python','rust','scala']
167```
168
169# Command Line Interface
170
171`pangler` offers a very simple command line interface.
For an overview of the functionality offered by the tool, run
173
174```sh
175pangler --help
176```
177
178`pangler` uses the `clap` library to parse command line arguments
179
180```{#dependencies .toml}
181clap = { version = "3.1", features = ["derive"] }
182```
183
184```{#uses .rust}
185use clap::Parser;
186```
187
188using the [Derive API](https://github.com/clap-rs/clap/blob/v3.1.18/examples/tutorial_derive/README.md) to define the exposed functionalities.
The `struct` holding the CLI information is defined as follows
190
```{#config .rust}
/// A tangler for Literate Programming in Pandoc
#[derive(Parser, Debug)]
#[clap(author, version, about, long_about = None)]
struct Config {
    <<config_depth>>
    <<config_output>>
    <<config_input>>
}
```
201
202and the arguments are parsed as
203
204```{#config_parse .rust}
205let config = Config::parse();
206```
207
`pangler` accepts a sequence of files: each file is parsed, and its code blocks are collected and used to build the final program.
Note that the order of the files provided on the CLI matters when using the [overriding functionality][Writing programs].
210
211```{#config_input .rust}
212/// Input files
213input: Vec<PathBuf>,
214```
215
216By default, files are generated in the current working directory.
217
218```{#constants .rust}
219const BASE: &str = "./";
220```
221
222This behaviour can be overridden using the `-o`/`--output` flag.
223
224```{#config_output .rust}
225/// Base output directory [default: './']
226#[clap(short, long)]
227output: Option<PathBuf>,
228```
229
230Finally, recursive substitution of blocks can lead to an infinite loop.
231By default, `pangler` will stop after 10 substitution iterations, but this parameter can be changed with the `-d`/`--depth` flag.
232
233```{#config_depth .rust}
234/// Maximum substitution depth
235#[clap(short, long, default_value_t = 10)]
236depth: u32,
237```
238
239# The program
240
241The program is structured as a single Rust file with the following being the main entry point of the program
242
```{#main.rs .rust path="src/"}
<<uses>>

<<constants>>

<<config>>

<<types>>

<<functions>>

fn main() -> Result<()> {
    <<config_parse>>
    <<pandoc_setup>>
    Ok(())
}
```
260
261## Pandoc
262
263We are using [`rust-pandoc`](https://github.com/oli-obk/rust-pandoc) and [`pandoc-ast`](https://github.com/oli-obk/pandoc-ast) to interact with `pandoc` from Rust.
264
265```{#dependencies .toml}
266pandoc = "0.8"
267pandoc_ast = "0.8"
268```
269
270```{#uses .rust}
271use pandoc::{
272 InputFormat,InputKind,OutputFormat,OutputKind,Pandoc
273};
274use pandoc_ast::Block;
275```
276
277First we need to initialize a new `Pandoc` struct
278
279```{#pandoc_setup .rust}
280let mut pandoc = Pandoc::new();
281```
282
283and set up the input parameters.
284The input is a sequence of Markdown files passed as config options from the CLI.
285
286```{#pandoc_setup .rust}
287pandoc.set_input(InputKind::Files(config.input));
288pandoc.set_input_format(InputFormat::Markdown, vec![]);
289```
290
291The output is piped to stdout in JSON format.
292
293```{#pandoc_setup .rust}
294pandoc.set_output(OutputKind::Pipe);
295pandoc.set_output_format(OutputFormat::Json, vec![]);
296```
297
298In this way, we will be able to pipe the output into a Pandoc filter that will collect the code snippets and build the codebase for us.
299
```{#pandoc_setup .rust}
pandoc.add_filter(
    move |json| pandoc_ast::filter(json,
        |pandoc| {
            <<pandoc_filter>>
        }
    )
);
pandoc.execute().unwrap();
```
310
311## Pandoc filters
312
313Pandoc allows for the definition of [custom filters](https://pandoc.org/filters.html) to change the abstract syntax tree of a document.
314
315In this case we use a filter to collect code snippets from the input Markdown text into a `HashMap`, mapping code block identifiers to code block snippets.
316
317```{#uses .rust}
318use std::borrow::Cow;
319use std::collections::HashMap;
320```
321
322```{#types .rust}
323type Blocks<'a> = HashMap<String,Cow<'a,str>>;
324```
325
326Code blocks are wrapped into a [`Cow`](https://doc.rust-lang.org/stable/std/borrow/enum.Cow.html), i.e., a "copy-on-write" smart pointer, to avoid string duplication, unless strictly necessary.
327
328We iterate over all code blocks, along with their IDs, classes and attributes, collecting them
329
```{#pandoc_filter .rust}
let mut blocks: Blocks = HashMap::new();
pandoc.blocks.iter().for_each(|block|
    if let Block::CodeBlock((id,clss,attrs), code) = block {
        <<code_block_gathering>>
    }
);
```
338
339And then we build the source code, making sure to cut off recursive code generation with depth larger than `config.depth`.
340
341```{#pandoc_filter .rust}
342build(&config.output, &blocks, config.depth);
343```
344
345The filter returns the Pandoc JSON unchanged.
346
347```{#pandoc_filter .rust}
348pandoc
349```
350
351## Source code generation
352
353In order to build the source code from the gathered code block snippets, we need to recursively substitute *code macros* of the form `<<identifier>>` with the corresponding code block.
354
355Code macros are matched with the following regex
356
357```{#regex_macro .rust}
358static ref MACRO: Regex =
359 Regex::new(
360 r"(?m)^([[:blank:]]*)<<([^>\s]+)>>"
361 ).unwrap();
362```
363
Note that, when matching the code macro, we keep track of its indentation as well, in order to properly indent the substituted code.
365
366Given a code macro, the following closure will compute the substituting block of code, properly indented.
The input `Captures` structure can be indexed like a vector: index 0 holds the full match, followed by the capture groups, i.e., the indentation (index 1) and the macro identifier (index 2).
368
In case we reach the maximum allowed depth, we stop substituting code blocks (the macro is replaced by an empty string) and notify the user that something might not have been generated as expected.
370
```{#macro_closure .rust}
|caps: &Captures| {
    if current_depth < max_depth {
        let block = blocks
            .get(&caps[2])
            .expect("Block not present")
            .clone();
        indent(block, caps[1].len())
    } else {
        eprintln!("Reached maximum depth, \
                   output might be truncated.\n\
                   Increase `--depth` accordingly.");
        Cow::Owned(String::from(""))
    }
}
```
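
The effect of the depth limit is easier to see in a self-contained setting. The sketch below is an approximation under a few assumptions: it matches macro lines by hand instead of using the regex, it leaves unknown identifiers untouched (the closure above panics instead), and `expand_once` and `expand` are hypothetical names.

```rust
use std::collections::HashMap;

/// One substitution pass over `code` (hypothetical, std-only sketch).
/// A macro line is optional whitespace followed by `<<id>>`.
/// With `substitute == false` every macro line is blanked instead,
/// mirroring the truncation at maximum depth.
/// Returns `None` when the pass changed nothing.
fn expand_once(
    code: &str,
    blocks: &HashMap<String, String>,
    substitute: bool,
) -> Option<String> {
    let mut changed = false;
    let mut out: Vec<String> = Vec::new();
    for line in code.lines() {
        let body = line.trim_start();
        let indent = &line[..line.len() - body.len()];
        let id = body.strip_prefix("<<").and_then(|r| r.strip_suffix(">>"));
        match (id, substitute) {
            (Some(_), false) => {
                changed = true;
                out.push(String::new());
            }
            (Some(id), true) if blocks.contains_key(id) => {
                changed = true;
                for l in blocks[id].lines() {
                    // Empty lines are not indented.
                    if l.is_empty() {
                        out.push(String::new());
                    } else {
                        out.push(format!("{}{}", indent, l));
                    }
                }
            }
            // Ordinary lines (and unknown identifiers) are kept as they are.
            _ => out.push(line.to_string()),
        }
    }
    if changed { Some(out.join("\n")) } else { None }
}

/// Repeat passes until nothing changes; after `max_depth` substituting
/// passes, one final pass blanks whatever macros remain.
fn expand(entry: &str, blocks: &HashMap<String, String>, max_depth: u32) -> String {
    let mut code = entry.to_string();
    let mut depth = 0;
    while let Some(next) = expand_once(&code, blocks, depth < max_depth) {
        code = next;
        depth += 1;
    }
    code
}
```

A block that refers to itself therefore does not loop forever: once the maximum depth is reached, the remaining macro is blanked and the expansion stops.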
387
388As explained above, the building process iterates over all collected blocks and detects relevant entry points (files to generate) to start the recursive macro substitution.
389
```{#functions .rust}
fn build(
    base: &Option<PathBuf>,
    blocks: &Blocks,
    max_depth: u32
) {
    <<regex_definition>>
    blocks
        .iter()
        .for_each(|(path,code)| if PATH.is_match(path) {
            <<code_generation>>
        })
}

```
405
406### Recursive macro substitution
407
The code generation algorithm went through multiple iterations, which revealed some interesting details of working with `Cow`s.
409
```{#code_generation .rust}
let mut current_depth = 0;
let mut code = code.clone();
while MACRO.is_match(&code) {
    code = MACRO.replace_all(
        &code,
        <<macro_closure>>
    );
    current_depth += 1;
}
```
421
The problem with this version is that, due to how `Cow` works, the value returned by `replace_all` cannot outlive the borrow of `code` passed as a parameter.
This is because the function returns a reference to `code` (`Cow::Borrowed`) if no replacement takes place, so for the returned value to be valid, `code` still needs to be available.
But here `code` is overwritten right away, so, in principle, if no replacement takes place, `code` would be overwritten by a reference to itself (losing data).

However, this cannot happen in practice (although the compiler has no way of knowing it), because `replace_all` is only called while some replacement is possible (the `while` condition).
In other words, every call to `replace_all` returns a `Cow::Owned` value.
429
430The problem is solved by a clever use of pattern matching
431
```{#code_generation .rust .override}
let mut current_depth = 0;
let mut code = code.clone();
while let Cow::Owned(new_code) = MACRO.replace_all(
    &code,
    <<macro_closure>>
) {
    code = Cow::from(new_code);
    current_depth += 1;
}
```
443
In this case, the `String` bound by the `Cow::Owned` pattern is not tied to the lifetime of the borrowed value `code` (the type of the matched value is `Cow<'_,str>`).
Moreover, `code` takes ownership of `new_code: String` through `Cow::from()`.
No heap allocation is performed, and the string is not copied.
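
The same pattern can be reproduced without the `regex` crate. In the sketch below, the hypothetical function `strip_one_bang` stands in for `replace_all`: it returns `Cow::Borrowed` when there is nothing to replace and `Cow::Owned` otherwise. Both function names are assumptions made for the sake of the example.

```rust
use std::borrow::Cow;

/// Stand-in for `replace_all` (hypothetical): remove the first `!`,
/// borrowing the input when there is nothing to remove.
fn strip_one_bang<'a>(s: &'a str) -> Cow<'a, str> {
    match s.find('!') {
        Some(i) => {
            let mut out = String::with_capacity(s.len() - 1);
            out.push_str(&s[..i]);
            out.push_str(&s[i + 1..]);
            Cow::Owned(out)
        }
        None => Cow::Borrowed(s),
    }
}

/// Apply `strip_one_bang` until it stops producing owned values.
/// Returns the result together with the number of passes performed.
fn strip_all_bangs(input: &str) -> (Cow<'_, str>, u32) {
    let mut code: Cow<str> = Cow::Borrowed(input);
    let mut passes = 0;
    // Only the owned `String` is moved out of the returned `Cow`,
    // so overwriting `code` in the body does not conflict with `&code`.
    while let Cow::Owned(new_code) = strip_one_bang(&code) {
        code = Cow::from(new_code);
        passes += 1;
    }
    (code, passes)
}
```

When the input contains nothing to replace, the loop body never runs and the original borrowed value is returned untouched.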
447
448Finally, we write the code to file
449
450```{#code_generation .rust}
451let file = base
452 .clone()
453 .unwrap_or(PathBuf::from(BASE))
454 .join(path);
455write_to_file(file, &code)
456 .expect("Unable to write to file");
457```
458
459## Additional details
460
461### Code indentation
462
463When (positive) code indentation is required, the processed block of code is indented by `indent`.
464
465```{#indent_prefix .rust}
466let prefix = format!("{:indent$}", "");
467```
468
469Each line is then `prefix`ed separately and the result is returned.
470
```{#functions .rust}
fn indent<'a>(
    input: Cow<'a,str>,
    indent: usize
) -> Cow<'a,str> {
    if indent > 0 {
        <<indent_prefix>>
        let size = input.len() + indent*input.lines().count();
        let mut output = String::with_capacity(size);
        input.lines().enumerate().for_each(|(i,line)| {
            if i > 0 {
                output.push('\n');
            }
            if !line.is_empty() {
                output.push_str(&prefix);
                output.push_str(line);
            }
        });
        Cow::Owned(output)
    } else {
        input
    }
}

```
496
497Note that, if no indentation is required (i.e., `indent` is equal to 0), no additional allocation is performed, and the `input` is returned as is.
498
499### RegEx matching
500
501`pangler` uses the `regex` library to perform regular expression matching and substitution.
502Moreover, the library suggests the use of `lazy_static` to ensure that the regexes used are compiled exactly once per execution.
503
504
505```{#dependencies .toml}
506lazy_static = "1.4"
507regex = "1.5"
508```
509
510```{#uses .rust}
511use lazy_static::lazy_static;
512use regex::{Captures,Regex};
513```
514
515We wrap the regex definition in a `lazy_static` macro
516
517```{#regex_definition .rust}
518lazy_static! {
519 <<regex_path>>
520 <<regex_macro>>
521}
522```
523
524### Writing to file
525
Files are written using the filesystem support provided by the Rust standard library.
527
528```{#uses .rust}
529use std::fs;
530use std::io::Result;
531use std::path::PathBuf;
532```
533
534First, all necessary parent directories of `path` are created
535
536```{#parent_directory_creation .rust}
537fs::create_dir_all(path.parent().unwrap())?;
538```
539
and then the `content` is written to the file at `path`
541
542```{#write_to_file .rust}
543fs::write(path, content)?;
544```
545
We perform a check on `path` and only write the content to the file if the path is relative; absolute paths are reported on stderr and skipped.
547
```{#functions .rust}
fn write_to_file(
    path: PathBuf, content: &str
) -> std::io::Result<()> {
    if path.is_relative() {
        <<parent_directory_creation>>
        <<write_to_file>>
    } else {
        eprintln!(
            "Absolute paths not supported: {}",
            path.to_string_lossy()
        )
    }
    Ok(())
}

```
565
566# Credits
567
`pangler v0.2.0` was created by Federico Igne (git@federicoigne.com) and is available at [`https://git.dyamon.me/projects/pangler`](https://git.dyamon.me/projects/pangler).
569
570```{#Cargo.toml .toml}
571[package]
572name = "pangler"
573version = "0.2.0"
574edition = "2021"
575
576[dependencies]
577<<dependencies>>
578```
diff --git a/util/weaver.lua b/util/weaver.lua
new file mode 100644
index 0000000..1159988
--- /dev/null
+++ b/util/weaver.lua
@@ -0,0 +1,34 @@
if FORMAT:match 'latex' then
  -- Setting custom `listings` style
  function Meta(m)
    m["header-includes"] = pandoc.MetaBlocks({pandoc.RawBlock("latex",[[
      \lstdefinestyle{weaver}{
        basicstyle=\small\ttfamily,
        backgroundcolor=\color{gray!10},
        xleftmargin=0.5cm,
        numbers=left,
        numbersep=5pt,
        numberstyle=\tiny\color{gray},
        captionpos=b
      }
      \lstset{style=weaver}
    ]])})
    return m
  end
  function CodeBlock(b)
    -- A missing ID is the empty string, which is truthy in Lua,
    -- so it has to be compared explicitly.
    local has_id = b.identifier ~= nil and b.identifier ~= ""
    -- Remove `path` attribute and merge it with `id`
    if b.attributes.path and has_id then
      b.identifier = b.attributes.path .. b.identifier
      b.attributes.path = nil
    end
    -- Add ID to caption
    if has_id then
      if b.attributes.caption then
        b.attributes.caption = b.identifier .. ": " .. b.attributes.caption
      else
        b.attributes.caption = b.identifier
      end
    end
    return b
  end
end