2.4.4 - Directory Tree
For relatively small websites, ad-hoc side-loading is available directly from a folder structure on the hard drive. This is intended for loading manuals, documentation and similar data sets that are large and slowly changing.
A website can be archived with wget, like this:
wget -nc -x --continue -w 1 -r -A "html" "docs.marginalia.nu"
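When several sites need archiving, the same command can be scripted. The following is a minimal sketch, not part of the search engine itself: the helper names `wget_mirror_command` and `mirror_all` are hypothetical, and the flags are copied unchanged from the command above.

```python
import subprocess


def wget_mirror_command(domain, wait_seconds=1):
    """Build the wget argument list shown above for a single domain."""
    return [
        "wget",
        "-nc",                    # skip files that already exist locally
        "-x",                     # always create the host/path directory hierarchy
        "--continue",             # resume partially downloaded files
        "-w", str(wait_seconds),  # wait between requests
        "-r",                     # recursive retrieval
        "-A", "html",             # only keep files with the html suffix
        domain,
    ]


def mirror_all(domains):
    """Run wget once per domain; raises if any invocation fails."""
    for domain in domains:
        subprocess.run(wget_mirror_command(domain), check=True)
```

Calling `mirror_all(["docs.marginalia.nu"])` from the upload directory would run the command above once; adding more domains to the list archives them one after another.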
After doing this to a bunch of websites, create a YAML file in the upload directory, with contents something like this:
sources:
- name: jdk-20
  dir: "jdk-20/"
  domainName: "docs.oracle.com"
  baseUrl: "https://docs.oracle.com/en/java/javase/20/docs"
  keywords:
  - "java"
  - "docs"
  - "documentation"
  - "javadoc"
- name: python3
  dir: "python-3.11.5/"
  domainName: "docs.python.org"
  baseUrl: "https://docs.python.org/3/"
  keywords:
  - "python"
  - "docs"
  - "documentation"
- name: mariadb.com
  dir: "mariadb.com/"
  domainName: "mariadb.com"
  baseUrl: "https://mariadb.com/"
  keywords:
  - "sql"
  - "docs"
  - "mariadb"
  - "mysql"
The fields above are:
parameter | description |
---|---|
name | Purely informative |
dir | Path of the website contents, relative to the location of the YAML file |
domainName | The domain name of the website |
baseUrl | This URL is prefixed to the path of each file within dir |
keywords | These supplemental keywords are injected into each document |
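If many sources are involved, the index file can be generated rather than written by hand. The sketch below is only an illustration: the function name `render_sources_yaml` is hypothetical, it uses only the five fields described above, and it quotes every value with `json.dumps`, which yields double-quoted strings that are also valid YAML scalars.

```python
import json


def render_sources_yaml(sources):
    """Render a list of source dicts in the layout described above."""
    lines = ["sources:"]
    for src in sources:
        lines.append(f"- name: {json.dumps(src['name'])}")
        for field in ("dir", "domainName", "baseUrl"):
            lines.append(f"  {field}: {json.dumps(src[field])}")
        lines.append("  keywords:")
        for keyword in src["keywords"]:
            lines.append(f"  - {json.dumps(keyword)}")
    return "\n".join(lines) + "\n"


example_sources = [
    {
        "name": "python3",
        "dir": "python-3.11.5/",
        "domainName": "docs.python.org",
        "baseUrl": "https://docs.python.org/3/",
        "keywords": ["python", "docs", "documentation"],
    },
]
```

Writing `render_sources_yaml(example_sources)` to a file such as `docs-index.yaml` in the upload directory would reproduce the `python3` entry from the example.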
The directory structure corresponding to the above might look like this:
docs-index.yaml
jdk-20/
jdk-20/resources/
jdk-20/api/
jdk-20/api/[...]
jdk-20/specs/
jdk-20/specs/[...]
jdk-20/index.html
mariadb.com/
mariadb.com/kb/
mariadb.com/kb/[...]
python-3.11.5/
python-3.11.5/genindex-B.html
python-3.11.5/library/
python-3.11.5/distutils/
python-3.11.5/[...]
[...]
So, for example, the file jdk-20/api/java.base/java/lang/Thread.html would correspond to the URL https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/Thread.html.
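The mapping from a file on disk to its URL can be sketched as follows. This is an assumption inferred from the example above, not the sideloader's actual code: the function name `file_to_url` is hypothetical, and the handling of a trailing slash on baseUrl is a guess chosen to reproduce the documented result.

```python
from pathlib import PurePosixPath


def file_to_url(base_url, source_dir, file_path):
    """Prefix base_url to the part of file_path that lies below source_dir."""
    # relative_to raises ValueError if file_path is not inside source_dir
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(source_dir))
    # Assumption: exactly one slash separates baseUrl from the relative path
    return base_url.rstrip("/") + "/" + relative.as_posix()


thread_url = file_to_url(
    "https://docs.oracle.com/en/java/javase/20/docs",
    "jdk-20/",
    "jdk-20/api/java.base/java/lang/Thread.html",
)
```

Here `thread_url` equals the Thread.html URL given in the example, and the same function applied to `python-3.11.5/genindex-B.html` with the `python3` source yields `https://docs.python.org/3/genindex-B.html`.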
After creating the directory structure, go to Index Nodes -> Node N -> Actions -> Sideload Dirtree, and select the directory containing the structure. Then click 'Sideload Dirtree'. This will process the data so that it can be loaded.
As usual with sideloaded data, after the data has been processed it can be loaded by going to Index Nodes -> Node N -> Actions -> Load Processed Data.