Difference between revisions of "Article Format"

From openZIM
(Adapt article description format to new namespace usage.)
 
== Article Text ==
All content stored in a ZIM archive must be placed in the <code>C</code> namespace.
Articles to be parsed and shown directly by the ZIM reader are stored as an HTML body, without any layout except the formatting used in the article text (headlines, tables, images...).


In short, the A namespace contains the ''visible'' data of an article.
Entries can be:
* Articles: HTML content intended to be displayed to the user
* Resources: any other kind of file, mainly intended to be included in articles or other resources (CSS, images, JS, ...)


* '''Namespace:''' A
== Article Entries ==
* '''Path:''' /A/''URL''
Article contents are full HTML pages, including any <code><head></code> tag and any links to CSS, scripts, and fonts.
** where ''URL'' is often identical to the ''Article Name'', but this is not a requirement (see [[ZIM File Format#URL Pointer List (urlPtrPos)|URL pointer list]] and [[ZIM File Format#Title Pointer List (titlePtrPos)|title pointer list]] for details).


== Meta Data ==
Article content must be UTF-8 encoded.
Some publishers want to provide the reader application with additional header information for individual articles, such as HTML Meta Data or a special layout around the article text.


In short, the B namespace contains the ''invisible'' part of an article.
ZIM contents are addressed using the entry's path without the namespace. References in an article's HTML code (<code><a href=""></a></code>, <code><img src=""></code>, etc.) must be valid and usable by a standard web browser (i.e., URL-encoded following the [http://www.ietf.org/rfc/rfc1738.txt RFC 1738] rules).


By default, the Meta Data can be nonexistent or empty.
Absolute URLs, i.e. those with a leading slash (''/''), are forbidden because they would prevent the ZIM contents from being served under an arbitrary HTTP sub-hierarchy. URLs must consequently be relative.


Typically the article text and article meta data are linked to each other by having the same URL.
URLs that include the namespace (<code>C/foo.html</code>) are also forbidden, as the namespace may be hidden by libzim. URLs must not climb too far up the directory hierarchy (<code>../C/bar.png</code>). <code>../</code> is still possible if the entry is in a sub-directory.
 
== Resources Entries ==
* '''Namespace:''' B
There are no strong constraints on resource entries:
* '''Path:''' /B/''URL''
* The MIME type must be correctly set.
** where /B/''URL'' is the Meta Data used for /A/''URL''.
* We advise that textual content be UTF-8 encoded, but we cannot enforce it; it depends on the resource and how it is used.
 
=== Content Inclusion ===
The Article Text needs to be combined with the Article Meta Data; therefore, the Meta Data needs to define a placeholder where the Article Text is to be inserted.
 
== Fetching Article Text vs. Article Meta Data ==
Links inside articles always use the A namespace to refer to other articles, so zimlib provides the Article Text by default for any request in namespace A.
 
To request the pure article data from namespace A, use the <tt>getData()</tt> method in zimlib.
 
To get an article included inside the layout page and with meta data, use the <tt>getPage()</tt> method in zimlib.

Latest revision as of 11:35, 15 December 2020

All content stored in a ZIM archive must be placed in the C namespace.

Entries can be:

  • Articles: HTML content intended to be displayed to the user
  • Resources: any other kind of file, mainly intended to be included in articles or other resources (CSS, images, JS, ...)

Article Entries

Article contents are full HTML pages, including any <head> tag and any links to CSS, scripts, and fonts.

Article content must be UTF-8 encoded.

ZIM contents are addressed using the entry's path without the namespace. References in an article's HTML code (<a href=""></a>, <img src="">, etc.) must be valid and usable by a standard web browser (i.e., URL-encoded following the RFC 1738 rules).
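
As an illustration of the URL-encoding requirement, here is a minimal Python sketch that percent-encodes an entry path before it is written into an href or src attribute. The helper name encode_entry_path is hypothetical. Python's urllib.parse.quote follows the reserved-character set of the newer RFC 3986 rather than RFC 1738; treating the two as interchangeable for ordinary entry paths is an assumption of this sketch.

```python
from urllib.parse import quote, unquote


def encode_entry_path(path: str) -> str:
    """Percent-encode an entry path (given without namespace) for use in HTML.

    Non-ASCII characters are encoded as UTF-8 bytes; "/" is kept as the
    path separator. Hypothetical helper, not part of libzim.
    """
    return quote(path, safe="/")


if __name__ == "__main__":
    encoded = encode_entry_path("Été 2020.html")
    print(encoded)           # %C3%89t%C3%A9%202020.html
    print(unquote(encoded))  # Été 2020.html
```

A browser resolving the encoded reference decodes it back to the original entry path, which is what the reader then looks up in the archive.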

Absolute URLs, i.e. those with a leading slash (/), are forbidden because they would prevent the ZIM contents from being served under an arbitrary HTTP sub-hierarchy. URLs must consequently be relative.

URLs that include the namespace (C/foo.html) are also forbidden, as the namespace may be hidden by libzim. URLs must not climb too far up the directory hierarchy (../C/bar.png). ../ is still possible if the entry is in a sub-directory.
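
The link rules above can be sketched in Python as follows. Both function names are hypothetical, and entry paths are given without the namespace. Two assumptions are made: links carrying a scheme or host (external links) are treated as outside the scope of these rules, and a namespace-prefixed link such as C/foo.html is not detected, because it cannot be told apart from a legitimate directory named C.

```python
import posixpath
from urllib.parse import quote, unquote, urlsplit


def relative_link(from_entry: str, to_entry: str) -> str:
    """Build a relative, percent-encoded URL from one entry to another."""
    start = posixpath.dirname(from_entry) or "."
    return quote(posixpath.relpath(to_entry, start))


def is_valid_link(from_entry: str, href: str) -> bool:
    """Check that href is relative and stays inside the entry hierarchy."""
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return True  # assumption: external links are not covered here
    if parts.path.startswith("/"):
        return False  # absolute URL, forbidden
    base = posixpath.dirname(from_entry)
    resolved = posixpath.normpath(posixpath.join(base, unquote(parts.path)))
    # A leading ".." after normalization means the link climbs above the root.
    return not (resolved == ".." or resolved.startswith("../"))


if __name__ == "__main__":
    print(relative_link("wiki/A.html", "img/logo.png"))      # ../img/logo.png
    print(is_valid_link("wiki/A.html", "../img/logo.png"))   # True
    print(is_valid_link("A.html", "../C/bar.png"))           # False
    print(is_valid_link("A.html", "/foo.html"))              # False
```

Resolving the link against the directory of the referring entry is what distinguishes an acceptable ../ (from within a sub-directory) from one that escapes the hierarchy.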

Resources Entries

There are no strong constraints on resource entries:

  • The MIME type must be correctly set.
  • We advise that textual content be UTF-8 encoded, but we cannot enforce it; it depends on the resource and how it is used.
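
As a rough illustration of these two points, the following Python sketch guesses a MIME type from a resource's file name and reports whether its bytes decode as UTF-8. The function name describe_resource is hypothetical, and both the extension-based guess and the application/octet-stream fallback are assumptions of this sketch, not requirements of the format.

```python
import mimetypes


def describe_resource(path: str, data: bytes) -> tuple[str, bool]:
    """Return (MIME type, is_utf8) for a resource about to be stored."""
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # assumed fallback for unknown types
    try:
        data.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        is_utf8 = False
    return mime, is_utf8


if __name__ == "__main__":
    print(describe_resource("style.css", "body { color: red; }".encode("utf-8")))
    print(describe_resource("logo.png", b"\x89PNG\r\n\x1a\n"))
```

The UTF-8 flag is only meaningful for textual resources; binary files such as images are expected to fail the decode.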