Difference between revisions of "Build your ZIM file"

From openZIM
(MWoffliner)
 
(47 intermediate revisions by 9 users not shown)
Line 1: Line 1:
A few tools allow people to create ZIM files.
+
[[File:Wikipedia-Book-creator.png|right|thumb|The ''[http://en.wikipedia.org/wiki/Special:Book Wikipedia Book Creator]'' is the easiest way to create custom ZIM files from Wikipedia]]
 +
A few ''tools allow people'' to create ZIM files.
  
 
== Users ==
 
== Users ==
You can create on Wikipedia and all other Wikimedia projects, ZIM files of article selections. [https://en.wikipedia.org/wiki/Special:Book For example on http://en,wikipedia.org]. This is based on the MediaWiki Collection Extension (see below).
 
  
== Developers ==
+
=== MediaWiki Collection Extension ===
[[Image:Schema ZIM File Creation.png|thumb|right|An example approach to create a ZIM file]]
+
 
 +
''Unfortunately, this feature was removed. Please subscribe to [https://phabricator.wikimedia.org/T73660 this ticket] and show your interest if you want to see it back.''
  
=== MediaWiki Collection Extension ===
+
<s>
The Collection extension for MediaWiki provides the ability to select articles and export them in different formats, such as PDF, ODF and ZIM. The main purpose of Collection extension is to create printed books - instead of export you can also choose to have the selected articles printed on demand as a real book.
+
On Wikipedia and all other Wikimedia projects, you can create ZIM files from selections of articles, [https://en.wikipedia.org/wiki/Special:Book for example on http://en.wikipedia.org]. This is based on the MediaWiki Collection extension, which lets you select articles and export them in different formats, such as PDF, ODF and ZIM. The main purpose of the Collection extension is to create printed books: instead of exporting, you can also choose to have the selected articles printed on demand as a real book.
  
 
The Collection extension can be easily added to any MediaWiki installation:
 
The Collection extension can be easily added to any MediaWiki installation:
 
* [http://www.mediawiki.org/wiki/Extension:Collection Get the extension]
 
* [http://www.mediawiki.org/wiki/Extension:Collection Get the extension]
 
* [http://www.mediawiki.org/wiki/Extension:Collection/openZIM Details on Collection and openZIM]
 
* [http://www.mediawiki.org/wiki/Extension:Collection/openZIM Details on Collection and openZIM]
 +
</s>
 +
 +
=== Ask for a custom ZIM file ===
 +
 +
If you have a deployment project (that is, the file is not only for your own personal use), you may ask the Kiwix team to create a ZIM file for you. To do so, prepare the following information:
 +
* Text file with the list of article titles (one title per line, with underscores, UTF8)
 +
* The URL of the wiki you want to snapshot
 +
* Prepare a welcome page for the ZIM file (on the wiki, and share with us the title of that page)
 +
* ZIM file metadata:
 +
** Title
 +
** Description (only a few words)
 +
** 48x48 PNG logo
  
=== zimwriterdb ===
+
=== Create a ZIM file from existing HTML contents ===
[[zimwriterdb]] is part of the openZIM project. This binary uses a pre-filled Postgres Database and create the corresponding ZIM file. Only buildZimFileFromDirectory.pl (see below) is for now able to fill the database.
+
See [[Zimwriterfs instructions]] for an overview and read the section below on zimwriterfs for some additional context.
 +
 
 +
== Developers ==
 +
[[Image:Schema ZIM File Creation.png|thumb|right|An example approach to create a ZIM file]]
 +
 
 +
=== MWoffliner ===
 +
 
 +
MWoffliner is a tool for dumping a Wikimedia project (Wikipedia, Wiktionary, ...) to local storage. It should also work with any [https://mediawiki.org MediaWiki] instance. It goes through all articles of the project (or a selection, if specified) and writes the HTML and pictures to your local filesystem, either as plain HTML/JS/CSS/... files or as a ZIM file.
 +
 
 +
It is distributed via [https://www.npmjs.com/package/mwoffliner npm] and [https://hub.docker.com/r/openzim/mwoffliner Docker].
 +
 
 +
If you are a developer, you can download it directly from its [https://github.com/openzim/mwoffliner git repository].
 +
 
 +
=== zimwriterfs ===
 +
zimwriterfs is a console tool for creating ZIM files from a locally stored directory containing "self-sufficient" HTML content (with pictures, JavaScript and stylesheets). The resulting ZIM file contains all the files of the local directory, compressed and merged. Nothing more, nothing less. For now, zimwriterfs only works on POSIX-compatible systems. You simply need to compile it and run it. The software does not need a lot of resources, but creating a large ZIM file can take a while.
 +
Instructions on how to prepare and use zimwriterfs are here [[zimwriterfs_instructions]]
 +
[https://github.com/wikimedia/openzim/tree/master/zimwriterfs Go to zimwriterfs source code repository].
 +
 
 +
A virtual machine with zimwriterfs is provided [http://download.kiwix.org/dev/ZIMmaker.ova here].
 +
 
 +
=== Zimbalaka ===
 +
The following description is based on notes published by the original author of Zimbalaka, which are no longer available on the site where they were published. An archived copy is available on archive.org: https://web.archive.org/web/20150531004251/http://www.arunmozhi.in:80/blog/zimbalaka-an-openzim-creator/#content
 +
 
 +
Zimbalaka is designed as a web-hosted tool that creates Wikipedia ZIM files from selections of articles.
 +
 
 +
It accepts two types of input: a list of pages or a Wikipedia category. Zimbalaka then downloads those pages, removes clutter such as sidebars, the toolbox and edit links, and provides a cleaned version as a ZIM file for download. The file can be opened in Kiwix or another ZIM reader.
 +
 
 +
The ZIM is created with a simple welcome page with all the pages as a list of links.  
 +
 
 +
Zimbalaka has multilingual and multi-site support: you can create a ZIM file from pages of any of the 280+ language editions of Wikipedia, and also from sites such as Wikibooks, Wiktionary and Wikiversity. You can even enter a custom URL such as <nowiki>http://sub.domain.com/</nowiki>; Zimbalaka appends /wiki/Page_title to it and downloads the pages.
 +
 
 +
==== Pain points ====
 +
A small pain point is that Zimbalaka also strips the external references at the end of Wikipedia articles, as the original author did not consider them useful in an offline environment.
 +
 
 +
You cannot add a custom welcome page to the ZIM file. This is not a very big priority, as the current page does its job of listing all the pages.
 +
 
 +
You cannot include pages from multiple sites in a single ZIM file. The workaround is to create multiple files or to use zimwriterfs, which has to be compiled from source (Zimbalaka uses it behind the scenes).
  
==== buildZimFileFromDirectory.pl ====
+
==== Developers ====
This [http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/scripts/buildZimFileFromDirectory.pl?view=markup script] is part of the [http://www.kiwix.org/index.php/Tools Kiwix tools] and allows to build a ZIM file from a HTML directory containing all necessary ressources.  
+
This tool is written using Flask (a simple Python web framework) for the backend and Bootstrap for the frontend, and uses the compiled zimwriterfs binary as the workhorse. The zimming tasks are run by Celery, which is automated with supervisord. All coordination and message passing happens via Redis.
  
You need:
+
[https://github.com/tecoholic/Zimbalaka Here is the source code].
# Checkout the dumping tools : svn co http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/
+
 
# Install all necessary Perl modules
+
=== zimwriterdb ===
# run the script like following: ./buildZimFileFromDirectory.pl --htmlPath=./html [--indexerPath=./zimindexer] [--zimFilePath=articles.zim]
+
[[zimwriterdb]] is part of the openZIM project. This binary uses a pre-filled Postgres database and creates the corresponding ZIM file; the schema for the database is linked on the main zimwriterdb page.
  
 
=== Wiki2html ===
 
=== Wiki2html ===
 
[[Wiki2html]] can be used to prepare static HTML files from a running Mediawiki instance.
 
[[Wiki2html]] can be used to prepare static HTML files from a running Mediawiki instance.
 +
 +
===zimmer===
 +
The [https://github.com/vss-devel/zimmer zimmer] package allows creating a ZIM dump from a MediaWiki-based wiki. This package is relatively easy to install and supports both old and new versions of MediaWiki. It is a kind of alternative to MWoffliner.
 +
 +
The package consists of two Node.js scripts:
 +
* ''wikizimmer.js'' -- creates static HTML files from the wiki's articles. It requires public access both to the normal web interface and to the wiki's API. Unlike MWoffliner, this script does not require Redis.
 +
* ''zimmer.js'' -- creates a ZIM file from the static HTML files (without requiring libzim).
  
 
== See also ==
 
== See also ==
 
* [[ZIM File Archive]]
 
* [[ZIM File Archive]]
 
* [[Bindings]]
 
* [[Bindings]]
* [[Reader]]
+
* [[Readers]]

Latest revision as of 06:33, 30 July 2019

The Wikipedia Book Creator is the easiest way to create custom ZIM files from Wikipedia

A few tools allow people to create ZIM files.

Users

MediaWiki Collection Extension

Unfortunately, this feature was removed. Please subscribe to the ticket at https://phabricator.wikimedia.org/T73660 and show your interest if you want to see it back.

On Wikipedia and all other Wikimedia projects, you can create ZIM files from selections of articles, for example on http://en.wikipedia.org. This is based on the MediaWiki Collection extension, which lets you select articles and export them in different formats, such as PDF, ODF and ZIM. The main purpose of the Collection extension is to create printed books: instead of exporting, you can also choose to have the selected articles printed on demand as a real book.

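The Collection extension can be easily added to any MediaWiki installation:

  • Get the extension: http://www.mediawiki.org/wiki/Extension:Collection
  • Details on Collection and openZIM: http://www.mediawiki.org/wiki/Extension:Collection/openZIM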

Ask for a custom ZIM file

If you have a deployment project (that is, the file is not only for your own personal use), you may ask the Kiwix team to create a ZIM file for you. To do so, prepare the following information:

  • A text file with the list of article titles (one title per line, with underscores instead of spaces, UTF-8 encoded)
  • The URL of the wiki you want to snapshot
  • A welcome page for the ZIM file (created on the wiki; share the title of that page with us)
  • ZIM file metadata:
    • Title
    • Description (only a few words)
    • 48x48 PNG logo
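
The title list described above can be prepared with a few lines of Python. This is only a minimal sketch: the function names and the output file name are hypothetical, and dropping duplicate titles is an assumption of this example, not a stated requirement of the Kiwix team.

```python
from pathlib import Path


def normalize_title(title):
    """Trim the title and replace runs of whitespace with a single underscore."""
    return "_".join(title.split())


def write_title_list(titles, path):
    """Write one normalized title per line as UTF-8 and return the lines written.

    Empty entries and duplicates are skipped (an assumption of this sketch).
    """
    seen = set()
    lines = []
    for title in titles:
        normalized = normalize_title(title)
        if normalized and normalized not in seen:
            seen.add(normalized)
            lines.append(normalized)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


if __name__ == "__main__":
    # "titles.txt" is a placeholder file name.
    print(write_title_list(["Albert Einstein", " Marie Curie "], "titles.txt"))
```

The resulting file can then be sent along with the wiki URL and the metadata listed above.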

Create a ZIM file from existing HTML contents

See Zimwriterfs instructions for an overview and read the section below on zimwriterfs for some additional context.

Developers

An example approach to create a ZIM file

MWoffliner

MWoffliner is a tool for dumping a Wikimedia project (Wikipedia, Wiktionary, ...) to local storage. It should also work with any MediaWiki instance. It goes through all articles of the project (or a selection, if specified) and writes the HTML and pictures to your local filesystem, either as plain HTML/JS/CSS/... files or as a ZIM file.

It is distributed via npm and Docker.

If you are a developer, you can download it directly from its git repository.

zimwriterfs

zimwriterfs is a console tool for creating ZIM files from a locally stored directory containing "self-sufficient" HTML content (with pictures, JavaScript and stylesheets). The resulting ZIM file contains all the files of the local directory, compressed and merged. Nothing more, nothing less. For now, zimwriterfs only works on POSIX-compatible systems. You simply need to compile it and run it. The software does not need a lot of resources, but creating a large ZIM file can take a while. Instructions on how to prepare and use zimwriterfs are on the zimwriterfs_instructions page. The source code is in the zimwriterfs repository at https://github.com/wikimedia/openzim/tree/master/zimwriterfs.

A virtual machine with zimwriterfs is provided at http://download.kiwix.org/dev/ZIMmaker.ova.
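
Because zimwriterfs packs exactly what is in the directory, it can help to check beforehand that the HTML really is "self-sufficient". The following Python sketch is not part of zimwriterfs; it is a hypothetical helper that assumes UTF-8 HTML files and simply lists resources (images, scripts, stylesheets) that are still loaded from the network. Ordinary hyperlinks in <a> tags are ignored, since linking to external sites does not break offline rendering.

```python
import os
from html.parser import HTMLParser


class _RefCollector(HTMLParser):
    """Collect (tag, url) pairs for every src/href attribute in a document."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.append((tag, value))


def find_external_resources(root):
    """Return sorted (relative HTML path, url) pairs for network-loaded resources."""
    problems = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if not fname.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, fname)
            parser = _RefCollector()
            with open(path, encoding="utf-8", errors="replace") as fh:
                parser.feed(fh.read())
            for tag, url in parser.refs:
                if tag == "a":
                    continue  # plain hyperlinks may legitimately point outside
                if url.startswith(("http://", "https://", "//")):
                    problems.append((os.path.relpath(path, root), url))
    return sorted(problems)


if __name__ == "__main__":
    # "./html" is a placeholder for the directory you intend to pack.
    for html_file, url in find_external_resources("./html"):
        print(f"{html_file}: external resource {url}")
```

An empty result does not prove the content is complete (resources referenced from CSS or JavaScript are not inspected), but a non-empty one points at files that would be missing offline.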

Zimbalaka

The following description is based on notes published by the original author of Zimbalaka, which are no longer available on the site where they were published. An archived copy is available on archive.org: https://web.archive.org/web/20150531004251/http://www.arunmozhi.in:80/blog/zimbalaka-an-openzim-creator/#content

Zimbalaka is designed as a web-hosted tool that creates Wikipedia ZIM files from selections of articles.

It accepts two types of input: a list of pages or a Wikipedia category. Zimbalaka then downloads those pages, removes clutter such as sidebars, the toolbox and edit links, and provides a cleaned version as a ZIM file for download. The file can be opened in Kiwix or another ZIM reader.

The ZIM is created with a simple welcome page with all the pages as a list of links.
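
A welcome page of this kind is easy to picture as code. The sketch below is not taken from Zimbalaka; the function name, the heading and the "Title.html" file naming are assumptions made purely for illustration.

```python
from html import escape
from urllib.parse import quote


def build_welcome_page(titles, heading="Welcome"):
    """Return a minimal HTML page that lists every title as a link."""
    items = []
    for title in titles:
        # Hypothetical naming scheme: spaces become underscores, ".html" is appended.
        href = quote("_".join(title.split())) + ".html"
        items.append(
            f'<li><a href="{escape(href, quote=True)}">{escape(title)}</a></li>'
        )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(heading)}</title></head>\n"
        f"<body><h1>{escape(heading)}</h1>\n<ul>\n"
        + "\n".join(items)
        + "\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    print(build_welcome_page(["Albert Einstein", "Marie Curie"]))
```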

Zimbalaka has multilingual and multi-site support: you can create a ZIM file from pages of any of the 280+ language editions of Wikipedia, and also from sites such as Wikibooks, Wiktionary and Wikiversity. You can even enter a custom URL such as http://sub.domain.com/; Zimbalaka appends /wiki/Page_title to it and downloads the pages.
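
The URL construction described above can be sketched as follows. This is an illustration of the described behaviour, not Zimbalaka's actual code; the function name is hypothetical, and the percent-encoding of non-ASCII characters is an assumption of this example.

```python
from urllib.parse import quote


def page_url(base_url, title):
    """Build <base>/wiki/<Page_title> from a site URL and a page title."""
    base = base_url.rstrip("/")        # avoid a double slash before "wiki"
    slug = "_".join(title.split())     # MediaWiki-style underscores for spaces
    return f"{base}/wiki/{quote(slug, safe='_:()/,')}"


if __name__ == "__main__":
    print(page_url("http://sub.domain.com/", "Page title"))
```

Downloading the page from that URL is left out here, since it depends on the HTTP client and on how the target wiki is configured.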

Pain points

A small pain point is that Zimbalaka also strips the external references at the end of Wikipedia articles, as the original author did not consider them useful in an offline environment.

You cannot add a custom welcome page to the ZIM file. This is not a very big priority, as the current page does its job of listing all the pages.

You cannot include pages from multiple sites in a single ZIM file. The workaround is to create multiple files or to use zimwriterfs, which has to be compiled from source (Zimbalaka uses it behind the scenes).

Developers

This tool is written using Flask (a simple Python web framework) for the backend and Bootstrap for the frontend, and uses the compiled zimwriterfs binary as the workhorse. The zimming tasks are run by Celery, which is automated with supervisord. All coordination and message passing happens via Redis.

The source code is available at https://github.com/tecoholic/Zimbalaka.

zimwriterdb

zimwriterdb is part of the openZIM project. This binary uses a pre-filled Postgres database and creates the corresponding ZIM file; the schema for the database is linked on the main zimwriterdb page.

Wiki2html

Wiki2html can be used to prepare static HTML files from a running MediaWiki instance.

zimmer

The zimmer package allows creating a ZIM dump from a MediaWiki-based wiki. This package is relatively easy to install and supports both old and new versions of MediaWiki. It is a kind of alternative to MWoffliner.

The package consists of two Node.js scripts:

  • wikizimmer.js -- creates static HTML files from the wiki's articles. It requires public access both to the normal web interface and to the wiki's API. Unlike MWoffliner, this script does not require Redis.
  • zimmer.js -- creates a ZIM file from the static HTML files (without requiring libzim).

See also