Zimdiff

zimdiff is a proposed tool to facilitate incremental updates of large ZIM files. It will be written using the zimlib library and released under the terms of the GPLv2 license. Note that zimdiff is currently under development. The Bugzilla page can be found here.

This page discusses the details of the zimdiff tool.

The zimdiff tool will generate a diff_file between two ordinary ZIM files, referred to below as start_file and end_file.

diff_file format
A diff_file will be an ordinary ZIM file with some additional data. It must contain all the data needed to satisfy: start_file + diff_file = end_file

Three actions need to be performed using the diff_file:
1. add
2. remove
3. update
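A minimal sketch of how zimdiff might sort articles into these three actions. It assumes a hypothetical in-memory model in which each ZIM file is a Python dict mapping (namespace, title) to raw article bytes; it does not use zimlib, and the names classify, Key and Articles are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical model: an article key is (namespace, title) and a ZIM
# file is reduced to a dict from key to article content.
Key = Tuple[str, str]
Articles = Dict[Key, bytes]


def classify(start: Articles, end: Articles) -> Tuple[List[Key], List[Key], List[Key]]:
    """Return the (add, remove, update) lists needed to turn start into end."""
    add = sorted(k for k in end if k not in start)
    remove = sorted(k for k in start if k not in end)
    # Articles present in both files but with different content.
    update = sorted(k for k in end if k in start and start[k] != end[k])
    return add, remove, update


start = {("A", "Foo"): b"old", ("A", "Bar"): b"same", ("A", "Gone"): b"x"}
end = {("A", "Foo"): b"new", ("A", "Bar"): b"same", ("A", "Baz"): b"y"}
add, remove, update = classify(start, end)
# add == [("A", "Baz")], remove == [("A", "Gone")], update == [("A", "Foo")]
```

Unchanged articles (here ("A", "Bar")) appear in none of the lists, so they cost nothing in the diff_file.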

The diff_file will store all articles that have to be added to the start_file, and a metadata article will list them. Another metadata article will list the articles to be removed from the start_file. An updated article can be handled in one of two ways:
1. Store the new version of the article among the articles to be added.
2. Store the diff between the old and the new version (generated by a diff algorithm) as a separate article in the diff_file. A list of such diff articles will be kept in the metadata.
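Option 2 needs a diff algorithm, which this page does not specify. As a stand-in, the sketch below uses Python's standard difflib.SequenceMatcher to encode the new article as copy ranges from the old article plus literal new bytes. The ("copy", ...) / ("data", ...) operation format and the names make_delta and apply_delta are invented for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# One delta operation: ("copy", i, j) reuses old[i:j]; ("data", b) carries
# bytes from the new article that cannot be copied from the old one.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(old: bytes, new: bytes) -> List[Op]:
    """Encode `new` as copies from `old` plus literal inserted data."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", new[j1:j2]))
        # tag == "delete": old[i1:i2] is simply not copied.
    return ops


def apply_delta(old: bytes, ops: List[Op]) -> bytes:
    """Rebuild the new article from the old article and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

difflib is simple but can be slow on large inputs; a dedicated binary delta format such as VCDIFF (RFC 3284) would likely suit real ZIM content better. Either way, option 2 only pays off when the delta is smaller than the new article itself.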

With this format, the diff_file records the difference between the start_file and the end_file, and the zimpatch tool can apply it to the start_file to obtain the end_file.
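The zimpatch side can be sketched in the same hypothetical in-memory model (each ZIM file a dict from (namespace, title) to bytes). Here a diff is a plain dict with "add", "delete" and "update" entries; this layout and the name apply_diff are invented for illustration and are not the actual diff_file format.

```python
from typing import Dict, Tuple

# Hypothetical model: an article key is (namespace, title) and a ZIM
# file is a dict from key to article content.
Key = Tuple[str, str]
Articles = Dict[Key, bytes]


def apply_diff(start: Articles, diff: dict) -> Articles:
    """Compute start_file + diff_file = end_file on the in-memory model."""
    result = dict(start)  # leave the caller's start_file untouched
    for key in diff["delete"]:
        if key not in result:
            raise KeyError(f"cannot delete missing article {key}")
        del result[key]
    for key, content in diff["update"].items():
        if key not in result:
            raise KeyError(f"cannot update missing article {key}")
        result[key] = content
    for key, content in diff["add"].items():
        if key in result:
            raise KeyError(f"article {key} already exists")
        result[key] = content
    return result


start = {("A", "Foo"): b"old", ("A", "Bar"): b"same", ("A", "Gone"): b"x"}
diff = {
    "add": {("A", "Baz"): b"y"},
    "delete": [("A", "Gone")],
    "update": {("A", "Foo"): b"new"},
}
end = apply_diff(start, diff)
```

Failing loudly on a missing or duplicate article, rather than silently skipping it, is one way to catch a diff applied to the wrong file partway through; a UID check before patching catches it up front.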

file_details
Article Name: 'file_details'. This article will contain the UIDs of both the start_file and the end_file, to prevent the diff_file from being applied to the wrong file.
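The page does not fix a layout for file_details. Assuming a hypothetical text layout with start_uid=<hex> and end_uid=<hex> lines, zimpatch could check the UID of the file it is about to patch roughly as follows; parse_file_details and should_patch are illustrative names, and obtaining the target file's UID is left out.

```python
from typing import Dict, Tuple


def parse_file_details(text: str) -> Tuple[str, str]:
    """Parse the hypothetical 'start_uid=...' / 'end_uid=...' layout."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if "=" in line:
            name, value = line.split("=", 1)
            fields[name.strip()] = value.strip().lower()
    return fields["start_uid"], fields["end_uid"]


def should_patch(file_details: str, target_uid: str) -> bool:
    """True if the diff applies to the target, False if it is already end_file."""
    start_uid, end_uid = parse_file_details(file_details)
    uid = target_uid.strip().lower()
    if uid == start_uid:
        return True
    if uid == end_uid:
        return False  # already updated; nothing to do
    raise ValueError(f"diff_file does not apply to the ZIM file with UID {uid}")
```

Recognising the end_file UID as well lets zimpatch report an already-updated file instead of treating it as an error.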

add_list
Article Name: 'add_list'. This article will list the articles to be added to the start_file, storing the namespace and title of each. The articles themselves are stored in the contents of the diff_file.

delete_list
Article Name: 'delete_list'. This article will list the articles to be deleted from the start_file, storing the namespace and title of each.

update_list
Article Name: 'update_list'. This article will list the articles to be updated, storing the namespace and title of each. The new versions of these articles are stored in the contents of the diff_file.
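The page does not define how the namespace/title pairs are encoded. One simple possibility, assumed here purely for illustration, is one UTF-8 line per article of the form namespace/title, splitting on the first '/' so that titles containing slashes survive a round trip. The same encoding would serve add_list, delete_list and update_list; encode_list and decode_list are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical encoding: one "namespace/title" entry per line, UTF-8.
Key = Tuple[str, str]


def encode_list(keys: List[Key]) -> bytes:
    """Serialize (namespace, title) pairs for add_list, delete_list or update_list."""
    lines = []
    for namespace, title in keys:
        if "/" in namespace or "\n" in namespace or "\n" in title:
            raise ValueError(f"cannot encode entry {(namespace, title)!r}")
        lines.append(f"{namespace}/{title}")
    return "\n".join(lines).encode("utf-8")


def decode_list(data: bytes) -> List[Key]:
    """Parse the encoding above; only the first '/' separates the fields."""
    keys = []
    for line in data.decode("utf-8").split("\n"):
        if line:  # an empty list encodes to b"", which yields one empty line
            namespace, title = line.split("/", 1)
            keys.append((namespace, title))
    return keys
```

Rejecting newlines up front keeps the format unambiguous; a binary length-prefixed layout would avoid that restriction at the cost of readability.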