mwoffliner
MediaWiki HTML/ZIM scraper
Last updated 12 days ago by isnit0 .
License: GPL-3.0

MWoffliner

MWoffliner is a tool for making a local offline HTML snapshot of any online MediaWiki instance. It goes through all articles (or a selection, if specified) and writes the HTML and images to a local directory. It has mainly been tested against Wikimedia projects such as Wikipedia and Wiktionary, but it should also work with any recent MediaWiki.

It can either write the raw files (HTML, JS, CSS, PNG, ...) to the filesystem or pack them all into a highly compressed ZIM file.

Read CONTRIBUTING.md to learn more about MWoffliner development.

Prerequisites

  • *NIX Operating System (Linux/macOS)
  • NodeJS
  • Redis
  • Libzim (on Linux, binaries are downloaded automatically)
  • Various build tools that are probably already installed on your machine (libjpeg, gcc)

See Environment setup hints for how to install them.
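
As a quick sanity check before installing, the prerequisites above can be probed from NodeJS itself. The following is only a minimal sketch: the helper names commandWorks and checkPrerequisites are hypothetical and not part of MWoffliner, and it assumes that redis-server and gcc are expected on the PATH. It does not check for libzim or libjpeg.

```javascript
// Sketch only: reports whether some of the listed prerequisites look available.
// commandWorks and checkPrerequisites are hypothetical helper names.
const { spawnSync } = require("child_process");

// Returns true when `command` can be started and exits with status 0.
function commandWorks(command, args) {
  const result = spawnSync(command, args, { stdio: "ignore" });
  return !result.error && result.status === 0;
}

function checkPrerequisites() {
  return {
    platform: process.platform,
    // The prerequisites list names Linux and macOS ("darwin" in NodeJS).
    supportedPlatform: ["linux", "darwin"].includes(process.platform),
    nodeVersion: process.version,
    // Assumption: Redis is installed as `redis-server` on the PATH.
    redisServer: commandWorks("redis-server", ["--version"]),
    // Assumption: gcc is the C compiler used by the native build steps.
    cCompiler: commandWorks("gcc", ["--version"]),
  };
}

console.log(checkPrerequisites());
```

A false value only means the command was not found or failed to start from this shell; it does not prove the tool is missing from the machine.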

Usage

To install MWoffliner globally:

npm i -g mwoffliner

You might need to run this command with sudo, depending on how your npm is configured.

Then to run it:

mwoffliner --help

API

MWoffliner also provides an API and can therefore be used as a NodeJS library. Here is a stub example:

const mwoffliner = require('mwoffliner');
const parameters = {
    mwUrl: "https://es.wikipedia.org",
    adminEmail: "foo@bar.net",
    verbose: true,
    format: "nozim",
    articleList: "./articleList"
};
mwoffliner.execute(parameters); // returns a Promise
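
Since execute returns a Promise, a caller will usually want to wait for it and handle failures. The sketch below shows one way to do that, under stated assumptions: runScrape is a hypothetical wrapper, not part of MWoffliner, and the execute function is passed in as an argument so the sketch can run with a stand-in when mwoffliner is not installed. What the Promise resolves with is not documented here, so the result is passed through untouched.

```javascript
// Hypothetical wrapper around an execute-style function that returns a Promise.
// For a real run, pass require("mwoffliner").execute as the first argument.
async function runScrape(execute, overrides = {}) {
  // Same parameters as the stub example; overrides replace individual values.
  const parameters = {
    mwUrl: "https://es.wikipedia.org",
    adminEmail: "foo@bar.net",
    verbose: true,
    format: "nozim",
    articleList: "./articleList",
    ...overrides,
  };
  try {
    const result = await execute(parameters);
    return { ok: true, result };
  } catch (error) {
    // Report the failure to the caller instead of leaving the rejection unhandled.
    return { ok: false, error };
  }
}

// Demo with a stand-in function, so nothing is actually scraped.
const fakeExecute = async (parameters) => `would scrape ${parameters.mwUrl}`;
runScrape(fakeExecute).then((outcome) => console.log(outcome));
```

Injecting execute keeps the wrapper easy to try out and test; in real use the only change is the first argument.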

Background

Complementary information about MWoffliner:

  • MediaWiki software is used by tens of thousands of wikis, the most famous being the Wikimedia projects, including Wikipedia.
  • MediaWiki is a wiki engine written in PHP.
  • Wikitext is the name of the markup language that MediaWiki uses.
  • MediaWiki includes a parser that converts wikitext into HTML, and this parser creates the HTML pages displayed in your browser.
  • There is another wikitext parser, called Parsoid, implemented in JavaScript/NodeJS. MWoffliner uses Parsoid.
  • Parsoid is planned to eventually become the main parser for MediaWiki.
  • MWoffliner calls Parsoid and then post-processes the results for offline use.

Environment setup hints

macOS

Install NodeJS:

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash && \
source ~/.bashrc && \
nvm install stable && \
node --version

Install Redis:

brew install redis

Install libzim: Read these instructions

GNU/Linux - Debian based distributions

Install NodeJS:

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.11/install.sh | bash && \
source ~/.bashrc && \
nvm install stable && \
node --version

Install Redis:

sudo apt-get install redis-server

Current Tags

  • dev: 1.9.6-1565968461694 (7 days ago)
  • latest: 1.9.6 (12 days ago)
  • rc: 1.9.5-rc1 (22 days ago)
