article-parser
To extract main article from given URL
Last updated 5 days ago by dongnd.
$ cnpm install article-parser 

article-parser

Extract the main article, main image, and metadata from a URL.


Demo

Give it a try!

Usage

npm install article-parser

Then:

const {
  extract
} = require('article-parser');

const url = 'https://goo.gl/MV8Tkh';

extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.log(err);
});

APIs

Since v4, article-parser focuses on a single task: extracting the main readable content from web pages such as blog posts or news entries. It can still retrieve other kinds of content, such as YouTube videos or SoundCloud media, but these are secondary features.

extract(String url | String html)

Extracts data from the specified URL or from the full HTML content of a page. Returns a Promise that resolves to the article object.

Here is how we can use article-parser:

import {
  extract
} from 'article-parser';

const getArticle = async (url) => {
  try {
    const article = await extract(url);
    return article;
  } catch (err) {
    console.trace(err);
  }
};
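
When several pages need to be processed, the same pattern extends to a batch. The sketch below is one possible approach and not part of article-parser: `extractMany` is a hypothetical helper that receives the `extract` function as an argument and uses the standard `Promise.allSettled`, so that one failing URL does not abort the others.

```javascript
// Hypothetical helper, not part of article-parser.
// `extract` is passed in, so any function that takes a URL
// and returns a Promise can be used here.
const extractMany = async (extract, urls) => {
  // Wrap each call in an async function so that synchronous throws
  // are also captured as rejections.
  const settled = await Promise.allSettled(
    urls.map(async (url) => extract(url))
  );
  // Pair each outcome with the URL it belongs to.
  return settled.map((result, i) => {
    return result.status === 'fulfilled' ?
      {url: urls[i], article: result.value, error: null} :
      {url: urls[i], article: null, error: result.reason};
  });
};

// Possible usage (requires the package and network access):
// const {extract} = require('article-parser');
// extractMany(extract, ['https://goo.gl/MV8Tkh']).then(console.log);
```

Each entry in the returned array carries either an `article` or an `error`, so callers can decide per URL how to handle a failure.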

The structure of the article object has also changed since v3. It now looks like this:

{
  "url": URI String,
  "title": String,
  "description": String,
  "image": URI String,
  "author": String,
  "content": HTML String,
  "published": Date String,
  "source": String, // original publisher
  "links": Array, // list of alternative links
  "ttr": Number, // time to read in seconds, 0 = unknown
}
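
As an illustration of working with this structure, here is a minimal sketch that turns an article object into a one-line summary. `formatReadingTime` and `summarize` are hypothetical helpers, not library functions; they only assume the `title`, `source` and `ttr` fields listed above, with `ttr` given in seconds and 0 meaning unknown.

```javascript
// Hypothetical helpers, not part of article-parser.

// Convert a `ttr` value (seconds) into a short label.
// 0, negative, or non-numeric values are treated as unknown.
const formatReadingTime = (ttr) => {
  if (!Number.isFinite(ttr) || ttr <= 0) {
    return 'unknown';
  }
  if (ttr < 60) {
    return `${Math.round(ttr)} sec`;
  }
  return `${Math.round(ttr / 60)} min`;
};

// Build a one-line summary from the fields of an article object.
const summarize = (article) => {
  const time = formatReadingTime(article.ttr);
  return `${article.title} (${article.source}), reading time: ${time}`;
};
```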

Configuration methods

In addition, this library provides methods for customizing the default settings. Leave them unchanged unless you have a specific reason to override them.

  • setParserOptions(Object parserOptions)
  • getParserOptions()
  • setNodeFetchOptions(Object nodeFetchOptions)
  • getNodeFetchOptions()
  • setSanitizeHtmlOptions(Object sanitizeHtmlOptions)
  • getSanitizeHtmlOptions()

Here are default properties/values:

Object parserOptions:

{
  wordsPerMinute: 300,
  urlsCompareAlgorithm: 'levenshtein',
}

Read string-comparison docs for more info about urlsCompareAlgorithm.
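
To give a feel for what `wordsPerMinute` controls, the following sketch estimates a reading time in seconds from a plain-text word count. This is an assumption about how such a value can be derived, not the library's actual implementation; `estimateTtr` is a hypothetical name.

```javascript
// Hypothetical sketch, not article-parser's own code.
// Estimate the reading time in seconds for a piece of plain text.
const estimateTtr = (text, wordsPerMinute = 300) => {
  // Count whitespace-separated words; an empty string yields 0.
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  // words / wordsPerMinute gives minutes; multiply by 60 for seconds.
  return Math.ceil((words / wordsPerMinute) * 60);
};
```

Under this model, lowering `wordsPerMinute` increases the estimated time for the same text.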

Object nodeFetchOptions:

{
  headers: {
    'user-agent': 'article-parser/4.0.0',
  },
  timeout: 30000,
  redirect: 'follow',
  compress: true,
  agent: false,
}

Read node-fetch docs for more info.
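
A small sketch of how these options might be overridden, for example to shorten the timeout or send a custom user agent. The README does not say whether `setNodeFetchOptions` merges with or replaces the current values, so the hypothetical helper `withFetchOverrides` reads the current options first and passes a complete object back. The `parser` argument stands for the article-parser module.

```javascript
// Hypothetical helper, not part of article-parser.
// `parser` is any object exposing getNodeFetchOptions/setNodeFetchOptions.
const withFetchOverrides = (parser, overrides = {}) => {
  const current = parser.getNodeFetchOptions();
  const merged = {
    ...current,
    ...overrides,
    // Merge headers separately so existing headers are not lost.
    headers: {...current.headers, ...overrides.headers},
  };
  parser.setNodeFetchOptions(merged);
  return merged;
};

// Possible usage (requires the package):
// const parser = require('article-parser');
// withFetchOverrides(parser, {timeout: 10000});
```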

Object sanitizeHtmlOptions:

{
  allowedTags: [
    'h1', 'h2', 'h3', 'h4', 'h5',
    'u', 'b', 'i', 'em', 'strong',
    'div', 'span', 'p', 'article', 'blockquote', 'section',
    'pre', 'code',
    'ul', 'ol', 'li', 'dd', 'dl',
    'table', 'th', 'tr', 'td', 'thead', 'tbody', 'tfoot',
    'label',
    'fieldset', 'legend',
    'img', 'picture',
    'br', 'p', 'hr',
    'a',
  ],
  allowedAttributes: {
    a: ['href'],
    img: ['src', 'alt'],
  },
}

Read sanitize-html docs for more info.
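
If the extracted content should keep a few more elements, the defaults can be extended rather than rewritten. The sketch below is one way to do that; `allowMore` is a hypothetical helper that builds a new options object from an existing one without mutating it, and removes duplicate entries along the way.

```javascript
// Hypothetical helper, not part of article-parser or sanitize-html.
// Returns a new options object with extra tags and attributes allowed.
const allowMore = (options, extraTags = [], extraAttributes = {}) => {
  // A Set removes duplicates while keeping the first-seen order.
  const allowedTags = [...new Set([...options.allowedTags, ...extraTags])];
  const allowedAttributes = {...options.allowedAttributes};
  for (const [tag, attrs] of Object.entries(extraAttributes)) {
    const existing = allowedAttributes[tag] || [];
    allowedAttributes[tag] = [...new Set([...existing, ...attrs])];
  }
  return {...options, allowedTags, allowedAttributes};
};

// Possible usage (requires the package):
// const parser = require('article-parser');
// parser.setSanitizeHtmlOptions(
//   allowMore(parser.getSanitizeHtmlOptions(), ['figure'], {img: ['title']})
// );
```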

Test

git clone https://github.com/ndaidong/article-parser.git
cd article-parser
npm install
npm test

License

The MIT License (MIT)

Current Tags

  • 4.0.5                                ...           latest (5 days ago)

86 Versions

  • 4.0.5                                ...           5 days ago
  • 4.0.4                                ...           6 days ago
  • 4.0.2                                ...           8 days ago
  • 4.0.1                                ...           8 days ago
  • 4.0.0                                ...           11 days ago
  • 4.0.0-rc5                                ...           11 days ago
  • 4.0.0-rc4                                ...           11 days ago
  • 4.0.0-rc3                                ...           13 days ago
  • 4.0.0-rc2                                ...           15 days ago
  • 4.0.0-rc1                                ...           15 days ago
  • 3.0.0                                ...           2 months ago
  • 3.0.0-rc1                                ...           2 months ago
  • 2.4.0                                ...           a year ago
  • 2.3.7                                ...           a year ago
  • 2.3.6                                ...           2 years ago
  • 2.3.5                                ...           2 years ago
  • 2.3.4                                ...           2 years ago
  • 2.3.2                                ...           2 years ago
  • 2.3.1                                ...           2 years ago
  • 2.3.0                                ...           2 years ago
  • 2.2.1                                ...           2 years ago
  • 2.2.0                                ...           2 years ago
  • 2.1.1                                ...           2 years ago
  • 2.0.5                                ...           2 years ago
  • 2.0.4                                ...           2 years ago
  • 2.0.3                                ...           2 years ago
  • 2.0.2                                ...           2 years ago
  • 2.0.1                                ...           2 years ago
  • 2.0.0                                ...           2 years ago
  • 2.0.0-rc1                                ...           2 years ago
  • 2.0.0-rc0                                ...           2 years ago
  • 1.6.4                                ...           2 years ago
  • 1.6.21                                ...           2 years ago
  • 1.6.2                                ...           2 years ago
  • 1.6.15                                ...           3 years ago
  • 1.6.14                                ...           3 years ago
  • 1.6.13                                ...           3 years ago
  • 1.6.12                                ...           3 years ago
  • 1.6.11                                ...           3 years ago
  • 1.6.1                                ...           3 years ago
  • 1.6.0                                ...           3 years ago
  • 1.5.3                                ...           3 years ago
  • 1.5.27                                ...           3 years ago
  • 1.5.26                                ...           3 years ago
  • 1.5.25                                ...           3 years ago
  • 1.5.24                                ...           3 years ago
  • 1.5.23                                ...           3 years ago
  • 1.5.22                                ...           3 years ago
  • 1.5.21                                ...           3 years ago
  • 1.5.2                                ...           3 years ago
  • 1.5.1                                ...           3 years ago
  • 1.5.0                                ...           3 years ago
  • 1.1.0                                ...           3 years ago
  • 1.0.0                                ...           3 years ago
  • 0.5.12                                ...           3 years ago
  • 0.5.11                                ...           3 years ago
  • 0.5.10                                ...           3 years ago
  • 0.4.3                                ...           4 years ago
  • 0.4.2                                ...           4 years ago
  • 0.4.1                                ...           4 years ago
  • 0.4.0                                ...           4 years ago
  • 0.3.8                                ...           4 years ago
  • 0.3.7                                ...           4 years ago
  • 0.3.6                                ...           4 years ago
  • 0.3.51                                ...           4 years ago
  • 0.3.5                                ...           4 years ago
  • 0.3.42                                ...           4 years ago
  • 0.3.41                                ...           4 years ago
  • 0.3.4                                ...           4 years ago
  • 0.3.3                                ...           4 years ago
  • 0.3.2                                ...           4 years ago
  • 0.3.12                                ...           4 years ago
  • 0.3.11                                ...           4 years ago
  • 0.2.5                                ...           4 years ago
  • 0.2.42                                ...           4 years ago
  • 0.2.41                                ...           4 years ago
  • 0.2.2                                ...           4 years ago
  • 0.2.1                                ...           4 years ago
  • 0.1.9                                ...           4 years ago
  • 0.1.8                                ...           4 years ago
  • 0.1.6                                ...           4 years ago
  • 0.1.4                                ...           4 years ago
  • 0.1.3                                ...           4 years ago
  • 0.1.2                                ...           4 years ago
  • 0.1.1                                ...           4 years ago
  • 0.1.0                                ...           4 years ago