bedetheque-scraper
Node.js script to scrape the entire database of bdgest.com / bedetheque.com (approx. 260,000+ albums)
Last updated 4 months ago by givka .


Node.js script to scrape the entire database of bdgest.com / bedetheque.com (approx. 40,000+ series and 260,000+ albums).

How it works

It fetches a list of free proxies with low timeouts, then proceeds to scrape all comic series and albums from bedetheque.com, letter by letter. Each series is retried up to 5 times until the series and its albums are scraped.
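The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: `withRetries` is a hypothetical helper name, and "retry 5 times" is interpreted here as a budget of 5 attempts per series.

```javascript
// Hypothetical helper (not part of bedetheque-scraper's API):
// run an async task until it succeeds or the attempt budget is spent.
async function withRetries(task, maxAttempts = 5) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The attempt number is passed along so the task could, for
      // example, pick a different proxy on each try.
      return await task(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

A caller would wrap the per-series scraping step in it, for example `withRetries(() => scrapeOneSerie(url))`, where `scrapeOneSerie` is again a placeholder name.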

Installation

npm install bedetheque-scraper --save

Basic Usage

const { ProxyFetcher, Scraper } = require('bedetheque-scraper')
// or using ES modules
// import { ProxyFetcher, Scraper } from 'bedetheque-scraper'

async function run() {
  const letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0'.split('');
  for (const letter of letters) {
    const proxyList = await ProxyFetcher.getFreeProxyList();

    const [series, authors] = await Promise.all([
      Scraper.scrapeSeries(proxyList, letter),
      Scraper.scrapeAuthors(proxyList, letter),
    ]);

    console.log(`${letter} done with ${series.length} series and ${authors.length} authors`);
  }
}

// Start the scrape and report any unhandled failure.
run().catch(console.error);

Structure

Serie

// 'https://www.bedetheque.com/serie-10739-BD-Roi-des-mouches.html'

{
  "serie": {
    "serieId": 10739,
    "serieTitle": "Le roi des mouches",
    "numberOfAlbums": 3,
    "albumsId": [ 42297, 77882, 178960 ],
    "recommendationsId": [ 3633, 51397, 326, 13687, 14319, 31517, 24640 ],
    "voteAverage": 87,
    "voteCount": 202,
    "serieCover": "Couv_42297.jpg"
  },
  "albums": [
    {
      "serieId": 10739,
      "serieTitle": "Le roi des mouches",
      "albumNumber": 1,
      "albumId": 42297,
      "albumTitle": "Hallorave",
      "imageCover": "Couv_42297.jpg",
      "imageExtract": "roidesmouches01p.jpg",
      "imageReverse": "Verso_42297.jpg",
      "voteAverage": 88,
      "voteCount": 65,
      "scenario": "Pirus, Michel",
      "drawing": "Mezzo",
      "colors": "Ruby",
      "date": "01/2005",
      "editor": "Albin Michel",
      "nbrOfPages": 62
    },
    {
      "serieId": 10739,
      "serieTitle": "Le roi des mouches",
      "albumNumber": 2,
      "albumId": 77882,
      "albumTitle": "L'origine du monde",
      "imageCover": "RoiDesMouchesLe2_18092008_213101.jpg",
      "imageExtract": "AlbroiDesMouchesLe2_18092008_213101.jpg",
      "imageReverse": "roidesmouches02v_77882.jpg",
      "voteAverage": 86,
      "voteCount": 100,
      "scenario": "Pirus, Michel",
      "drawing": "Mezzo",
      "colors": "Ruby",
      "date": "09/2008",
      "editor": "Glénat",
      "nbrOfPages": 62
    },
    {
      "serieId": 10739,
      "serieTitle": "Le roi des mouches",
      "albumNumber": 3,
      "albumId": 178960,
      "albumTitle": "Sourire suivant",
      "imageCover": "178960_c.jpg",
      "imageExtract": "178960_pla.jpg",
      "imageReverse": "Verso_178960.jpg",
      "voteAverage": 88,
      "voteCount": 37,
      "scenario": "Pirus, Michel",
      "drawing": "Mezzo",
      "colors": "Ruby",
      "date": "01/2013",
      "editor": "Glénat",
      "nbrOfPages": 62
    }
  ]
}
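In this example, the series-level `voteCount` (202) equals the sum of the album vote counts (65 + 100 + 37), and the series-level `voteAverage` (87) matches the vote-count-weighted mean of the album averages. Whether the scraper computes these values or reads them from the page is not stated, so the sketch below is only a consistency check; `weightedVoteAverage` is a hypothetical helper name.

```javascript
// Hypothetical helper: vote-count-weighted mean of the album averages,
// rounded to an integer. Returns null when there are no votes at all.
function weightedVoteAverage(albums) {
  const totalVotes = albums.reduce((sum, a) => sum + a.voteCount, 0);
  if (totalVotes === 0) return null;
  const weighted = albums.reduce(
    (sum, a) => sum + a.voteAverage * a.voteCount,
    0
  );
  return Math.round(weighted / totalVotes);
}

// Vote data copied from the three albums in the example above.
const exampleAlbums = [
  { voteAverage: 88, voteCount: 65 },
  { voteAverage: 86, voteCount: 100 },
  { voteAverage: 88, voteCount: 37 },
];

console.log(weightedVoteAverage(exampleAlbums)); // 87, same as serie.voteAverage
```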

Author

// 'https://www.bedetheque.com/auteur-232-BD-Blain-Christophe.html'
{
  "authorId": 232,
  "name": "Blain, Christophe",
  "image": "Photo_232.jpg",
  "birthDate": "10/08/1970",
  "deathDate": null,
  "seriesIdScenario": [],
  "seriesIdDrawing": [ 55755, 3168, 2325, 1358, 10330, 1994 ],
  "seriesIdBoth": [ 27589, 38023, 14662, 517, 24260, 3898 ]
}
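An author's series are split across three lists. Judging by the field names, they appear to separate series the author wrote, drew, or did both for. A minimal sketch for merging them into one list, with `allSeriesIds` as a hypothetical helper name:

```javascript
// Hypothetical helper: merge the three series-id lists of an author
// object (shape as in the example above) into one de-duplicated array.
function allSeriesIds(author) {
  const ids = [
    ...author.seriesIdScenario,
    ...author.seriesIdDrawing,
    ...author.seriesIdBoth,
  ];
  return [...new Set(ids)]; // a Set keeps the first occurrence of each id
}

// Id lists copied from the example author above.
const exampleAuthor = {
  seriesIdScenario: [],
  seriesIdDrawing: [55755, 3168, 2325, 1358, 10330, 1994],
  seriesIdBoth: [27589, 38023, 14662, 517, 24260, 3898],
};

console.log(allSeriesIds(exampleAuthor).length); // 12
```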

Image Sizes

Serie

// serieCoverLarge: https://www.bedetheque.com/media/Couvertures/${serieCover}
// serieCoverSmall: https://www.bedetheque.com/cache/thb_couv/${serieCover}
public serieCover: string | null;

Album

// imageCoverLarge: https://www.bedetheque.com/media/Couvertures/${imageCover}
// imageCoverSmall: https://www.bedetheque.com/cache/thb_couv/${imageCover}
public imageCover: string | null;

// imageExtractLarge: https://www.bedetheque.com/media/Planches/${imageExtract}
// imageExtractSmall: https://www.bedetheque.com/cache/thb_planches/${imageExtract}
public imageExtract: string | null;

// imageReverseLarge: https://www.bedetheque.com/media/Versos/${imageReverse}
// imageReverseSmall: https://www.bedetheque.com/cache/thb_versos/${imageReverse}
public imageReverse: string | null;

Author

// imageLarge: https://www.bedetheque.com/media/Photos/${image}
public image: string | null;
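The URL patterns listed above can be wrapped in a small helper. This is a sketch only: `imageUrl`, `IMAGE_DIRS` and the `kind` / `size` parameters are hypothetical names, and only the directory paths come from the comments above.

```javascript
const BASE_URL = 'https://www.bedetheque.com';

// [large, small] directory for each image kind, taken from the comments above.
const IMAGE_DIRS = {
  cover: ['media/Couvertures', 'cache/thb_couv'],     // serieCover, imageCover
  extract: ['media/Planches', 'cache/thb_planches'],  // imageExtract
  reverse: ['media/Versos', 'cache/thb_versos'],      // imageReverse
  author: ['media/Photos', null],                     // no small variant listed
};

// Build the full URL for a scraped file name, or return null when the
// file name is missing or the requested size is not listed.
function imageUrl(kind, fileName, size = 'large') {
  if (fileName == null) return null;
  const dirs = IMAGE_DIRS[kind];
  if (!dirs) throw new Error(`unknown image kind: ${kind}`);
  const dir = size === 'small' ? dirs[1] : dirs[0];
  return dir ? `${BASE_URL}/${dir}/${fileName}` : null;
}

console.log(imageUrl('cover', 'Couv_42297.jpg'));
console.log(imageUrl('extract', 'roidesmouches01p.jpg', 'small'));
```

Returning `null` for a missing file name mirrors the `string | null` type of the fields.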

TODO

  • [ ] scrape series description
  • [ ] scrape series popularity

License

MIT

Current Tags

  • 2.4.0 (latest, 4 months ago)

23 Versions

  • 2.4.0 (4 months ago)
  • 2.3.5 (4 months ago)
  • 2.3.4 (4 months ago)
  • 2.3.3 (4 months ago)
  • 2.3.2 (4 months ago)
  • 2.3.1 (4 months ago)
  • 2.3.0 (4 months ago)
  • 2.2.0 (4 months ago)
  • 2.1.2 (4 months ago)
  • 2.1.1 (4 months ago)
  • 2.1.0 (4 months ago)
  • 2.0.6 (5 months ago)
  • 2.0.5 (5 months ago)
  • 2.0.4 (6 months ago)
  • 2.0.3 (6 months ago)
  • 2.0.2 (6 months ago)
  • 2.0.1 (6 months ago)
  • 2.0.0 (6 months ago)
  • 1.0.5 (7 months ago)
  • 1.0.3 (7 months ago)
  • 1.0.2 (7 months ago)
  • 1.0.1 (7 months ago)
  • 1.0.0 (7 months ago)