PollScraper

A production-ready web scraping utility, built to monitor polling data hosted by the Economist data team.

Artifacts from the latest build and from the latest daily run can be downloaded from the Actions tab.

The build pipeline also runs as a cron job at 17:30 every day, so the daily artifacts reflect the most recent poll results.
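
To judge how fresh a downloaded artifact is, it can help to work out when the scheduled run next fires. The sketch below is only an illustration: `next_daily_run` is a hypothetical helper, not part of pollscraper, and it assumes the 17:30 schedule is in UTC, which this README does not state.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: the 17:30 schedule is expressed in UTC (not stated in the README).
RUN_AT = time(hour=17, minute=30, tzinfo=timezone.utc)


def next_daily_run(now: datetime) -> datetime:
    """Return the first 17:30 UTC strictly after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    # Today's scheduled run, as a timezone-aware datetime.
    candidate = datetime.combine(now_utc.date(), RUN_AT)
    if candidate <= now_utc:
        # Today's run has already started, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

Calling `next_daily_run(datetime.now(timezone.utc))` gives the time after which a newer daily artifact should be expected, under the UTC assumption above.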

Setup

$ python3.8 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements_dev.txt

Run Pipeline

$ # For information on pollscraper arguments:
$ pollscraper --help
$ # To scrape polls and calculate trends:
$ pollscraper --url https://cdn-dev.economistdatateam.com/jobs/pds/code-test/index.html --results_dir data --quiet
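
The same invocation can be driven from a Python script, for example from another scheduler. This is a minimal sketch, assuming `pollscraper` is on the `PATH` of the active virtual environment. `build_command` and `run_pollscraper` are hypothetical helper names, and the `data` default simply mirrors the example above rather than a documented default.

```python
import subprocess
from typing import List


def build_command(url: str, results_dir: str = "data", quiet: bool = True) -> List[str]:
    """Assemble the pollscraper command line using the flags shown above."""
    cmd = ["pollscraper", "--url", url, "--results_dir", results_dir]
    if quiet:
        # --quiet is used as a bare flag in the README example.
        cmd.append("--quiet")
    return cmd


def run_pollscraper(url: str, results_dir: str = "data", quiet: bool = True) -> int:
    """Run pollscraper as a subprocess and return its exit code."""
    completed = subprocess.run(build_command(url, results_dir, quiet), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems with the URL.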

Testing

Full testing and linting suite:

$ tox

Building documentation

$ make servedocs

Deployment

$ bumpversion --current-version <current_version> minor # possible: major / minor / patch
$ git push
$ git push --tags

TODO

  • Separation of Concerns - separate CI and CD pipelines
  • Add separate badges for each new pipeline
  • Parameterize the HTTP requests via Click
  • Tidy up documentation, remove stale references such as PyPI

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.