pollscraper package

Submodules

pollscraper.cli module

Console script for pollscraper.

pollscraper.scraper module

Main module.

class pollscraper.scraper.DataPipeline(http_n_retries=5, http_connection_timeout=5, http_read_timeout=30)

Bases: object

DataPipeline class for processing and transforming data.

This class provides methods to load data from a source, transform it, and save it to a destination.

Attributes:
source (str): The path to the source file.
destination (str): The path to the destination file.
logger (logger.Logger): Logger instance for logging messages.
clean_data(table_df)

Clean the DataFrame scraped from the target URL.

Parameters:
table_df (pandas.DataFrame): The DataFrame scraped from the target URL.
Returns:
pandas.DataFrame: Cleaned DataFrame
extract_html_table_data(table)

Extract table data from the HTML.

Parameters:
table (BeautifulSoup.Tag): The HTML table element.
Returns:
list: A list of lists containing the table data.
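As an illustration of the list-of-lists shape described above, the following is a minimal sketch of turning one HTML table into rows of cell text. It is not the package's implementation: it uses the standard library's html.parser rather than BeautifulSoup so that it runs without extra dependencies, and the names TableRowParser and html_table_to_rows are hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_rows(html):
    """Return the table cells found in `html` as a list of lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Each inner list corresponds to one table row, with header cells and data cells treated alike.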
extract_table_data(url)

Extract table data from the given URL.

Parameters:
url (str): The URL to fetch and extract data from.
Returns:
list or pandas.DataFrame: A list of lists if table data is found.
fetch_html_content(url)

Fetch the HTML content from the given URL.

Parameters:
url (str): The URL to fetch the HTML from.
Returns:
requests.Response: The HTTP response object containing the HTML content.
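The constructor arguments http_n_retries, http_connection_timeout and http_read_timeout suggest a fetch with retries and separate connect/read timeouts. One common way to get that behaviour with requests is sketched below. This is an assumption about how such settings are typically wired up, not the package's actual code; build_session and fetch_html are hypothetical names, and the backoff_factor and status_forcelist values are illustrative.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(n_retries=5):
    """Create a requests.Session that retries failed requests.

    The backoff factor and retried status codes are assumed values.
    """
    retry = Retry(
        total=n_retries,
        backoff_factor=0.5,
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def fetch_html(url, session, connection_timeout=5, read_timeout=30):
    """Fetch `url` and return the requests.Response, raising on HTTP errors."""
    # requests accepts a (connect, read) tuple for the timeout argument.
    response = session.get(url, timeout=(connection_timeout, read_timeout))
    response.raise_for_status()
    return response
```

A caller would build the session once and reuse it, for example fetch_html(url, build_session()), so that connection pooling and the retry policy apply to every request.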
parse_html_bs4(html_content)
parse_html_table(html_content)

Parse the HTML content to extract tables.

Parameters:
html_content (str): The HTML content as a string.
Returns:
list or list of lists: A list of tables as DataFrames if found, otherwise a list of list of lists.
table_data_to_dataframe(table_data)

Convert BeautifulSoup response table data into a pandas.DataFrame.

Parameters:
table_data (list or pandas.DataFrame): The table data to process.
Returns:
pandas.DataFrame: The processed DataFrame.
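A minimal sketch of such a conversion is shown below, assuming the first row of the list holds the column headers. That assumption, and the name rows_to_dataframe, are illustrative only; the actual method may process the data differently.

```python
import pandas as pd


def rows_to_dataframe(table_data):
    """Build a DataFrame from a list of rows, or pass a DataFrame through.

    Assumes (hypothetically) that the first row contains the column names.
    """
    if isinstance(table_data, pd.DataFrame):
        return table_data.copy()
    if not table_data:
        return pd.DataFrame()
    header, *records = table_data
    return pd.DataFrame(records, columns=header)
```

Cell values stay as strings here; converting them to numbers or dates would be a separate cleaning step.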
pollscraper.scraper.main()

pollscraper.trends module

class pollscraper.trends.PollTrend

Bases: object

Represents poll trends and provides methods to calculate trends.

This class calculates average poll trends based on poll data.

Attributes:
None

Calculate poll trends based on poll data.

Args:
poll_data (PollData): Poll data containing poll information.
Returns:
pandas.DataFrame: DataFrame containing daily trends for each candidate.
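To make the idea of a daily trend per candidate concrete, here is a sketch of a weighted daily average computed with pandas. It is not the PollTrend implementation: the function name daily_weighted_trend, the column names date and weight, and the one-column-per-candidate layout are all assumptions, and no rolling window is applied.

```python
import pandas as pd


def daily_weighted_trend(polls, value_cols, weight_col="weight", date_col="date"):
    """Return the weighted mean of each candidate column for every date.

    All column names are hypothetical; `polls` has one row per poll.
    """
    weights = polls[weight_col]
    # Multiply each candidate's share by the poll's weight.
    weighted = polls[value_cols].mul(weights, axis=0)
    # Sum weighted shares and weights per day, then divide.
    sums = weighted.groupby(polls[date_col]).sum()
    totals = weights.groupby(polls[date_col]).sum()
    return sums.div(totals, axis=0)
```

The result has one row per date and one column per candidate, which matches the shape described in the Returns entry above.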
class pollscraper.trends.Weighting

Bases: object

modality_factor(sample_weights, modality_col)
pollster_factor(sample_weights, pollster_col)
population_factor(sample_weights, population_col)
sample_size_factor(sample_weights, sample_col)
sponsor_factor(sample_weights, sponsor_col)
weighting_scheme_538(samples, sample_col=None, modality_col=None, sponsor_col=None, population_col=None, pollster_col=None)
pollscraper.trends.check_for_outliers_in_individual_polls(poll_data, candidate, avg, sig, n_sigma)
pollscraper.trends.check_for_outliers_in_poll_averages(poll_averages, avg, sig, n_sigma, candidate)
pollscraper.trends.check_offset(offset)
pollscraper.trends.wavg(group)
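The outlier-check functions above are undocumented, but their avg, sig and n_sigma parameters suggest an n-sigma rule. The sketch below shows that rule in isolation, purely as an assumption about what the parameters mean; is_outlier and flag_outliers are hypothetical names and do not mirror the package's signatures.

```python
def is_outlier(value, avg, sig, n_sigma):
    """Return True when `value` lies more than n_sigma * sig away from avg."""
    return abs(value - avg) > n_sigma * sig


def flag_outliers(values, avg, sig, n_sigma):
    """Return the values that fail the n-sigma test (assumed semantics)."""
    return [v for v in values if is_outlier(v, avg, sig, n_sigma)]
```

With avg=45.0, sig=2.0 and n_sigma=3, any value further than 6 points from 45 would be flagged.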

Module contents

Top-level package for PollScraper.