pollscraper package
Submodules
pollscraper.cli module
Console script for pollscraper.
pollscraper.scraper module
Main module.
class pollscraper.scraper.DataPipeline(http_n_retries=5, http_connection_timeout=5, http_read_timeout=30)
Bases: object
DataPipeline class for processing and transforming data.
This class provides methods to load data from a source, transform it, and save it to a destination.
- Attributes:
- source (str): The path to the source file.
- destination (str): The path to the destination file.
- logger (logger.Logger): Logger instance for logging messages.
clean_data(table_df)
Clean the DataFrame scraped from the target URL.
- Parameters:
- table_df (pandas.DataFrame): DataFrame scraped from the target URL.
- Returns:
- pandas.DataFrame: Cleaned DataFrame
extract_html_table_data(table)
Extract table data from the HTML.
- Parameters:
- table (BeautifulSoup.Tag): The HTML table element.
- Returns:
- list: A list of lists containing the table data.
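As a rough illustration of the "list of lists" return shape, here is a minimal sketch that turns an HTML table into rows of cell text. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies; TableRowParser, html_table_to_rows, and the sample HTML are hypothetical names and data, not part of pollscraper.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_rows(html):
    """Return the table content as a list of lists of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample data, for illustration only.
sample_html = """
<table>
  <tr><th>Date</th><th>Candidate A</th><th>Candidate B</th></tr>
  <tr><td>2023-10-01</td><td>45%</td><td>41%</td></tr>
  <tr><td>2023-10-02</td><td>44%</td><td>43%</td></tr>
</table>
"""
rows = html_table_to_rows(sample_html)
print(rows)
```

The first inner list holds the header cells and the remaining lists hold one poll per row, which is a convenient shape for building a pandas.DataFrame afterwards.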
extract_table_data(url)
Extract table data from the given URL.
- Parameters:
- url (str): The URL to fetch and extract data from.
- Returns:
- list or pandas.DataFrame: A list of lists if table data is found.
fetch_html_content(url)
Fetch the HTML content from the given URL.
- Parameters:
- url (str): The URL to fetch the HTML from.
- Returns:
- requests.Response: The HTTP response object containing the HTML content.
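The constructor arguments http_n_retries, http_connection_timeout, and http_read_timeout suggest that fetching is retried and bounded by timeouts. The sketch below shows one way such a retry loop could look; it is an assumption about the behaviour, not the package's actual code. fetch_with_retries and flaky_fetch are hypothetical names, n_retries is treated here as the total number of attempts, and the fetch function is injected so the example runs without network access. With the requests library, requests.get could be passed as fetch, since it accepts a (connect, read) timeout tuple.

```python
import time


def fetch_with_retries(fetch, url, n_retries=5, timeout=(5, 30), backoff_seconds=0.0):
    """Call fetch(url, timeout=timeout) until it succeeds or attempts run out.

    Assumption: n_retries is the total number of attempts, and timeout is a
    (connection_timeout, read_timeout) pair in seconds.
    """
    if n_retries < 1:
        raise ValueError("n_retries must be at least 1")
    last_error = None
    for attempt in range(1, n_retries + 1):
        try:
            return fetch(url, timeout=timeout)
        except OSError as error:
            # Network errors (including requests' exceptions) derive from OSError.
            last_error = error
            if attempt < n_retries:
                time.sleep(backoff_seconds * attempt)
    raise last_error


# Hypothetical stand-in for a real HTTP call: fails twice, then succeeds.
calls = []


def flaky_fetch(url, timeout):
    calls.append((url, timeout))
    if len(calls) < 3:
        raise ConnectionError("simulated network failure")
    return "<html>ok</html>"


html = fetch_with_retries(flaky_fetch, "https://example.com/polls", n_retries=5)
print(html, "after", len(calls), "attempts")
```

Injecting the fetch callable keeps the retry policy separate from the HTTP client, which also makes the policy easy to test.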
pollscraper.trends module
class pollscraper.trends.PollTrend
Bases: object
Represents poll trends and provides methods to calculate trends.
This class calculates average poll trends based on poll data.
- Attributes:
- None
classmethod calculate_trends(poll_data, n_sigma=5, weights_col=None, sample_periodicity='1D', rolling_average_window='7D', start_date=datetime.datetime(2023, 10, 11, 0, 0))
Calculate poll trends based on poll data.
- Args:
- poll_data (PollData): Poll data containing poll information.
- Returns:
- pandas.DataFrame: DataFrame containing daily trends for each candidate.
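The defaults sample_periodicity='1D' and rolling_average_window='7D' point to a daily series smoothed over a trailing seven-day window. The sketch below illustrates that idea in plain Python for a single candidate; it is an assumption about the calculation, not the package's implementation, and it ignores weights_col, n_sigma, and start_date. rolling_daily_average and the poll values are hypothetical.

```python
from datetime import date, timedelta


def rolling_daily_average(observations, window_days=7):
    """Return {day: mean of the values observed in the trailing window}.

    observations: list of (date, value) pairs, in any order.
    The window for a given day is (day - window_days, day]: open on the
    left, closed on the right.
    """
    if not observations:
        return {}
    observations = sorted(observations)
    first_day, last_day = observations[0][0], observations[-1][0]
    window = timedelta(days=window_days)
    averages = {}
    day = first_day
    while day <= last_day:
        in_window = [value for d, value in observations if day - window < d <= day]
        if in_window:
            averages[day] = sum(in_window) / len(in_window)
        day += timedelta(days=1)
    return averages


# Hypothetical poll results (percent support) for one candidate.
polls = [
    (date(2023, 10, 1), 44.0),
    (date(2023, 10, 3), 46.0),
    (date(2023, 10, 9), 40.0),
]
trend = rolling_daily_average(polls)
for day, value in trend.items():
    print(day.isoformat(), value)
```

Each day gets a value even when no poll was published on it, as long as at least one poll falls inside the trailing window. A pandas-based version would typically express the same idea with a time-based rolling window on a datetime index.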
pollscraper.trends.check_for_outliers_in_individual_polls(poll_data, candidate, avg, sig, n_sigma)
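This function is undocumented, but its parameters (avg, sig, n_sigma) suggest a conventional n-sigma test, in which a poll is flagged when it lies more than n_sigma standard deviations from the average. The sketch below shows that test under this assumption only; is_outlier and the sample values are hypothetical and do not reflect the function's actual signature handling or return value.

```python
def is_outlier(value, avg, sig, n_sigma=5):
    """Return True if value lies more than n_sigma * sig away from avg."""
    return abs(value - avg) > n_sigma * sig


# Hypothetical poll results for one candidate, with an assumed
# average of 45.0 and standard deviation of 1.0.
values = [44.0, 45.0, 46.0, 50.0, 90.0]
flagged = [v for v in values if is_outlier(v, avg=45.0, sig=1.0, n_sigma=5)]
print(flagged)
```

Note that the comparison is strict, so a value exactly n_sigma standard deviations from the average (50.0 here) is not flagged.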
Module contents
Top-level package for PollScraper.