
Monitoring Website Changes

Overview

Monitoring website changes is a simple way to find out when a page is updated, for example when a new post is published. You can use it to track a product page to see if a sale is on, or a support page to see if an app, driver, or firmware has been updated.

Option 1: Online Services (Free / Paid)

There are a few services to choose from. One example that offers a free account is Visualping.io. The free account limits the number of pages you can monitor and how often they are checked, but it has some nice features to control how a page is tracked.

Option 2: Self-Hosted Solutions (Free)

One common solution found online is to use a server or Raspberry Pi to check a website for differences using a Python script and crontab (a scheduler). The same approach could be applied to a Windows computer with its own task scheduler.

Raspberry Pi & Python Method

To build your own webpage monitor, you can follow one of several sets of instructions available online. The solutions are quite simple. On the first run, cache the page you are monitoring in a file. On each later run, compare the HTML with the cached version; if a change is found, update the file and send an email alert.
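The cache-and-compare steps described above can be sketched in isolation as follows. This is a minimal illustration rather than the full monitor: the function name `check_and_update` and its return labels are hypothetical, and the page content is passed in as a plain string instead of being downloaded.

```python
import os
import tempfile


def check_and_update(cache_path, new_text):
    """Compare new_text with the cached copy and refresh the cache.

    Returns "first-run", "same", or "changed" (hypothetical labels).
    """
    if not os.path.exists(cache_path):
        # First run: nothing to compare against, so just store the page.
        with open(cache_path, "w", encoding="utf-8") as handle:
            handle.write(new_text)
        return "first-run"

    with open(cache_path, "r", encoding="utf-8") as handle:
        cached_text = handle.read()

    if cached_text == new_text:
        return "same"

    # The page differs from the cache: store the new version.
    with open(cache_path, "w", encoding="utf-8") as handle:
        handle.write(new_text)
    return "changed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        cache = os.path.join(folder, "example_cache.txt")
        print(check_and_update(cache, "<p>v1</p>"))
        print(check_and_update(cache, "<p>v1</p>"))
        print(check_and_update(cache, "<p>v2</p>"))
```

In a real monitor, the "changed" branch is where the email alert would be sent.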

TL;DR Solution

The online solutions might need to be tweaked for your needs, as they can produce false positives. Here is an updated version of the Python script that may work better. It strips additional HTML to reduce false positives: in the Python script below, the script, style, meta, and path tags are removed, along with all whitespace. This version is based on the work here.

website_monitor.py:

import os
import sys
import requests
from bs4 import BeautifulSoup
import smtplib
import string

SMTP_USER='<YOUR EMAIL>@gmail.com'
SMTP_PASSWORD='<PASSWORD>'
SMTP_HOST='smtp.gmail.com'
SMTP_PORT=465
SMTP_SSL=True

SMTP_FROM_EMAIL='<FROM EMAIL>@gmail.com'
SMTP_TO_EMAIL='<TO EMAIL>@gmail.com'

def email_notification(subject, message):
    """Send an email notification.

    subject - The subject line of the email.
    message - The message to send as the body of the email.
    """
    if (SMTP_SSL):
        smtp_server = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT)
    else:
        smtp_server = smtplib.SMTP(SMTP_HOST, SMTP_PORT)

    smtp_server.ehlo()
    smtp_server.login(SMTP_USER, SMTP_PASSWORD)

    email_text = \
"""From: %s
To: %s
Subject: %s

%s
""" % (SMTP_FROM_EMAIL, SMTP_TO_EMAIL, subject, message)

    smtp_server.sendmail(SMTP_FROM_EMAIL, SMTP_TO_EMAIL, email_text)

    smtp_server.close()

def cleanup_html(html):
    """Clean up the HTML content to reduce false positives.

    html - A string containing HTML.
    """
    soup = BeautifulSoup(html, features="lxml")

    # Drop tags that often change between requests without a visible change.
    for tag in soup.find_all(['script', 'style', 'meta', 'path']):
        tag.extract()

    # Strip all whitespace so that formatting-only changes are ignored.
    return str(soup).translate({ord(c): None for c in string.whitespace})

def has_website_changed(website_url, website_name):
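    """Check if a website has changed since the last request.

    website_url - URL that you want to monitor for changes.
    website_name - Name used for the cache file.

    Returns -1 on a non-2XX response, 0 if the page is unchanged
    (or was cached for the first time), and 1 if it has changed.
    """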
    headers = {
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; PIWEBMON)',
        'Cache-Control': 'no-cache'
    }

    # A timeout keeps an unresponsive server from hanging the scheduled job.
    response = requests.get(website_url, headers=headers, timeout=30)

    if (response.status_code < 200 or response.status_code > 299):
        return -1

    response_text = cleanup_html(response.text)
    
    cache_filename = website_name + "_cache.txt"

    # First run: store the cleaned page and report no change.
    if not os.path.exists(cache_filename):
        with open(cache_filename, "w") as file_handle:
            file_handle.write(response_text)
        return 0

    with open(cache_filename, "r") as file_handle:
        previous_response_text = file_handle.read()

    if response_text == previous_response_text:
        return 0

    # The page changed: overwrite the cache with the new version.
    with open(cache_filename, "w") as file_handle:
        file_handle.write(response_text)
    return 1

def main():
    """Check if the website passed on the command line has changed."""
    if len(sys.argv) != 3:
        sys.exit("Usage: website_monitor.py <url> <cache-name>")

    website_url, website_name = sys.argv[1], sys.argv[2]
    website_status = has_website_changed(website_url, website_name)

    if website_status == -1:
        email_notification("An Error has Occurred", "Error While Fetching " + website_url)
        print("Non 2XX response while fetching")
    elif website_status == 0:
        print("Website is the same")
    elif website_status == 1:
        email_notification("A Change has Occurred", website_url + " has changed.")
        print("Website has changed")
        
if __name__ == "__main__":
    main()
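To illustrate why the whitespace-stripping step in cleanup_html helps, the sketch below applies the same `str.translate` call to two snippets that differ only in indentation and line breaks. The helper name `strip_whitespace` and the sample HTML are made up for this illustration, and the tag removal done with BeautifulSoup is left out so the example needs only the standard library.

```python
import string


def strip_whitespace(text):
    """Remove every ASCII whitespace character, as cleanup_html does."""
    return text.translate({ord(c): None for c in string.whitespace})


compact = "<ul><li>Build 1</li><li>Build 2</li></ul>"
indented = """<ul>
    <li>Build 1</li>
    <li>Build 2</li>
</ul>"""

# Formatting-only differences disappear after normalisation.
print(strip_whitespace(compact) == strip_whitespace(indented))

# A real content change is still detected.
changed = indented.replace("Build 2", "Build 3")
print(strip_whitespace(compact) == strip_whitespace(changed))
```

The trade-off is that spaces inside visible text are removed as well, so a change that only adds or removes a space in the page text would go unnoticed.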

Try testing the script from the command line to ensure it works as desired. In this example, we are checking website changes to a custom Android build for a Google Pixel phone.

python3 /home/pi/scripts/website_monitor.py https://pixelbuilds.org/download/crosshatch Pixel3XL-builds

On the first run, you may get an import error if some of the required Python libraries are missing. On Raspberry Pi OS or another Debian-based system, you can install them with:

sudo apt install python3-bs4 python3-lxml python3-requests
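If you would like to check which libraries are missing before running the monitor, a small helper along these lines may be useful. It is only a sketch: the function name `missing_modules` is hypothetical, and it assumes the import names `bs4`, `lxml`, and `requests` that the script relies on.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by the monitoring script.
    missing = missing_modules(["bs4", "lxml", "requests"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are installed.")
```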

Scheduling the Script in Crontab

Create a website_monitor.py file in the directory you want to run the script from, and add the Python code from above with your Gmail settings or those of the mail server of your choice. Note that the cache file is created in the current working directory of the process, which for a cron job is usually the home directory of the user running it, not necessarily the script's directory. For Gmail, you'll need to create an app password to log in and send email (instructions).
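One way to make sure the cache file always lands next to the script, regardless of where the script is launched from, is to build the path from the script's own location. The helper below is a hypothetical sketch (the name `cache_path_for` is not part of the script above); inside the monitoring script you would pass `__file__` as `script_path`.

```python
import os


def cache_path_for(website_name, script_path):
    """Build a cache file path in the same directory as script_path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, website_name + "_cache.txt")


if __name__ == "__main__":
    # Example with the paths used in this post; __file__ would replace
    # the hard-coded script path in real use.
    print(cache_path_for("Pixel3XL-builds", "/home/pi/scripts/website_monitor.py"))
```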

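Edit the Crontab

The example below uses the system crontab format, which includes a user-name column, so edit /etc/crontab:

sudo nano /etc/crontab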

Add the following based on your timing requirements. In this example, we are monitoring website changes to a custom Android build. Crontab.guru can help create the scheduling text.

# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name command to be executed
 17  *  * * *   root    cd / && run-parts --report /etc/cron.hourly
 25  6  * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
 47  6  * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
 52  6  1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

# run daily at 12:10 am
 10  0  * * *   root    /usr/bin/python3 /home/pi/scripts/website_monitor.py https://pixelbuilds.org/download/crosshatch Pixel3XL-builds
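To make the column layout of such an entry easier to read, the following sketch splits a system crontab line into named fields. It is an illustration only: the function `split_crontab_line` and the field names are hypothetical, and it does not validate the schedule values.

```python
FIELD_NAMES = ["minute", "hour", "day_of_month", "month",
               "day_of_week", "user", "command"]


def split_crontab_line(line):
    """Split one /etc/crontab-style entry into its named fields."""
    # Split on whitespace at most six times so the command stays intact.
    parts = line.split(None, len(FIELD_NAMES) - 1)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError("expected 5 time fields, a user, and a command")
    return dict(zip(FIELD_NAMES, parts))


if __name__ == "__main__":
    entry = split_crontab_line(
        "10 0 * * * root /usr/bin/python3 /home/pi/scripts/website_monitor.py "
        "https://pixelbuilds.org/download/crosshatch Pixel3XL-builds"
    )
    for name, value in entry.items():
        print(name + ": " + value)
```

For the entry used in this post, the minute field is 10 and the hour field is 0, so the job runs once a day at 00:10.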

Summary

In this post, we created a basic website monitoring solution that runs on a schedule and sends an email when a change is detected.