The tools.web package contains utility functions for interacting with web applications, APIs, or websites from your plugins.

New in version 7.0.

r_entity = re.compile('&([^;\\s]+);')

Regular expression to match HTML entities.
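As a quick illustration of what this pattern captures, the sketch below compiles the same expression with Python's re module and runs it on a made-up sample string. The pattern is copied rather than imported so the snippet runs without Sopel installed.

```python
import re

# Same pattern as r_entity, compiled locally so this runs without Sopel.
entity_pattern = re.compile('&([^;\\s]+);')

sample = 'Fish &amp; Chips &lt;3 &#169; &#x2122; & more'
# Group 1 captures the entity name or numeric code between '&' and ';'.
captured = entity_pattern.findall(sample)
print(captured)  # ['amp', 'lt', '#169', '#x2122']
```

A bare & followed by whitespace is not matched, because the captured group excludes both whitespace and ;.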

Deprecated since version 8.0: Will be removed in Sopel 9, along with entity().

DEFAULT_HEADERS = {'User-Agent': 'Sopel/8.0.0 ('}

Default header dict for use with requests methods.

Use it like this:

import requests

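from sopel.tools import web

result = requests.get(
    'https://example.com/api/endpoint',  # placeholder URL (hypothetical)
    headers=web.DEFAULT_HEADERS,
)
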
You should never modify this directly in your plugin code. Make a copy and use update() if you need to add or change headers:

from sopel.tools import web

default_headers = web.DEFAULT_HEADERS.copy()
custom_headers = {'Accept': 'text/*'}

default_headers.update(custom_headers)

User agent string to be sent with HTTP requests. Value: 'Sopel/8.0.0 ('

Meant to be passed like so:

import requests

from sopel.tools import web

result = requests.get(

decode(text)

Decode HTML entities into Unicode text.

Parameters:

text (str) – the HTML page or snippet to process

Return str:

text with all entity references replaced

Changed in version 8.0: Renamed the html parameter to text, because the old name shadowed Python's standard-library html module.

entity(match)

Convert an entity reference to the appropriate character.

Parameters:

match (str) – the entity name or code, as matched by r_entity

Return str:

the Unicode character corresponding to the given match string, or a fallback representation if the reference cannot be resolved to a character

Deprecated since version 8.0: Will be removed in Sopel 9. Use decode() directly or migrate to Python’s standard-library equivalent, html.unescape().
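The sketch below shows one way an entity-conversion callback can be combined with the r_entity pattern through re.sub(). It places the result next to the standard-library html.unescape() that the deprecation note recommends. The helper names convert_entity and decode_entities are hypothetical, and so is the "[name]" fallback for unresolvable references. This is not Sopel's implementation.

```python
import html
import re
from html.entities import name2codepoint

# Same pattern as r_entity, copied so Sopel is not needed.
ENTITY_RE = re.compile('&([^;\\s]+);')


def convert_entity(match: re.Match) -> str:
    """Hypothetical stand-in for entity(): resolve one matched reference."""
    value = match.group(1)
    try:
        if value[:2].lower() == '#x':   # hexadecimal reference, e.g. &#x2122;
            return chr(int(value[2:], 16))
        if value.startswith('#'):       # decimal reference, e.g. &#169;
            return chr(int(value[1:]))
    except (ValueError, OverflowError):
        pass                            # malformed or out-of-range number
    if value in name2codepoint:         # named reference, e.g. &amp;
        return chr(name2codepoint[value])
    # Assumed fallback: visibly mark references that cannot be resolved.
    return '[' + value + ']'


def decode_entities(text: str) -> str:
    """Hypothetical stand-in for decode(): replace every matched reference."""
    return ENTITY_RE.sub(convert_entity, text)


sample = 'Fish &amp; Chips &#169; &#x2122; &bogus;'
sketched = decode_entities(sample)   # 'Fish & Chips © ™ [bogus]'
stdlib = html.unescape(sample)       # 'Fish & Chips © ™ &bogus;'
```

The two approaches differ on unknown names. The sketch's assumed fallback rewrites them, while html.unescape() leaves them untouched, so check which behaviour a plugin relies on before migrating.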

Decodes an internationalized domain name (IDN).
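For background, Python's built-in idna codec converts a Unicode domain name to its ASCII-compatible (Punycode) form and back. The sketch below shows both directions on a sample host. Whether the function above relies on this codec internally is an assumption, not something stated here.

```python
# Python's built-in 'idna' codec (IDNA 2003) converts hostnames both ways.
unicode_host = 'bücher.example'

ascii_host = unicode_host.encode('idna')   # Unicode -> ASCII-compatible form
print(ascii_host)                          # b'xn--bcher-kva.example'

decoded_host = ascii_host.decode('idna')   # ASCII-compatible form -> 'bücher.example'
```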

Safely encodes a string for use in a URL.

Parameters:

  • string (str) – the string to encode

  • safe (str) – a string of characters that should not be quoted; defaults to '/'

Return str:

the string with special characters URL-encoded


This is a shim to make writing cross-compatible plugins for both Python 2 and Python 3 easier.
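The string and safe parameters described above mirror those of the standard-library urllib.parse.quote(). Assuming this function behaves the same way, the sketch below uses quote() directly to show what safe changes.

```python
from urllib.parse import quote

path = 'docs/café menu?.txt'

# Default safe='/' keeps slashes; spaces, '?' and non-ASCII are percent-encoded.
with_slash = quote(path)
print(with_slash)      # docs/caf%C3%A9%20menu%3F.txt

# safe='' encodes the slash as well.
without_slash = quote(path, safe='')
print(without_slash)   # docs%2Fcaf%C3%A9%20menu%3F.txt
```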

Safely encodes a URL’s query parameters.


Parameters:

string (str) – a URL containing query parameters

Return str:

the input URL with query parameter values URL-encoded
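One way to encode the query parameter values of a URL with the standard library is to split the URL, re-encode its query with urlencode(), and join it back together. The helper name encode_query_values is hypothetical, and this is not necessarily how the function above works. keep_blank_values=True is passed so that parameters with empty values are not dropped by parse_qsl().

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def encode_query_values(url: str) -> str:
    """Hypothetical helper: percent-encode each query parameter value in url."""
    parts = urlsplit(url)
    # Split the raw query into (name, value) pairs, keeping empty values.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Re-encode the pairs; urlencode() quotes with quote_plus by default.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(encode_query_values('https://example.com/search?q=café au lait&page=2'))
# https://example.com/search?q=caf%C3%A9+au+lait&page=2
```

Because the raw query is split on &, a literal & inside a value cannot be told apart from a parameter separator. That ambiguity comes from the unencoded input, not from the encoding step.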

search_urls(text, exclusion_char=None, clean=False, schemes=None)

Extracts all URLs in text.

Parameters:

  • text (str) – the text to search for URLs

  • exclusion_char (str) – optional character that, if placed before a URL in the text, will exclude it from being extracted

  • clean (bool) – if True, all found URLs are passed through trim_url() before being returned; default False

  • schemes (list) – optional list of URL schemes to look for; defaults to ['http', 'https', 'ftp']


Returns:

generator iterator of all URLs found in text

To get the URLs as a plain list, use e.g.:

list(search_urls(text))
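To make the exclusion_char and schemes parameters concrete, here is a simplified, hypothetical find_urls() generator built on a regular expression. It treats a URL as a scheme followed by :// and a run of non-whitespace characters, and it skips a URL whose preceding character is exclusion_char. Sopel's actual matching rules may differ.

```python
import re
from typing import Iterator, Optional, Sequence


def find_urls(
    text: str,
    exclusion_char: Optional[str] = None,
    schemes: Optional[Sequence[str]] = None,
) -> Iterator[str]:
    """Hypothetical simplified URL finder; yields matches lazily."""
    schemes = schemes or ['http', 'https', 'ftp']
    scheme_alt = '|'.join(re.escape(s) for s in schemes)
    # Group 1: optional single character at a word start, right before the URL.
    # Group 2: the URL itself, i.e. scheme + '://' + a run of non-whitespace.
    pattern = re.compile(r'(?<!\S)(\S?)((?:%s)://\S+)' % scheme_alt)
    for match in pattern.finditer(text):
        if exclusion_char is not None and match.group(1) == exclusion_char:
            continue  # the user explicitly excluded this URL
        yield match.group(2)


text = 'see https://a.example/x and !http://b.example, ftp://c.example'
print(list(find_urls(text, exclusion_char='!')))
# ['https://a.example/x', 'ftp://c.example']
print(list(find_urls(text, schemes=['ftp'])))
# ['ftp://c.example']
```

Without the exclusion, the sketch would yield 'http://b.example,' with its trailing comma. Stripping such leftovers is what the clean option and trim_url() are for.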


trim_url(url)

Removes extra punctuation from URLs found in text.

Parameters:

url (str) – the raw URL match

Return str:

the cleaned URL

This function removes trailing punctuation that looks like it was not intended to be part of the URL:

  • trailing sentence- or clause-ending marks like ., ;, etc.

  • unmatched trailing brackets/braces like }, ), etc.

It is intended for use with the output of search_urls(), which may include trailing punctuation when used on input from chat.
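The simplified, hypothetical trim_trailing() below applies the two rules listed above. It repeatedly drops either a trailing punctuation mark or a trailing closing bracket that has no matching opener earlier in the URL. The exact characters Sopel strips are not listed here, so the character sets in the sketch are assumptions.

```python
# Assumed character sets; the real function may strip a different set.
TRAILING_MARKS = '.,;:!?'
CLOSERS = {')': '(', ']': '[', '}': '{', '>': '<'}


def trim_trailing(url: str) -> str:
    """Hypothetical sketch: strip trailing characters unlikely to belong to url."""
    while url:
        last = url[-1]
        if last in TRAILING_MARKS:
            url = url[:-1]  # sentence- or clause-ending mark
        elif last in CLOSERS and url.count(last) > url.count(CLOSERS[last]):
            url = url[:-1]  # closing bracket without a matching opener
        else:
            break
    return url


print(trim_trailing('https://example.com/page.'))  # https://example.com/page
print(trim_trailing('https://example.com/x),'))    # https://example.com/x
print(trim_trailing('https://en.wikipedia.org/wiki/Foo_(bar)'))
# https://en.wikipedia.org/wiki/Foo_(bar)
```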

Decodes a URL-encoded string.


Parameters:

string (str) – the string to decode

Return str:

the decoded string


This is a convenient shortcut for urllib.parse.unquote.
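Since this entry is described as a shortcut for urllib.parse.unquote, the standard-library function itself illustrates the behaviour. unquote() leaves + alone. unquote_plus() also turns + into a space, which matches form-style query encoding.

```python
from urllib.parse import unquote, unquote_plus

encoded = 'caf%C3%A9+au%20lait'

plain = unquote(encoded)            # 'café+au lait'  ('+' is left as-is)
form_style = unquote_plus(encoded)  # 'café au lait'  ('+' becomes a space)
```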

Encode a dict or sequence of two-element tuples into a URL query string.

If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.

If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.

The components of a query arg may each be either a string or a bytes type.

The safe, encoding, and errors parameters are passed down to the function specified by quote_via (encoding and errors only if a component is a str).
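The description above matches the standard-library urllib.parse.urlencode(). The sketch below calls that function directly to show the effect of doseq and of passing a sequence of two-element tuples, assuming this entry behaves the same way.

```python
from urllib.parse import urlencode

params = {'q': 'fish & chips', 'tag': ['a', 'b']}

# doseq=True expands each list element into its own parameter.
expanded = urlencode(params, doseq=True)
print(expanded)   # q=fish+%26+chips&tag=a&tag=b

# Without doseq, the list is converted with str() and encoded as one value.
flattened = urlencode(params)
print(flattened)  # q=fish+%26+chips&tag=%5B%27a%27%2C+%27b%27%5D

# A sequence of two-element tuples keeps its order in the output.
ordered = urlencode([('page', 2), ('q', 'café')])
print(ordered)    # page=2&q=caf%C3%A9
```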

Safely encodes non-ASCII characters in a URL.
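Below is a minimal sketch of percent-encoding only the non-ASCII bytes of a URL that has already been encoded to UTF-8 bytes. ASCII characters, including reserved ones like ? and /, are left untouched. The name encode_non_ascii, the bytes-in/bytes-out interface, and the uppercase hex digits are assumptions, not taken from the function above.

```python
import re

# Matches any single byte outside the ASCII range (0x80-0xFF).
NON_ASCII_BYTE = re.compile(rb'[\x80-\xff]')


def encode_non_ascii(raw: bytes) -> bytes:
    """Hypothetical sketch: replace each non-ASCII byte with %XX."""
    return NON_ASCII_BYTE.sub(lambda m: b'%%%02X' % m.group(0)[0], raw)


url = 'https://example.com/café?q=ü'.encode('utf-8')
encoded = encode_non_ascii(url)
print(encoded)  # b'https://example.com/caf%C3%A9?q=%C3%BC'
```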