sopel.tools.web

The tools.web package contains utility functions for interaction with web applications, APIs, or websites in your plugins.

New in version 7.0.

sopel.tools.web.r_entity = re.compile('&([^;\\s]+);')

Regular expression to match HTML entities.

Deprecated since version 8.0: Will be removed in Sopel 9, along with entity().

sopel.tools.web.DEFAULT_HEADERS = {'User-Agent': 'Sopel/8.0.2 (https://sopel.chat)'}

Default header dict for use with requests methods.

Use it like this:

import requests

from sopel.tools import web

result = requests.get(
    'https://some.site/api/endpoint',
    headers=web.DEFAULT_HEADERS
)

Important

You should never modify this directly in your plugin code. Make a copy and use update() if you need to add or change headers:

from sopel.tools import web

default_headers = web.DEFAULT_HEADERS.copy()
custom_headers = {'Accept': 'text/*'}

default_headers.update(custom_headers)

sopel.tools.web.USER_AGENT = 'Sopel/8.0.2 (https://sopel.chat)'

User agent string to be sent with HTTP requests.

Meant to be passed like so:

import requests

from sopel.tools import web

result = requests.get(
    'https://some.site/api/endpoint',
    headers={'User-Agent': web.USER_AGENT}
)

sopel.tools.web.decode(text)

Decode HTML entities into Unicode text.

Parameters: text (str) – the HTML page or snippet to process
Return str: text with all entity references replaced

Changed in version 8.0: Renamed html parameter to text. (Python gained a standard library module named html in version 3.4.)

sopel.tools.web.entity(match)

Convert an entity reference to the appropriate character.

Parameters: match (str) – the entity name or code, as matched by r_entity
Return str: the Unicode character corresponding to the given match string, or a fallback representation if the reference cannot be resolved to a character

Deprecated since version 8.0: Will be removed in Sopel 9. Use decode() directly or migrate to Python’s standard-library equivalent, html.unescape().
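For plugins moving away from entity() and r_entity, the standard-library route mentioned above can be sketched as follows. This is only an illustration of html.unescape(); the sample string is made up, and nothing here claims to show how decode() is implemented.

```python
import html

# Hypothetical HTML-escaped text, e.g. scraped from a page title.
escaped = 'Tom &amp; Jerry &lt;3 caf&eacute; &#39;quoted&#39;'

# html.unescape() resolves named and numeric character references in one pass.
unescaped = html.unescape(escaped)
print(unescaped)
```

Because html.unescape() works on the whole string, no per-match callback or regular expression is needed.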

sopel.tools.web.iri_to_uri(iri)

Decodes an internationalized domain name (IDN).

sopel.tools.web.quote(string, safe='/')

Safely encodes a string for use in a URL.

Parameters:

string (str) – the string to encode
safe (str) – a list of characters that should not be quoted; defaults to '/'

Return str:

the string with special characters URL-encoded

Note

This is a shim to make writing cross-compatible plugins for both Python 2 and Python 3 easier.
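The effect of the safe parameter is easiest to see side by side. The sketch below calls urllib.parse.quote directly, on the assumption that this shim accepts the same string and safe arguments and encodes the same way; the file path is a made-up example.

```python
from urllib.parse import quote

# With the default safe='/', path separators are left untouched.
path = quote('docs/my file?.txt')

# Passing safe='' percent-encodes the slash as well.
segment = quote('docs/my file?.txt', safe='')

print(path)
print(segment)
```

Use an empty safe string when the value is a single path segment or a query value, where a literal slash would change the meaning of the URL.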

sopel.tools.web.quote_query(string)

Safely encodes a URL’s query parameters.

Parameters: string (str) – a URL containing query parameters
Return str: the input URL with query parameter values URL-encoded
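One way to get this kind of result with the standard library alone is sketched below. The helper name quote_query_sketch and the choice of safe characters are assumptions made for illustration; this is not presented as Sopel's implementation, and the real function may treat edge cases differently.

```python
from urllib.parse import quote, urlparse


def quote_query_sketch(url):
    """Hypothetical helper: percent-encode only the query part of ``url``."""
    parsed = urlparse(url)
    # Keep '=', '&' and '/' so the key=value&key=value structure survives.
    encoded_query = quote(parsed.query, safe='/=&')
    return parsed._replace(query=encoded_query).geturl()


encoded = quote_query_sketch('https://example.com/search?q=hello world&lang=en')
print(encoded)
```

Only the query component is rewritten; the scheme, host, and path pass through unchanged.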

sopel.tools.web.search_urls(text, exclusion_char=None, clean=False, schemes=None)

Extracts all URLs in text.

Parameters:

text (str) – the text to search for URLs
exclusion_char (str) – optional character that, if placed before a URL in the text, will exclude it from being extracted
clean (bool) – if True, all found URLs are passed through trim_url() before being returned; default False
schemes (list) – optional list of URL schemes to look for; defaults to ['http', 'https', 'ftp']

Returns:

generator iterator of all URLs found in text

To get the URLs as a plain list, use e.g.:

list(search_urls(text))
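To make the generator behaviour and the exclusion_char rule concrete, here is a deliberately simplified stand-in. The function find_urls_sketch and its regular expression are hypothetical; the real search_urls() uses its own pattern and may match different text.

```python
import re


def find_urls_sketch(text, exclusion_char=None, schemes=('http', 'https', 'ftp')):
    """Hypothetical, simplified stand-in: yield URLs found in ``text``."""
    scheme_part = '|'.join(re.escape(scheme) for scheme in schemes)
    pattern = re.compile(r'\b(?:%s)://\S+' % scheme_part)
    for match in pattern.finditer(text):
        start = match.start()
        # Skip URLs immediately preceded by the exclusion character.
        if exclusion_char and start > 0 and text[start - 1] == exclusion_char:
            continue
        yield match.group(0)


message = 'docs at https://sopel.chat/ but skip !http://example.com/hidden'
urls = list(find_urls_sketch(message, exclusion_char='!'))
print(urls)
```

Because the function is a generator, nothing is scanned until the result is iterated, which is why the example wraps the call in list().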

sopel.tools.web.trim_url(url)

Removes extra punctuation from URLs found in text.

Parameters: url (str) – the raw URL match
Return str: the cleaned URL

This function removes trailing punctuation that looks like it was not intended to be part of the URL:

trailing sentence- or clause-ending marks like ., ;, etc.
unmatched trailing brackets/braces like }, ), etc.

It is intended for use with the output of search_urls(), which may include trailing punctuation when used on input from chat.
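The two rules described above can be sketched roughly as follows. The helper trim_trailing_punctuation and the exact character sets are assumptions for illustration only; trim_url() itself may strip a different set of characters.

```python
def trim_trailing_punctuation(url):
    """Hypothetical sketch of the two trimming rules described above."""
    # Rule 1: drop trailing sentence- or clause-ending marks.
    url = url.rstrip('.,;:!?\'"')
    # Rule 2: drop one trailing closer per bracket type if it has no opener.
    for opener, closer in (('(', ')'), ('[', ']'), ('{', '}'), ('<', '>')):
        if url.endswith(closer) and url.count(opener) < url.count(closer):
            url = url[:-1]
    return url


trimmed = trim_trailing_punctuation('https://example.com/page).')
balanced = trim_trailing_punctuation('https://en.wikipedia.org/wiki/Python_(language)')
print(trimmed)
print(balanced)
```

Counting openers against closers is what lets a URL with balanced parentheses keep its final bracket, while a bracket that belongs to the surrounding sentence is removed.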

sopel.tools.web.unquote(string)

Decodes a URL-encoded string.

Parameters: string (str) – the string to decode
Return str: the decoded string

Note

This is a convenient shortcut for urllib.parse.unquote.
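Since this is described as a shortcut for urllib.parse.unquote, the behaviour can be illustrated with the standard-library function itself. The sample strings are made up.

```python
from urllib.parse import quote, unquote

# Percent-encoded UTF-8 bytes are decoded back to text.
decoded = unquote('caf%C3%A9%20au%20lait')

# Quoting and then unquoting returns the original string.
round_trip = unquote(quote('a b/c?d'))

print(decoded)
print(round_trip)
```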

sopel.tools.web.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)

Encode a dict or sequence of two-element tuples into a URL query string.

If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.

If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.

The components of a query arg may each be either a string or a bytes type.

The safe, encoding, and errors parameters are passed down to the function specified by quote_via (encoding and errors only if a component is a str).
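The doseq and quote_via options are shown below using urllib.parse.urlencode, which has the same signature. That this function behaves identically to the standard-library one is an assumption based on the matching signature and description; the parameter names and values are invented for the example.

```python
from urllib.parse import quote, urlencode

# A sequence of two-element tuples keeps its order in the output.
pairs = [('q', 'sopel irc'), ('tag', ['bot', 'python'])]

# doseq=True expands the list into one 'tag' parameter per element.
expanded = urlencode(pairs, doseq=True)

# quote_via=quote encodes spaces as %20 instead of the default '+'.
percent_spaces = urlencode({'q': 'sopel irc'}, quote_via=quote)

print(expanded)
print(percent_spaces)
```

Pass a list of tuples rather than a dict when the receiving API is sensitive to parameter order or expects repeated keys.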

sopel.tools.web.urlencode_non_ascii(b)

Safely encodes non-ASCII characters in a URL.