sopel.tools.web
The tools.web package contains utility functions for interaction with web applications, APIs, or websites in your plugins.
New in version 7.0.
- sopel.tools.web.r_entity = re.compile('&([^;\\s]+);')
Regular expression to match HTML entities.
Deprecated since version 8.0: Will be removed in Sopel 9, along with entity().
- sopel.tools.web.DEFAULT_HEADERS = {'User-Agent': 'Sopel/8.0.1 (https://sopel.chat)'}
Default header dict for use with requests methods.
Use it like this:
    import requests

    from sopel.tools import web

    result = requests.get(
        'https://some.site/api/endpoint',
        headers=web.DEFAULT_HEADERS
    )
Important
You should never modify this directly in your plugin code. Make a copy and use update() if you need to add or change headers:
    from sopel.tools import web

    default_headers = web.DEFAULT_HEADERS.copy()
    custom_headers = {'Accept': 'text/*'}
    default_headers.update(custom_headers)
- sopel.tools.web.USER_AGENT = 'Sopel/8.0.1 (https://sopel.chat)'
User agent string to be sent with HTTP requests.
Meant to be passed in the User-Agent header, like so:
    import requests

    from sopel.tools import web

    result = requests.get(
        'https://some.site/api/endpoint',
        headers={'User-Agent': web.USER_AGENT}
    )
- sopel.tools.web.decode(text)
Decode HTML entities into Unicode text.
- Parameters:
text (str) – the HTML page or snippet to process
- Return str:
text with all entity references replaced
Changed in version 8.0: Renamed html parameter to text. (Python gained a standard library module named html in version 3.4.)
- sopel.tools.web.entity(match)
Convert an entity reference to the appropriate character.
- Parameters:
match (str) – the entity name or code, as matched by r_entity
- Return str:
the Unicode character corresponding to the given match string, or a fallback representation if the reference cannot be resolved to a character
Deprecated since version 8.0: Will be removed in Sopel 9. Use decode() directly or migrate to Python’s standard-library equivalent, html.unescape().
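As a hedged illustration of the migration suggested here, the sketch below uses only the standard library. It copies the pattern documented for r_entity into a local variable (entity_pattern, a name chosen for this example) to show what that pattern captures, then decodes the same text with html.unescape(). It does not import Sopel, so it makes no claim about how decode() or entity() work internally.

```python
import html
import re

# Local copy of the pattern documented for r_entity (illustrative name).
entity_pattern = re.compile('&([^;\\s]+);')

sample = 'Tom &amp; Jerry &#169; 2020 &lt;b&gt;caf&eacute;&lt;/b&gt;'

# The capture group holds the entity name or numeric code,
# without the leading '&' and the trailing ';'.
entity_names = entity_pattern.findall(sample)

# html.unescape() replaces named and numeric references in one call.
decoded = html.unescape(sample)

print(entity_names)
print(decoded)
```

Plugins that only need decoded text can call html.unescape() directly and skip the regular expression.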
- sopel.tools.web.iri_to_uri(iri)
Decodes an internationalized domain name (IDN).
- sopel.tools.web.quote(string, safe='/')
Safely encodes a string for use in a URL.
- Parameters:
- Return str:
the string with special characters URL-encoded
Note
This is a shim to make writing cross-compatible plugins for both Python 2 and Python 3 easier.
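The effect of the safe parameter can be sketched with the standard library's urllib.parse.quote, whose signature matches the one shown for this function. That quote() here behaves identically on Python 3 is an assumption, not something this page states.

```python
from urllib.parse import quote

raw_path = 'docs/my file?.txt'

# With the default safe='/', slashes are kept and the space and '?' are escaped.
default_quoted = quote(raw_path)

# With safe='', slashes are escaped as well.
fully_quoted = quote(raw_path, safe='')

print(default_quoted)  # docs/my%20file%3F.txt
print(fully_quoted)    # docs%2Fmy%20file%3F.txt
```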
- sopel.tools.web.quote_query(string)
Safely encodes a URL’s query parameters.
- Parameters:
string (str) – a URL containing query parameters
- Return str:
the input URL with query parameter values URL-encoded
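One way the described behaviour could be approximated with the standard library is sketched below. The helper name quote_query_sketch and the choice of characters left unescaped ('/', '=' and '&') are assumptions made for this example; this is not presented as Sopel's implementation.

```python
from urllib.parse import quote, urlsplit


def quote_query_sketch(url: str) -> str:
    """Illustrative stand-in for quote_query(); not Sopel's implementation."""
    parts = urlsplit(url)
    # Keep the separators that give the query its structure; escape the rest.
    safe_query = quote(parts.query, safe='/=&')
    return parts._replace(query=safe_query).geturl()


encoded_url = quote_query_sketch('https://example.com/search?q=hello world&lang=en')
print(encoded_url)  # https://example.com/search?q=hello%20world&lang=en
```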
- sopel.tools.web.search_urls(text, exclusion_char=None, clean=False, schemes=None)
Extracts all URLs in text.
- Parameters:
text (str) – the text to search for URLs
exclusion_char (str) – optional character that, if placed before a URL in the text, will exclude it from being extracted
clean (bool) – if True, all found URLs are passed through trim_url() before being returned; default False
schemes (list) – optional list of URL schemes to look for; defaults to ['http', 'https', 'ftp']
- Returns:
generator iterator of all URLs found in text
To get the URLs as a plain list, use e.g.:
    list(search_urls(text))
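To make the generator and exclusion_char behaviour concrete without requiring Sopel, here is a rough standard-library sketch. The function find_urls_sketch and its simple regular expression are hypothetical stand-ins written for this example; the real search_urls() may match URLs differently.

```python
import re
from typing import Iterator, Optional, Sequence


def find_urls_sketch(
    text: str,
    exclusion_char: Optional[str] = None,
    schemes: Sequence[str] = ('http', 'https', 'ftp'),
) -> Iterator[str]:
    """Hypothetical stand-in for search_urls(); not Sopel's implementation."""
    scheme_group = '|'.join(re.escape(scheme) for scheme in schemes)
    pattern = re.compile(r'(?:%s)://\S+' % scheme_group)
    for match in pattern.finditer(text):
        start = match.start()
        # Skip a URL when the character right before it is the exclusion character.
        if exclusion_char and start > 0 and text[start - 1] == exclusion_char:
            continue
        yield match.group(0)


message = 'docs at https://sopel.chat/docs/ but ignore !http://example.com/private'
found = find_urls_sketch(message, exclusion_char='!')

# A generator yields results lazily; list() collects them all at once.
urls = list(found)
print(urls)  # ['https://sopel.chat/docs/']
```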
- sopel.tools.web.trim_url(url)
Removes extra punctuation from URLs found in text.
- Parameters:
url (str) – the raw URL match
- Return str:
the cleaned URL
This function removes trailing punctuation that looks like it was not intended to be part of the URL:
- trailing sentence- or clause-ending marks like ., ;, etc.
- unmatched trailing brackets/braces like }, ), etc.
It is intended for use with the output of search_urls(), which may include trailing punctuation when used on input from chat.
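The two trimming rules can be sketched in a few lines of plain Python. The helper trim_url_sketch, the exact punctuation set, and the bracket pairs it checks are assumptions made for this illustration; Sopel's own trim_url() may differ in detail.

```python
def trim_url_sketch(url: str) -> str:
    """Hypothetical stand-in for trim_url(); not Sopel's implementation."""
    closers = {')': '(', ']': '[', '}': '{', '>': '<'}
    while url:
        last = url[-1]
        if last in '.,;:!?':
            # Trailing sentence- or clause-ending mark.
            url = url[:-1]
        elif last in closers and url.count(last) > url.count(closers[last]):
            # Closing bracket with no matching opener inside the URL.
            url = url[:-1]
        else:
            break
    return url


trimmed = trim_url_sketch('https://example.com/page).')
balanced = trim_url_sketch('https://en.wikipedia.org/wiki/IRC_(protocol)')

print(trimmed)   # https://example.com/page
print(balanced)  # https://en.wikipedia.org/wiki/IRC_(protocol)
```

A closing parenthesis that has a matching opener inside the URL is kept, which is why the second URL comes back unchanged.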
- sopel.tools.web.unquote(string)
Decodes a URL-encoded string.
- Parameters:
string (str) – the string to decode
- Return str:
the decoded string
Note
This is a convenient shortcut for urllib.parse.unquote.
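Since the note names urllib.parse.unquote as the underlying function, a short example using the standard library directly shows the expected behaviour. The sample strings are made up for this illustration.

```python
from urllib.parse import unquote

# Percent-escapes are decoded as UTF-8 by default.
decoded_text = unquote('caf%C3%A9%20menu%3Fpage%3D2')

# A literal '+' is left alone; only percent-escapes are decoded.
plus_kept = unquote('a+b%20c')

print(decoded_text)  # café menu?page=2
print(plus_kept)     # a+b c
```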
- sopel.tools.web.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)
Encode a dict or sequence of two-element tuples into a URL query string.
If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.
If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.
The components of a query arg may each be either a string or a bytes type.
The safe, encoding, and errors parameters are passed down to the function specified by quote_via (encoding and errors only if a component is a str).
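The description matches the standard library's urllib.parse.urlencode, so the sketch below uses that function directly to show how doseq and quote_via change the output. Treating the two as interchangeable is an assumption; the sample data is invented for the example.

```python
from urllib.parse import quote, urlencode

pairs = [('q', 'sopel irc'), ('tag', ['bot', 'python'])]

# Without doseq, the list is converted with str() and encoded as one value.
single_value = urlencode(pairs)

# With doseq=True, each list element becomes its own 'tag' parameter.
expanded = urlencode(pairs, doseq=True)

# quote_via=quote encodes spaces as %20 instead of '+'.
percent_spaces = urlencode(pairs, doseq=True, quote_via=quote)

print(single_value)
print(expanded)        # q=sopel+irc&tag=bot&tag=python
print(percent_spaces)  # q=sopel%20irc&tag=bot&tag=python
```

Because pairs is a sequence of tuples, the parameters appear in the output in the same order as in the input.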
- sopel.tools.web.urlencode_non_ascii(b)
Safely encodes non-ASCII characters in a URL.