cpython/Doc/library/base64.rst

309 lines
11 KiB
ReStructuredText
Raw Normal View History

:mod:`base64` --- Base16, Base32, Base64, Base85 Data Encodings
===============================================================
2007-08-15 11:28:22 -03:00
.. module:: base64
:synopsis: RFC 4648: Base16, Base32, Base64 Data Encodings;
Base85 and Ascii85
2007-08-15 11:28:22 -03:00
**Source code:** :source:`Lib/base64.py`
2007-08-15 11:28:22 -03:00
.. index::
pair: base64; encoding
single: MIME; base64 encoding
--------------
This module provides functions for encoding binary data to printable
ASCII characters and decoding such encodings back to binary data.
It provides encoding and decoding functions for the encodings specified in
:rfc:`4648`, which defines the Base16, Base32, and Base64 algorithms,
and for the de-facto standard Ascii85 and Base85 encodings.
The :rfc:`4648` encodings are suitable for encoding binary data so that it can be
safely sent by email, used as parts of URLs, or included as part of an HTTP
POST request. The encoding algorithm is not the same as the
:program:`uuencode` program.
There are two interfaces provided by this module. The modern interface
supports encoding :term:`bytes-like objects <bytes-like object>` to ASCII
:class:`bytes`, and decoding :term:`bytes-like objects <bytes-like object>` or
strings containing ASCII to :class:`bytes`. Both base-64 alphabets
defined in :rfc:`4648` (normal, and URL- and filesystem-safe) are supported.
The legacy interface does not support decoding from strings, but it does
provide functions for encoding and decoding to and from :term:`file objects
<file object>`. It only supports the Base64 standard alphabet, and it adds
newlines every 76 characters as per :rfc:`2045`. Note that if you are looking
for :rfc:`2045` support you probably want to be looking at the :mod:`email`
package instead.
.. versionchanged:: 3.3
ASCII-only Unicode strings are now accepted by the decoding functions of
the modern interface.
2007-08-15 11:28:22 -03:00
.. versionchanged:: 3.4
Any :term:`bytes-like objects <bytes-like object>` are now accepted by all
encoding and decoding functions in this module. Ascii85/Base85 support added.
The modern interface provides:
2007-08-15 11:28:22 -03:00
.. function:: b64encode(s, altchars=None)
2007-08-15 11:28:22 -03:00
Encode the :term:`bytes-like object` *s* using Base64 and return the encoded
:class:`bytes`.
2007-08-15 11:28:22 -03:00
Optional *altchars* must be a :term:`bytes-like object` of at least
2007-08-15 11:28:22 -03:00
length 2 (additional characters are ignored) which specifies an alternative
alphabet for the ``+`` and ``/`` characters. This allows an application to e.g.
generate URL or filesystem safe Base64 strings. The default is ``None``, for
which the standard Base64 alphabet is used.
.. function:: b64decode(s, altchars=None, validate=False)
2007-08-15 11:28:22 -03:00
Decode the Base64 encoded :term:`bytes-like object` or ASCII string
*s* and return the decoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
Optional *altchars* must be a :term:`bytes-like object` or ASCII string of
at least length 2 (additional characters are ignored) which specifies the
alternative alphabet used instead of the ``+`` and ``/`` characters.
2007-08-15 11:28:22 -03:00
A :exc:`binascii.Error` exception is raised
if *s* is incorrectly padded.
If *validate* is ``False`` (the default), characters that are neither
in the normal base-64 alphabet nor the alternative alphabet are
discarded prior to the padding check. If *validate* is ``True``,
these non-alphabet characters in the input result in a
:exc:`binascii.Error`.
2007-08-15 11:28:22 -03:00
For more information about the strict base64 check, see :func:`binascii.a2b_base64`
2007-08-15 11:28:22 -03:00
.. function:: standard_b64encode(s)
Encode :term:`bytes-like object` *s* using the standard Base64 alphabet
and return the encoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
.. function:: standard_b64decode(s)
Decode :term:`bytes-like object` or ASCII string *s* using the standard
Base64 alphabet and return the decoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
.. function:: urlsafe_b64encode(s)
Encode :term:`bytes-like object` *s* using the
URL- and filesystem-safe alphabet, which
substitutes ``-`` instead of ``+`` and ``_`` instead of ``/`` in the
standard Base64 alphabet, and return the encoded :class:`bytes`. The result
Merged revisions 69576,69579-69580,69589,69619-69620,69633,69703-69704,69728-69730 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r69576 | georg.brandl | 2009-02-13 04:56:50 -0600 (Fri, 13 Feb 2009) | 1 line #1661108: note that urlsafe encoded string can contain "=". ........ r69579 | georg.brandl | 2009-02-13 05:06:59 -0600 (Fri, 13 Feb 2009) | 2 lines Fix warnings GCC emits where the argument of PyErr_Format is a single variable. ........ r69580 | georg.brandl | 2009-02-13 05:10:04 -0600 (Fri, 13 Feb 2009) | 2 lines Fix warnings GCC emits where the argument of PyErr_Format is a single variable. ........ r69589 | martin.v.loewis | 2009-02-13 14:11:34 -0600 (Fri, 13 Feb 2009) | 2 lines Move amd64 properties further to the top, so that they override the linker options correctly. ........ r69619 | benjamin.peterson | 2009-02-14 11:00:51 -0600 (Sat, 14 Feb 2009) | 1 line this needn't be a shebang line ........ r69620 | georg.brandl | 2009-02-14 11:01:36 -0600 (Sat, 14 Feb 2009) | 1 line #5179: don't leak PIPE fds when child execution fails. ........ r69633 | hirokazu.yamamoto | 2009-02-15 03:19:48 -0600 (Sun, 15 Feb 2009) | 1 line Fixed typo. ........ r69703 | raymond.hettinger | 2009-02-16 16:42:54 -0600 (Mon, 16 Feb 2009) | 3 lines Issue 5229: Documentation for super() neglects to say what super() actually does ........ r69704 | raymond.hettinger | 2009-02-16 17:00:25 -0600 (Mon, 16 Feb 2009) | 1 line Add explanation for super(type1, type2). ........ r69728 | georg.brandl | 2009-02-17 18:22:55 -0600 (Tue, 17 Feb 2009) | 2 lines #5297: fix example. ........ r69729 | georg.brandl | 2009-02-17 18:25:13 -0600 (Tue, 17 Feb 2009) | 2 lines #5296: sequence -> iterable. ........ r69730 | georg.brandl | 2009-02-17 18:31:36 -0600 (Tue, 17 Feb 2009) | 2 lines #5268: mention VMSError. ........
2009-02-19 00:22:03 -04:00
can still contain ``=``.
2007-08-15 11:28:22 -03:00
.. function:: urlsafe_b64decode(s)
Decode :term:`bytes-like object` or ASCII string *s*
using the URL- and filesystem-safe
alphabet, which substitutes ``-`` instead of ``+`` and ``_`` instead of
``/`` in the standard Base64 alphabet, and return the decoded
:class:`bytes`.
2007-08-15 11:28:22 -03:00
.. function:: b32encode(s)
Encode the :term:`bytes-like object` *s* using Base32 and return the
encoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
.. function:: b32decode(s, casefold=False, map01=None)
2007-08-15 11:28:22 -03:00
Decode the Base32 encoded :term:`bytes-like object` or ASCII string *s* and
return the decoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
Optional *casefold* is a flag specifying
whether a lowercase alphabet is acceptable as input. For security purposes,
the default is ``False``.
2007-08-15 11:28:22 -03:00
:rfc:`4648` allows for optional mapping of the digit 0 (zero) to the letter O
2007-08-15 11:28:22 -03:00
(oh), and for optional mapping of the digit 1 (one) to either the letter I (eye)
or letter L (el). The optional argument *map01* when not ``None``, specifies
which letter the digit 1 should be mapped to (when *map01* is not ``None``, the
digit 0 is always mapped to the letter O). For security purposes the default is
``None``, so that 0 and 1 are not allowed in the input.
A :exc:`binascii.Error` is raised if *s* is
2007-08-15 11:28:22 -03:00
incorrectly padded or if there are non-alphabet characters present in the
input.
2007-08-15 11:28:22 -03:00
.. function:: b32hexencode(s)
Similar to :func:`b32encode` but uses the Extended Hex Alphabet, as defined in
:rfc:`4648`.
.. versionadded:: 3.10
.. function:: b32hexdecode(s, casefold=False)
Similar to :func:`b32decode` but uses the Extended Hex Alphabet, as defined in
:rfc:`4648`.
This version does not allow the digit 0 (zero) to the letter O (oh) and digit
1 (one) to either the letter I (eye) or letter L (el) mappings, all these
characters are included in the Extended Hex Alphabet and are not
interchangeable.
.. versionadded:: 3.10
2007-08-15 11:28:22 -03:00
.. function:: b16encode(s)
Encode the :term:`bytes-like object` *s* using Base16 and return the
encoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
.. function:: b16decode(s, casefold=False)
2007-08-15 11:28:22 -03:00
Decode the Base16 encoded :term:`bytes-like object` or ASCII string *s* and
return the decoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
Optional *casefold* is a flag specifying whether a
2007-08-15 11:28:22 -03:00
lowercase alphabet is acceptable as input. For security purposes, the default
is ``False``.
A :exc:`binascii.Error` is raised if *s* is
2007-08-15 11:28:22 -03:00
incorrectly padded or if there are non-alphabet characters present in the
input.
2007-08-15 11:28:22 -03:00
.. function:: a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False)
Encode the :term:`bytes-like object` *b* using Ascii85 and return the
encoded :class:`bytes`.
*foldspaces* is an optional flag that uses the special short sequence 'y'
instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
feature is not supported by the "standard" Ascii85 encoding.
*wrapcol* controls whether the output should have newline (``b'\n'``)
characters added to it. If this is non-zero, each output line will be
at most this many characters long.
*pad* controls whether the input is padded to a multiple of 4
before encoding. Note that the ``btoa`` implementation always pads.
*adobe* controls whether the encoded byte sequence is framed with ``<~``
and ``~>``, which is used by the Adobe implementation.
.. versionadded:: 3.4
.. function:: a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \\t\\n\\r\\v')
Decode the Ascii85 encoded :term:`bytes-like object` or ASCII string *b* and
return the decoded :class:`bytes`.
*foldspaces* is a flag that specifies whether the 'y' short sequence
should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
This feature is not supported by the "standard" Ascii85 encoding.
*adobe* controls whether the input sequence is in Adobe Ascii85 format
(i.e. is framed with <~ and ~>).
*ignorechars* should be a :term:`bytes-like object` or ASCII string
containing characters to ignore
from the input. This should only contain whitespace characters, and by
default contains all whitespace characters in ASCII.
.. versionadded:: 3.4
.. function:: b85encode(b, pad=False)
Encode the :term:`bytes-like object` *b* using base85 (as used in e.g.
git-style binary diffs) and return the encoded :class:`bytes`.
If *pad* is true, the input is padded with ``b'\0'`` so its length is a
multiple of 4 bytes before encoding.
.. versionadded:: 3.4
.. function:: b85decode(b)
Decode the base85-encoded :term:`bytes-like object` or ASCII string *b* and
return the decoded :class:`bytes`. Padding is implicitly removed, if
necessary.
.. versionadded:: 3.4
The legacy interface:
2007-08-15 11:28:22 -03:00
.. function:: decode(input, output)
Decode the contents of the binary *input* file and write the resulting binary
data to the *output* file. *input* and *output* must be :term:`file objects
<file object>`. *input* will be read until ``input.readline()`` returns an
empty bytes object.
2007-08-15 11:28:22 -03:00
.. function:: decodebytes(s)
2007-08-15 11:28:22 -03:00
Decode the :term:`bytes-like object` *s*, which must contain one or more
lines of base64 encoded data, and return the decoded :class:`bytes`.
2007-08-15 11:28:22 -03:00
.. versionadded:: 3.1
2007-08-15 11:28:22 -03:00
.. function:: encode(input, output)
Encode the contents of the binary *input* file and write the resulting base64
encoded data to the *output* file. *input* and *output* must be :term:`file
objects <file object>`. *input* will be read until ``input.read()`` returns
an empty bytes object. :func:`encode` inserts a newline character (``b'\n'``)
after every 76 bytes of the output, as well as ensuring that the output
always ends with a newline, as per :rfc:`2045` (MIME).
2007-08-15 11:28:22 -03:00
.. function:: encodebytes(s)
2007-08-15 11:28:22 -03:00
Encode the :term:`bytes-like object` *s*, which can contain arbitrary binary
data, and return :class:`bytes` containing the base64-encoded data, with newlines
(``b'\n'``) inserted after every 76 bytes of output, and ensuring that
there is a trailing newline, as per :rfc:`2045` (MIME).
.. versionadded:: 3.1
2007-08-15 11:28:22 -03:00
Merged revisions 61724-61725,61731-61735,61737,61739,61741,61743-61744,61753,61761,61765-61767,61769,61773,61776-61778,61780-61783,61788,61793,61796,61807,61813 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ................ r61724 | martin.v.loewis | 2008-03-22 01:01:12 +0100 (Sat, 22 Mar 2008) | 49 lines Merged revisions 61602-61723 via svnmerge from svn+ssh://pythondev@svn.python.org/sandbox/trunk/2to3/lib2to3 ........ r61626 | david.wolever | 2008-03-19 17:19:16 +0100 (Mi, 19 M?\195?\164r 2008) | 1 line Added fixer for implicit local imports. See #2414. ........ r61628 | david.wolever | 2008-03-19 17:57:43 +0100 (Mi, 19 M?\195?\164r 2008) | 1 line Added a class for tests which should not run if a particular import is found. ........ r61629 | collin.winter | 2008-03-19 17:58:19 +0100 (Mi, 19 M?\195?\164r 2008) | 1 line Two more relative import fixes in pgen2. ........ r61635 | david.wolever | 2008-03-19 20:16:03 +0100 (Mi, 19 M?\195?\164r 2008) | 1 line Fixed print fixer so it will do the Right Thing when it encounters __future__.print_function. 2to3 gets upset, though, so the tests have been commented out. ........ r61637 | david.wolever | 2008-03-19 21:37:17 +0100 (Mi, 19 M?\195?\164r 2008) | 3 lines Added a fixer for itertools imports (from itertools import imap, ifilterfalse --> from itertools import filterfalse) ........ r61645 | david.wolever | 2008-03-19 23:22:35 +0100 (Mi, 19 M?\195?\164r 2008) | 1 line SVN is happier when you add the files you create... -_-' ........ r61654 | david.wolever | 2008-03-20 01:09:56 +0100 (Do, 20 M?\195?\164r 2008) | 1 line Added an explicit sort order to fixers -- fixes problems like #2427 ........ r61664 | david.wolever | 2008-03-20 04:32:40 +0100 (Do, 20 M?\195?\164r 2008) | 3 lines Fixes #2428 -- comments are no longer eatten by __future__ fixer. ........ r61673 | david.wolever | 2008-03-20 17:22:40 +0100 (Do, 20 M?\195?\164r 2008) | 1 line Added 2to3 node pretty-printer ........ r61679 | david.wolever | 2008-03-20 20:50:42 +0100 (Do, 20 M?\195?\164r 2008) | 1 line Made node printing a little bit prettier ........ r61723 | martin.v.loewis | 2008-03-22 00:59:27 +0100 (Sa, 22 M?\195?\164r 2008) | 2 lines Fix whitespace. ........ ................ r61725 | martin.v.loewis | 2008-03-22 01:02:41 +0100 (Sat, 22 Mar 2008) | 2 lines Install lib2to3. ................ r61731 | facundo.batista | 2008-03-22 03:45:37 +0100 (Sat, 22 Mar 2008) | 4 lines Small fix that complicated the test actually when that test failed. ................ r61732 | alexandre.vassalotti | 2008-03-22 05:08:44 +0100 (Sat, 22 Mar 2008) | 2 lines Added warning for the removal of 'hotshot' in Py3k. ................ r61733 | georg.brandl | 2008-03-22 11:07:29 +0100 (Sat, 22 Mar 2008) | 4 lines #1918: document that weak references *to* an object are cleared before the object's __del__ is called, to ensure that the weak reference callback (if any) finds the object healthy. ................ r61734 | georg.brandl | 2008-03-22 11:56:23 +0100 (Sat, 22 Mar 2008) | 2 lines Activate the Sphinx doctest extension and convert howto/functional to use it. ................ r61735 | georg.brandl | 2008-03-22 11:58:38 +0100 (Sat, 22 Mar 2008) | 2 lines Allow giving source names on the cmdline. ................ r61737 | georg.brandl | 2008-03-22 12:00:48 +0100 (Sat, 22 Mar 2008) | 2 lines Fixup this HOWTO's doctest blocks so that they can be run with sphinx' doctest builder. ................ r61739 | georg.brandl | 2008-03-22 12:47:10 +0100 (Sat, 22 Mar 2008) | 2 lines Test decimal.rst doctests as far as possible with sphinx doctest. ................ r61741 | georg.brandl | 2008-03-22 13:04:26 +0100 (Sat, 22 Mar 2008) | 2 lines Make doctests in re docs usable with sphinx' doctest. ................ r61743 | georg.brandl | 2008-03-22 13:59:37 +0100 (Sat, 22 Mar 2008) | 2 lines Make more doctests in pprint docs testable. ................ r61744 | georg.brandl | 2008-03-22 14:07:06 +0100 (Sat, 22 Mar 2008) | 2 lines No need to specify explicit "doctest_block" anymore. ................ r61753 | georg.brandl | 2008-03-22 21:08:43 +0100 (Sat, 22 Mar 2008) | 2 lines Fix-up syntax problems. ................ r61761 | georg.brandl | 2008-03-22 22:06:20 +0100 (Sat, 22 Mar 2008) | 4 lines Make collections' doctests executable. (The <BLANKLINE>s will be stripped from presentation output.) ................ r61765 | georg.brandl | 2008-03-22 22:21:57 +0100 (Sat, 22 Mar 2008) | 2 lines Test doctests in datetime docs. ................ r61766 | georg.brandl | 2008-03-22 22:26:44 +0100 (Sat, 22 Mar 2008) | 2 lines Test doctests in operator docs. ................ r61767 | georg.brandl | 2008-03-22 22:38:33 +0100 (Sat, 22 Mar 2008) | 2 lines Enable doctests in functions.rst. Already found two errors :) ................ r61769 | georg.brandl | 2008-03-22 23:04:10 +0100 (Sat, 22 Mar 2008) | 3 lines Enable doctest running for several other documents. We have now over 640 doctests that are run with "make doctest". ................ r61773 | raymond.hettinger | 2008-03-23 01:55:46 +0100 (Sun, 23 Mar 2008) | 1 line Simplify demo code. ................ r61776 | neal.norwitz | 2008-03-23 04:43:33 +0100 (Sun, 23 Mar 2008) | 7 lines Try to make this test a little more robust and not fail with: timeout (10.0025) is more than 2 seconds more than expected (0.001) I'm assuming this problem is caused by DNS lookup. This change does a DNS lookup of the hostname before trying to connect, so the time is not included. ................ r61777 | neal.norwitz | 2008-03-23 05:08:30 +0100 (Sun, 23 Mar 2008) | 1 line Speed up the test by avoiding socket timeouts. ................ r61778 | neal.norwitz | 2008-03-23 05:43:09 +0100 (Sun, 23 Mar 2008) | 1 line Skip the epoll test if epoll() does not work ................ r61780 | neal.norwitz | 2008-03-23 06:47:20 +0100 (Sun, 23 Mar 2008) | 1 line Suppress failure (to avoid a flaky test) if we cannot connect to svn.python.org ................ r61781 | neal.norwitz | 2008-03-23 07:13:25 +0100 (Sun, 23 Mar 2008) | 4 lines Move itertools before future_builtins since the latter depends on the former. From a clean build importing future_builtins would fail since itertools wasn't built yet. ................ r61782 | neal.norwitz | 2008-03-23 07:16:04 +0100 (Sun, 23 Mar 2008) | 1 line Try to prevent the alarm going off early in tearDown ................ r61783 | neal.norwitz | 2008-03-23 07:19:57 +0100 (Sun, 23 Mar 2008) | 4 lines Remove compiler warnings (on Alpha at least) about using chars as array subscripts. Using chars are dangerous b/c they are signed on some platforms and unsigned on others. ................ r61788 | georg.brandl | 2008-03-23 09:05:30 +0100 (Sun, 23 Mar 2008) | 2 lines Make the doctests presentation-friendlier. ................ r61793 | amaury.forgeotdarc | 2008-03-23 10:55:29 +0100 (Sun, 23 Mar 2008) | 4 lines #1477: ur'\U0010FFFF' raised in narrow unicode builds. Corrected the raw-unicode-escape codec to use UTF-16 surrogates in this case, just like the unicode-escape codec. ................ r61796 | raymond.hettinger | 2008-03-23 14:32:32 +0100 (Sun, 23 Mar 2008) | 1 line Issue 1681432: Add triangular distribution the random module. ................ r61807 | raymond.hettinger | 2008-03-23 20:37:53 +0100 (Sun, 23 Mar 2008) | 4 lines Adopt Nick's suggestion for useful default arguments. Clean-up floating point issues by adding true division and float constants. ................ r61813 | gregory.p.smith | 2008-03-23 22:04:43 +0100 (Sun, 23 Mar 2008) | 6 lines Fix gzip to deal with CRC's being signed values in Python 2.x properly and to read 32bit values as unsigned to start with rather than applying signedness fixups allover the place afterwards. This hopefully fixes the test_tarfile failure on the alpha/tru64 buildbot. ................
2008-03-23 18:54:12 -03:00
An example usage of the module:
2007-08-15 11:28:22 -03:00
>>> import base64
2010-10-17 08:36:28 -03:00
>>> encoded = base64.b64encode(b'data to be encoded')
2007-08-15 11:28:22 -03:00
>>> encoded
2009-01-18 06:43:58 -04:00
b'ZGF0YSB0byBiZSBlbmNvZGVk'
2007-08-15 11:28:22 -03:00
>>> data = base64.b64decode(encoded)
>>> data
2010-10-17 08:36:28 -03:00
b'data to be encoded'
2007-08-15 11:28:22 -03:00
.. _base64-security:
Security Considerations
-----------------------
A new security considerations section was added to :rfc:`4648` (section 12); it's
recommended to review the security section for any code deployed to production.
2007-08-15 11:28:22 -03:00
.. seealso::
Module :mod:`binascii`
Support module containing ASCII-to-binary and binary-to-ASCII conversions.
:rfc:`1521` - MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies
Section 5.2, "Base64 Content-Transfer-Encoding," provides the definition of the
base64 encoding.