
.. _pyporting-howto:

*************************************
How to port Python 2 Code to Python 3
*************************************

:author: Brett Cannon

.. topic:: Abstract

   Python 2 reached its official end-of-life at the start of 2020. This means
   that no new bug reports, fixes, or changes will be made to Python 2 - it's
   no longer supported.

   This guide is intended to provide you with a path to Python 3 for your
   code, one that includes compatibility with Python 2 as a first step.

   If you are looking to port an extension module instead of pure Python code,
   please see :ref:`cporting-howto`.

   The archived python-porting_ mailing list may contain some useful guidance.


The Short Explanation
=====================

To achieve Python 2/3 compatibility in a single code base, the basic steps
are:

#. Only worry about supporting Python 2.7
#. Make sure you have good test coverage (coverage.py_ can help;
   ``python -m pip install coverage``)
#. Learn the differences between Python 2 and 3
#. Use Futurize_ (or Modernize_) to update your code (e.g. ``python -m pip install future``)
#. Use Pylint_ to help make sure you don't regress on your Python 3 support
   (``python -m pip install pylint``)
#. Use caniusepython3_ to find out which of your dependencies are blocking your
   use of Python 3 (``python -m pip install caniusepython3``)
#. Once your dependencies are no longer blocking you, use continuous integration
   to make sure you stay compatible with Python 2 and 3 (tox_ can help test
   against multiple versions of Python; ``python -m pip install tox``)
#. Consider using optional :term:`static type checking <static type checker>`
   to make sure your type usage
   works in both Python 2 and 3 (e.g. use mypy_ to check your typing under both
   Python 2 and Python 3; ``python -m pip install mypy``).

.. note::

   Using ``python -m pip install`` guarantees that the ``pip`` you invoke
   is the one installed for the Python currently in use, whether it be
   a system-wide ``pip`` or one installed within a
   :ref:`virtual environment <tut-venv>`.

Details
=======

Even if other factors - say, dependencies over which you have no control -
still require you to support Python 2, that does not prevent you from taking
the step of including Python 3 support.

Most changes required to support Python 3 lead to cleaner code using newer
practices even in Python 2 code.


Different versions of Python 2
------------------------------

Ideally, your code should be compatible with Python 2.7, which was the
last supported version of Python 2.

Some of the tools mentioned in this guide will not work with Python 2.6.

If absolutely necessary, the six_ project can help you support Python 2.5 and
3 simultaneously. Do realize, though, that nearly all the projects listed in
this guide will not be available to you.

If you are able to skip Python 2.5 and older, the required changes to your
code will be minimal. At worst you will have to use a function instead of a
method in some instances or have to import a function instead of using a
built-in one.


Make sure you specify the proper version support in your ``setup.py`` file
--------------------------------------------------------------------------

In your ``setup.py`` file you should have the proper `trove classifier`_
specifying what versions of Python you support. As your project does not support
Python 3 yet you should at least have
``Programming Language :: Python :: 2 :: Only`` specified. Ideally you should
also specify each major/minor version of Python that you do support, e.g.
``Programming Language :: Python :: 2.7``.
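
As a rough illustration, the classifiers for a project that still only supports
Python 2.7 might be collected as follows. The ``CLASSIFIERS`` name is an
assumption made for this sketch; the strings themselves are standard trove
classifiers and would be passed to ``setup()`` through its ``classifiers``
argument::

  # setup.py (excerpt): the rest of the project metadata is omitted.
  CLASSIFIERS = [
      "Programming Language :: Python :: 2",
      "Programming Language :: Python :: 2.7",
      "Programming Language :: Python :: 2 :: Only",
  ]
  # Later in the same file: setup(..., classifiers=CLASSIFIERS)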


Have good test coverage
-----------------------

Once you have your code supporting the oldest version of Python 2 you want it
to, you will want to make sure your test suite has good coverage. A good rule of
thumb is that you want to be confident enough in your test suite that any
failures that appear after having tools rewrite your code are actual bugs in the
tools and not in your code. If you want a number to aim for, try to get over 80%
coverage (and don't feel bad if you find it hard to get better than 90%
coverage). If you don't already have a tool to measure test coverage then
coverage.py_ is recommended.


Be aware of the differences between Python 2 and 3
--------------------------------------------------

Once you have your code well-tested you are ready to begin porting your code to
Python 3! But to fully understand how your code is going to change and what
you want to look out for while you code, you will want to learn what changes
Python 3 makes relative to Python 2.

Some resources for understanding the differences and their implications for your
code:

* the :ref:`"What's New" <whatsnew-index>` doc for each release of Python 3
* the `Porting to Python 3`_ book (which is free online)
* the handy `cheat sheet`_ from the Python-Future project.


Update your code
----------------

There are tools available that can port your code automatically.

Futurize_ does its best to make Python 3 idioms and practices exist in Python
2, e.g. backporting the ``bytes`` type from Python 3 so that you have
semantic parity between the major versions of Python. This is the better
approach for most cases.

Modernize_, on the other hand, is more conservative and targets a Python 2/3
subset of Python, directly relying on six_ to help provide compatibility.

A good approach is to run the tool over your test suite first and visually
inspect the diff to make sure the transformation is accurate. After you have
transformed your test suite and verified that all the tests still pass as
expected, then you can transform your application code knowing that any test
which fails is a translation failure.

Unfortunately the tools can't automate everything to make your code work under
Python 3, and you will also need to read the tools' documentation in case some
options you need are turned off by default.

Key issues to be aware of and check for:

Division
++++++++

In Python 3, ``5 / 2 == 2.5`` and not ``2`` as it was in Python 2; all
division between ``int`` values results in a ``float``. This change has
actually been planned since Python 2.2 which was released in 2001. Since then
users have been encouraged to add ``from __future__ import division`` to any
and all files which use the ``/`` and ``//`` operators or to run the
interpreter with the ``-Q`` flag. If you have not been doing this then you
will need to go through your code and do two things:

#. Add ``from __future__ import division`` to your files
#. Update any division operator as necessary to either use ``//`` for floor
   division or continue using ``/`` and expect a float

The reason that ``/`` isn't simply translated to ``//`` automatically is that if
an object defines a ``__truediv__`` method but not ``__floordiv__`` then your
code would begin to fail (e.g. a user-defined class that uses ``/`` to
signify some operation but not ``//`` for the same thing or at all).
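
A minimal sketch of the two steps above. The helper names ``mean`` and
``full_pages`` and the ``Ratio`` class are hypothetical; ``Ratio`` defines
``/`` but not ``//``, which is the situation that makes a blanket rewrite of
``/`` to ``//`` unsafe::

  from __future__ import division


  def mean(values):
      # True division: the result is a float even for int inputs.
      return sum(values) / len(values)


  def full_pages(items, per_page):
      # Floor division: an explicit request for the old integer behaviour.
      return items // per_page


  class Ratio(object):
      """Hypothetical class that defines ``/`` but not ``//``."""

      def __init__(self, value):
          self.value = value

      def __truediv__(self, other):
          return Ratio(self.value / other)

With the ``__future__`` import in place, ``Ratio(5) / 2`` works on both major
versions, while ``Ratio(5) // 2`` raises :exc:`TypeError`.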


Text versus binary data
+++++++++++++++++++++++

In Python 2 you could use the ``str`` type for both text and binary data.
Unfortunately this conflation of two different concepts could lead to brittle
code which sometimes worked for either kind of data, sometimes not. It also
could lead to confusing APIs if people didn't explicitly state that something
that accepted ``str`` accepted either text or binary data instead of one
specific type. This complicated the situation especially for anyone supporting
multiple languages as APIs wouldn't bother explicitly supporting ``unicode``
when they claimed text data support.

Python 3 made text and binary data distinct types that cannot simply be mixed
together. For any code that deals only with text or only binary data, this
separation doesn't pose an issue. But for code that has to deal with both, it
does mean you might have to now care about when you are using text compared
to binary data, which is why this cannot be entirely automated.

Decide which APIs take text and which take binary (it is **highly** recommended
you don't design APIs that can take both due to the difficulty of keeping the
code working; as stated earlier it is difficult to do well). In Python 2 this
means making sure the APIs that take text can work with ``unicode`` and those
that work with binary data work with the ``bytes`` type from Python 3
(in Python 2, ``bytes`` is an alias for ``str``, so restrict yourself to the
subset of ``str`` behavior that ``bytes`` also provides in Python 3). Usually
the biggest issue is realizing which methods exist
on which types in Python 2 and 3 simultaneously (for text that's ``unicode``
in Python 2 and ``str`` in Python 3, for binary that's ``str``/``bytes`` in
Python 2 and ``bytes`` in Python 3).

The following table lists the **unique** methods of each data type across
Python 2 and 3 (e.g., the ``decode()`` method is usable on the equivalent binary
data type in either Python 2 or 3, but it can't be used by the textual data
type consistently between Python 2 and 3 because ``str`` in Python 3 doesn't
have the method). Do note that as of Python 3.5 the ``__mod__`` method was
added to the bytes type.

======================== =====================
**Text data**            **Binary data**
------------------------ ---------------------
\                        decode
------------------------ ---------------------
encode
------------------------ ---------------------
format
------------------------ ---------------------
isdecimal
------------------------ ---------------------
isnumeric
======================== =====================

Making the distinction easier to handle can be accomplished by encoding and
decoding between binary data and text at the edge of your code. This means that
when you receive text in binary data, you should immediately decode it. And if
your code needs to send text as binary data then encode it as late as possible.
This allows your code to work with only text internally and thus eliminates
having to keep track of what type of data you are working with.
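
One way to picture this "decode early, encode late" pattern is the sketch
below. The function name ``shout`` and the choice of UTF-8 are assumptions made
for illustration; your code would use whatever encoding its inputs and outputs
actually require::

  def shout(raw):
      """Upper-case UTF-8 encoded input and return UTF-8 encoded output."""
      text = raw.decode("utf-8")      # decode as soon as the bytes arrive
      result = text.upper()           # work purely with text in between
      return result.encode("utf-8")   # encode only when the data leaves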

The next issue is making sure you know whether the string literals in your code
represent text or binary data. You should add a ``b`` prefix to any
literal that represents binary data. For text you should add a ``u`` prefix to
the text literal. (There is a :mod:`__future__` import to force all unspecified
literals to be Unicode, but usage has shown it isn't as effective as adding a
``b`` or ``u`` prefix to all literals explicitly.)
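
For instance, explicitly prefixed literals might look like this (the constant
names are hypothetical, and the ``u`` prefix assumes Python 2.7 or Python 3.3
and newer)::

  GREETING = u"caf\u00e9"       # text: four characters
  PNG_MAGIC = b"\x89PNG\r\n"    # binary data: six bytes

  # Encoding is the explicit step that turns the text into binary data.
  greeting_bytes = GREETING.encode("utf-8")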

You also need to be careful about opening files. Possibly you have not always
bothered to add the ``b`` mode when opening a binary file (e.g., ``rb`` for
binary reading).  Under Python 3, binary files and text files are clearly
distinct and mutually incompatible; see the :mod:`io` module for details.
Therefore, you **must** make a decision of whether a file will be used for
binary access (allowing binary data to be read and/or written) or textual access
(allowing text data to be read and/or written). You should also use :func:`io.open`
for opening files instead of the built-in :func:`open` function as the :mod:`io`
module is consistent from Python 2 to 3 while the built-in :func:`open` function
is not (in Python 3 it's actually :func:`io.open`). Do not bother with the
outdated practice of using :func:`codecs.open` as that's only necessary for
keeping compatibility with Python 2.5.
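
The following sketch shows textual and binary access to the same file through
:func:`io.open`. The temporary directory, the file name and the UTF-8 encoding
are assumptions made only so that the example is self-contained::

  import io
  import os
  import shutil
  import tempfile

  tmp_dir = tempfile.mkdtemp()
  path = os.path.join(tmp_dir, "example.txt")

  # Textual access: pass text in, and state the encoding explicitly.
  with io.open(path, "w", encoding="utf-8") as f:
      f.write(u"caf\u00e9")

  # Binary access: the "b" mode returns bytes without any decoding.
  with io.open(path, "rb") as f:
      raw = f.read()

  # Textual access again: the bytes are decoded back into text.
  with io.open(path, "r", encoding="utf-8") as f:
      text = f.read()

  shutil.rmtree(tmp_dir)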

The constructors of both ``str`` and ``bytes`` have different semantics for the
same arguments between Python 2 and 3. Passing an integer to ``bytes`` in Python 2
will give you the string representation of the integer: ``bytes(3) == '3'``.
But in Python 3, an integer argument to ``bytes`` will give you a bytes object
as long as the integer specified, filled with null bytes:
``bytes(3) == b'\x00\x00\x00'``. A similar worry is necessary when passing a
bytes object to ``str``. In Python 2 you just get the bytes object back:
``str(b'3') == b'3'``. But in Python 3 you get the string representation of the
bytes object: ``str(b'3') == "b'3'"``.
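
If you need a given result on both versions, spelling out the intent avoids the
constructors' differing semantics. The variable names below are illustrative
only, and ASCII is assumed to be a suitable encoding for digits::

  zeros = b"\x00" * 3               # three null bytes on either version
  digits = str(3).encode("ascii")   # the digit "3" as binary data
  text = b"3".decode("ascii")       # the text "3" rather than "b'3'"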

Finally, the indexing of binary data requires careful handling (slicing does
**not** require any special handling). In Python 2,
``b'123'[1] == b'2'`` while in Python 3 ``b'123'[1] == 50``. Because binary data
is simply a collection of binary numbers, Python 3 returns the integer value for
the byte you index on. But in Python 2 because ``bytes == str``, indexing
returns a one-item slice of bytes. The six_ project has a function
named ``six.indexbytes()`` which will return an integer like in Python 3:
``six.indexbytes(b'123', 1)``.
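
If you would rather not depend on six_ for this, the sketch below shows two
alternatives that should behave the same on Python 2.7 and Python 3: slicing
to obtain a one-byte ``bytes`` object, and indexing a :class:`bytearray` to
obtain an integer. The variable names are illustrative::

  data = b"123"

  one_byte = data[1:2]           # b"2" on both versions
  as_int = bytearray(data)[1]    # 50 on both versions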

To summarize:

#. Decide which of your APIs take text and which take binary data
#. Make sure that your code that works with text also works with ``unicode`` and
   code for binary data works with ``bytes`` in Python 2 (see the table above
   for what methods you cannot use for each type)
#. Mark all binary literals with a ``b`` prefix, textual literals with a ``u``
   prefix
#. Decode binary data to text as soon as possible, encode text as binary data as
   late as possible
#. Open files using :func:`io.open` and make sure to specify the ``b`` mode when
   appropriate
#. Be careful when indexing into binary data


Use feature detection instead of version detection
++++++++++++++++++++++++++++++++++++++++++++++++++

Inevitably you will have code that has to choose what to do based on what
version of Python is running. The best way to do this is with feature detection
of whether the version of Python you're running under supports what you need.
If for some reason that doesn't work then you should make the version check be
against Python 2 and not Python 3. To help explain this, let's look at an
example.

Let's pretend that you need access to a feature of :mod:`importlib` that
is available in Python's standard library since Python 3.3 and available for
Python 2 through importlib2_ on PyPI. You might be tempted to write code to
access e.g. the :mod:`importlib.abc` module by doing the following::

  import sys

  if sys.version_info[0] == 3:
      from importlib import abc
  else:
      from importlib2 import abc

The problem with this code appears when Python 4 comes out: the check would
send that version down the Python 2 branch. It would
be better to treat Python 2 as the exceptional case instead of Python 3 and
assume that future Python versions will be more compatible with Python 3 than
Python 2::

  import sys

  if sys.version_info[0] > 2:
      from importlib import abc
  else:
      from importlib2 import abc

The best solution, though, is to do no version detection at all and instead rely
on feature detection. That avoids any potential issues of getting the version
detection wrong and helps keep you future-compatible::

  try:
      from importlib import abc
  except ImportError:
      from importlib2 import abc
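
The same idea applies to built-in names, not just modules. As a sketch, the
text type can be located by probing for the ``unicode`` name instead of
checking the version; ``text_type`` and ``is_text`` are hypothetical names
chosen for this example::

  try:
      text_type = unicode   # the name only exists on Python 2
  except NameError:
      text_type = str       # Python 3, where str is the text type


  def is_text(value):
      return isinstance(value, text_type)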


Prevent compatibility regressions
---------------------------------

Once you have fully translated your code to be compatible with Python 3, you
will want to make sure your code doesn't regress and stop working under
Python 3. This is especially true if you have a dependency which is blocking you
from actually running under Python 3 at the moment.

To help with staying compatible, any new module you create should have
at least the following block of code at its top::

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

You can also run Python 2 with the ``-3`` flag to be warned about various
compatibility issues your code triggers during execution. If you turn warnings
into errors with ``-Werror`` then you can make sure that you don't accidentally
miss a warning.

You can also use the Pylint_ project and its ``--py3k`` flag to lint your code
to receive warnings when your code begins to deviate from Python 3
compatibility. This also prevents you from having to run Modernize_ or Futurize_
over your code regularly to catch compatibility regressions. This does require
that you only support Python 2.7 and Python 3.4 or newer, as those are the
minimum Python versions that Pylint supports.


Check which dependencies block your transition
----------------------------------------------

**After** you have made your code compatible with Python 3 you should begin to
care about whether your dependencies have also been ported. The caniusepython3_
project was created to help you determine which projects
-- directly or indirectly -- are blocking you from supporting Python 3. There
is both a command-line tool and a web interface at
https://caniusepython3.com.

The project also provides code which you can integrate into your test suite so
that you will have a failing test when you no longer have dependencies blocking
you from using Python 3. This allows you to avoid having to manually check your
dependencies and to be notified quickly when you can start running on Python 3.


Update your ``setup.py`` file to denote Python 3 compatibility
--------------------------------------------------------------

Once your code works under Python 3, you should update the classifiers in
your ``setup.py`` to contain ``Programming Language :: Python :: 3`` and to not
specify sole Python 2 support. This will tell anyone using your code that you
support Python 2 **and** 3. Ideally you will also want to add classifiers for
each major/minor version of Python you now support.


Use continuous integration to stay compatible
---------------------------------------------

Once you are able to fully run under Python 3 you will want to make sure your
code always works under both Python 2 and 3. Probably the best tool for running
your tests under multiple Python interpreters is tox_. You can then integrate
tox with your continuous integration system so that you never accidentally break
Python 2 or 3 support.

You may also want to use the ``-bb`` flag with the Python 3 interpreter to
trigger an exception when you are comparing bytes to strings or bytes to an int
(the latter is available starting in Python 3.5). By default type-differing
comparisons simply return ``False``, but if you made a mistake in your
separation of text/binary data handling or indexing on bytes you wouldn't easily
find the mistake. This flag will raise an exception when these kinds of
comparisons occur, making the mistake much easier to track down.
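
To see the effect of the flag, the sketch below runs the same one-line snippet
in two child interpreters. It is meant to be run under Python 3 and assumes
that ``sys.executable`` points at the interpreter you want to test::

  import subprocess
  import sys

  # Compares bytes with text, which is a typical porting mistake.
  snippet = "print(b'abc' == 'abc')"

  # Without the flag the comparison quietly evaluates to False.
  plain = subprocess.call([sys.executable, "-c", snippet])

  # With -bb the same comparison raises BytesWarning, so the child
  # interpreter exits with a non-zero status.
  strict = subprocess.call([sys.executable, "-bb", "-c", snippet])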


Consider using optional static type checking
--------------------------------------------

Another way to help port your code is to use a :term:`static type checker` like
mypy_ or pytype_ on your code. These tools can be used to analyze your code as
if it's being run under Python 2; you can then run the tool a second time as if
your code is running under Python 3. By running a static type checker twice like
this you can discover if you're e.g. misusing a binary data type in one version
of Python compared to another. If you add optional type hints to your code you
can also explicitly state whether your APIs use textual or binary data, helping
to make sure everything functions as expected in both versions of Python.
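
As an illustration, type comments keep the hints valid Python 2 syntax while
still telling a checker which side of the text/binary divide an API is on. The
function ``to_wire`` and the UTF-8 encoding are assumptions made for this
sketch, and on Python 2 the ``typing`` module has to be installed separately::

  from typing import Text  # a backport from PyPI is needed on Python 2


  def to_wire(message):
      # type: (Text) -> bytes
      """Encode text for transmission; UTF-8 is an assumption."""
      return message.encode("utf-8")

A checker that reads this signature can then flag a caller that passes binary
data where text is expected.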


.. _caniusepython3: https://pypi.org/project/caniusepython3
.. _cheat sheet: https://python-future.org/compatible_idioms.html
.. _coverage.py: https://pypi.org/project/coverage
.. _Futurize: https://python-future.org/automatic_conversion.html
.. _importlib2: https://pypi.org/project/importlib2
.. _Modernize: https://python-modernize.readthedocs.io/
.. _mypy: https://mypy-lang.org/
.. _Porting to Python 3: http://python3porting.com/
.. _Pylint: https://pypi.org/project/pylint

.. _Python 3 Q & A: https://ncoghlan-devs-python-notes.readthedocs.io/en/latest/python3/questions_and_answers.html

.. _pytype: https://github.com/google/pytype
.. _python-future: https://python-future.org/
.. _python-porting: https://mail.python.org/pipermail/python-porting/
.. _six: https://pypi.org/project/six
.. _tox: https://pypi.org/project/tox
.. _trove classifier: https://pypi.org/classifiers

.. _Why Python 3 exists: https://snarky.ca/why-python-3-exists