mirror of https://github.com/python/cpython
Issue #22914: Update the Python 2/3 porting HOWTO to describe a more
automated process.
parent
969175091c
commit
6b335196c5
|
@ -19,600 +19,359 @@ Porting Python 2 Code to Python 3
|
|||
If you would like to read one core Python developer's take on why Python 3
|
||||
came into existence, you can read Nick Coghlan's `Python 3 Q & A`_.
|
||||
|
||||
If you prefer to read a (free) book on porting a project to Python 3,
|
||||
consider reading `Porting to Python 3`_ by Lennart Regebro which should cover
|
||||
much of what is discussed in this HOWTO.
|
||||
|
||||
For help with porting, you can email the python-porting_ mailing list with
|
||||
questions.
|
||||
|
||||
The Short Version
|
||||
=================
|
||||
|
||||
* Decide what's the oldest version of Python 2 you want to support (if at all)
|
||||
* Make sure you have a thorough test suite and use continuous integration
|
||||
testing to make sure you stay compatible with the versions of Python you care
|
||||
about
|
||||
* If you have dependencies, check their Python 3 status using caniusepython3
|
||||
(`command-line tool <https://pypi.python.org/pypi/caniusepython3>`__,
|
||||
`web app <https://caniusepython3.com/>`__)
|
||||
|
||||
With that done, your options are:
|
||||
|
||||
* If you are dropping Python 2 support, use :ref:`2to3 <2to3-reference>` to port
|
||||
to Python 3
|
||||
|
||||
* If you are keeping Python 2 support, then start writing Python 2/3-compatible
|
||||
code starting **TODAY**
|
||||
|
||||
+ If you have dependencies that have not been ported, reach out to them to port
|
||||
their project while working to make your code compatible with Python 3 so
|
||||
you're ready when your dependencies are all ported
|
||||
+ If all your dependencies have been ported (or you have none), go ahead and
|
||||
port to Python 3
|
||||
|
||||
* If you are creating a new project that wants to have 2/3 compatibility,
|
||||
code in Python 3 and then backport to Python 2
|
||||
|
||||
|
||||
Before You Begin
|
||||
================
|
||||
|
||||
If your project is on the Cheeseshop_/PyPI_, make sure it has the proper
|
||||
`trove classifiers`_ to signify what versions of Python it **currently**
|
||||
supports. At minimum you should specify the major version(s), e.g.
|
||||
``Programming Language :: Python :: 2`` if your project currently only supports
|
||||
Python 2. It is preferable that you be as specific as possible by listing every
|
||||
major/minor version of Python that you support, e.g. if your project supports
|
||||
Python 2.6 and 2.7, then you want the classifiers of::
|
||||
|
||||
    Programming Language :: Python :: 2
    Programming Language :: Python :: 2.6
    Programming Language :: Python :: 2.7
|
||||
|
||||
Once your project supports Python 3 you will want to go back and add the
|
||||
appropriate classifiers for Python 3 as well. This is important as setting the
|
||||
``Programming Language :: Python :: 3`` classifier will lead to your project
|
||||
being listed under the `Python 3 Packages`_ section of PyPI.
|
||||
|
||||
Make sure you have a robust test suite. You need to
|
||||
make sure everything continues to work, just like when you support a new
|
||||
minor/feature release of Python. This means making sure your test suite is
|
||||
thorough and is ported properly between Python 2 & 3 (consider using coverage_
|
||||
to measure that you have effective test coverage). You will also most likely
|
||||
want to use something like tox_ to automate testing between all of your
|
||||
supported versions of Python. You will also want to **port your tests first** so
|
||||
that you can make sure that you detect breakage during the transition. Tests also
|
||||
tend to be simpler than the code they are testing so it gives you an idea of how
|
||||
easy it can be to port code.
|
||||
|
||||
Drop support for older Python versions if possible. Python 2.5
|
||||
introduced a lot of useful syntax and libraries which have become idiomatic
|
||||
in Python 3. Python 2.6 introduced future statements which make
|
||||
compatibility much easier if you are going from Python 2 to 3.
|
||||
Python 2.7 continues the trend in the stdlib. Choose the newest version
|
||||
of Python which you believe can be your minimum support version
|
||||
and work from there.
|
||||
|
||||
Target the newest version of Python 3 that you can. Beyond just the usual
|
||||
bugfixes, compatibility has continued to improve between Python 2 and 3 as time
|
||||
has passed. E.g. Python 3.3 added back the ``u`` prefix for
|
||||
strings, making source-compatible Python code easier to write.
|
||||
|
||||
|
||||
Writing Source-Compatible Python 2/3 Code
|
||||
=========================================
|
||||
|
||||
Over the years the Python community has discovered that the easiest way to
|
||||
support both Python 2 and 3 in parallel is to write Python code that works in
|
||||
either version. While this might sound counter-intuitive at first, it actually
|
||||
is not difficult and typically only requires following some select
|
||||
(non-idiomatic) practices and using some key projects to help make bridging
|
||||
between Python 2 and 3 easier.
|
||||
|
||||
Projects to Consider
|
||||
--------------------
|
||||
|
||||
The lowest level library for supporting Python 2 & 3 simultaneously is six_.
|
||||
Reading through its documentation will give you an idea of where exactly the
|
||||
Python language changed between versions 2 & 3 and thus what you will want the
|
||||
library to help you continue to support.
|
||||
|
||||
To help automate porting your code over to using six, you can use
|
||||
modernize_. This project will attempt to rewrite your code to be as modern as
|
||||
possible while using six to smooth out any differences between Python 2 & 3.
|
||||
|
||||
If you want to write your compatible code to feel more like Python 3 there is
|
||||
the future_ project. It tries to provide backports of objects from Python 3 so
|
||||
that you can use them from Python 2-compatible code, e.g. replacing the
|
||||
``bytes`` type from Python 2 with the one from Python 3.
|
||||
It also provides a translation script like modernize (its translation code is
|
||||
actually partially based on it) to help start working with a pre-existing code
|
||||
base. It is also unique in that its translation script will also port Python 3
|
||||
code backwards as well as Python 2 code forwards.
|
||||
|
||||
|
||||
Tips & Tricks
|
||||
-------------
|
||||
|
||||
To help with writing source-compatible code using one of the projects mentioned
|
||||
in `Projects to Consider`_, consider following the below suggestions. Some of
|
||||
them are handled by the suggested projects, so if you do use one of them then
|
||||
read their documentation first to see which suggestions below will be taken care of
|
||||
for you.
|
||||
|
||||
Support Python 2.7
|
||||
//////////////////
|
||||
|
||||
As a first step, make sure that your project is compatible with Python 2.7.
|
||||
This is just good to do as Python 2.7 is the last release of Python 2 and thus
|
||||
will be used for a rather long time. It also allows for use of the ``-3`` flag
|
||||
to Python to help discover places in your code where compatibility might be an
|
||||
issue (the ``-3`` flag is in Python 2.6 but Python 2.7 adds more warnings).
|
||||
|
||||
Try to Support Python 2.6 and Newer Only
|
||||
////////////////////////////////////////
|
||||
|
||||
While not possible for all projects, if you can support Python 2.6 and newer
|
||||
**only**, your life will be much easier. Various future statements, stdlib
|
||||
additions, etc. exist only in Python 2.6 and later which greatly assist in
|
||||
supporting Python 3. But if your project must keep support for Python 2.5 then
|
||||
it is still possible to simultaneously support Python 3.
|
||||
|
||||
Below are the benefits you gain if you only have to support Python 2.6 and
|
||||
newer. Some of these options are personal choice while others are
|
||||
**strongly** recommended (the ones that are more for personal choice are
|
||||
labeled as such). If you continue to support older versions of Python then you
|
||||
at least need to watch out for situations that these solutions fix and handle
|
||||
them appropriately (which is where library help from e.g. six_ comes in handy).
|
||||
|
||||
|
||||
``from __future__ import print_function``
|
||||
'''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
It will not only get you used to typing ``print()`` as a function instead of a
|
||||
statement, but it will also give you the various benefits the function has over
|
||||
the Python 2 statement (six_ provides a function if you support Python 2.5 or
|
||||
older).
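As a small illustration of those benefits, the sketch below uses the keyword arguments that only the function form accepts. The ``buffer`` name and the sample strings are purely illustrative.

```python
from __future__ import print_function

import io

# Collect the output in memory instead of writing to stdout.
buffer = io.StringIO()

# ``sep``, ``end`` and ``file`` are keyword arguments of the print function;
# the Python 2 print statement has no direct equivalent for ``sep`` and ``end``.
print(u"spam", u"eggs", sep=u", ", end=u"!\n", file=buffer)

output = buffer.getvalue()
```

Under Python 3 the future import is a no-op, so the same file can be kept unchanged.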
|
||||
|
||||
|
||||
``from __future__ import unicode_literals``
|
||||
'''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
If you choose to use this future statement then all string literals in
|
||||
Python 2 will be assumed to be Unicode (as is already the case in Python 3).
|
||||
If you choose not to use this future statement then you should mark all of your
|
||||
text strings with a ``u`` prefix and only support Python 3.3 or newer. But you
|
||||
are **strongly** advised to do one or the other (six_ provides a function in
|
||||
case you don't want to use the future statement **and** you want to support
|
||||
Python 3.2 or older).
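A minimal sketch of how literals behave once the future statement is in place. The variable names are illustrative, and the sample string is an assumption chosen so that the text and byte lengths differ.

```python
from __future__ import unicode_literals

# Unprefixed literal: text in Python 3, and also text (unicode) in Python 2
# because of the future statement above.
text = "caf\u00e9"

# ``b`` prefix: always binary data.
raw = b"caf\xc3\xa9"

# Text has four characters; its UTF-8 encoding has five bytes.
text_length = len(text)
raw_length = len(raw)
round_trips = text.encode("utf-8") == raw
```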
|
||||
|
||||
|
||||
Bytes/string literals
|
||||
'''''''''''''''''''''
|
||||
|
||||
This is a **very** important one. Prefix Python 2 strings that
|
||||
are meant to contain bytes with a ``b`` prefix to very clearly delineate
|
||||
what is and is not a Python 3 text string (six_ provides a function to use for
|
||||
Python 2.5 compatibility).
|
||||
|
||||
This point cannot be stressed enough: make sure you know what all of your string
|
||||
literals in Python 2 are meant to be in Python 3. Any string literal that
|
||||
should be treated as bytes should have the ``b`` prefix. Any string literal
|
||||
that should be Unicode/text in Python 2 should either have the ``u`` literal
|
||||
(supported, but ignored, in Python 3.3 and later) or you should have
|
||||
``from __future__ import unicode_literals`` at the top of the file. But the key
|
||||
point is you should know how Python 3 will treat every one of your string
|
||||
literals and you should mark them as appropriate.
|
||||
|
||||
There are some differences between byte literals in Python 2 and those in
|
||||
Python 3 thanks to the bytes type just being an alias to ``str`` in Python 2.
|
||||
See the `Handle Common "Gotchas"`_ section for what to watch out for.
|
||||
|
||||
``from __future__ import absolute_import``
|
||||
''''''''''''''''''''''''''''''''''''''''''
|
||||
Discussed in more detail below, but you should use this future statement to
|
||||
prevent yourself from accidentally using implicit relative imports.
|
||||
|
||||
|
||||
Supporting Python 2.5 and Newer Only
|
||||
////////////////////////////////////
|
||||
|
||||
If you are supporting Python 2.5 and newer there are still some features of
|
||||
Python that you can utilize.
|
||||
|
||||
|
||||
``from __future__ import absolute_import``
|
||||
''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
Implicit relative imports (e.g., importing ``spam.bacon`` from within
|
||||
``spam.eggs`` with the statement ``import bacon``) do not work in Python 3.
|
||||
This future statement moves away from that and allows the use of explicit
|
||||
relative imports (e.g., ``from . import bacon``).
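To see an explicit relative import in action, the following sketch builds a throwaway ``spam`` package in a temporary directory and imports ``spam.eggs``. The ``FLAVOR`` attribute and the file contents are hypothetical; only the package layout follows the example above.

```python
import importlib
import os
import sys
import tempfile

# Create spam/__init__.py, spam/bacon.py and spam/eggs.py on disk.
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "spam")
os.mkdir(package_dir)

sources = {
    "__init__.py": "",
    "bacon.py": "FLAVOR = 'smoky'\n",
    "eggs.py": (
        "from __future__ import absolute_import\n"
        "from . import bacon  # explicit relative import\n"
        "FLAVOR = bacon.FLAVOR\n"
    ),
}
for name, body in sources.items():
    with open(os.path.join(package_dir, name), "w") as handle:
        handle.write(body)

# Make the temporary directory importable and load the submodule.
sys.path.insert(0, root)
eggs = importlib.import_module("spam.eggs")
```

Writing ``import bacon`` inside ``eggs.py`` instead would look for a top-level ``bacon`` module under Python 3 and fail.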
|
||||
|
||||
In Python 2.5 you must use
|
||||
the __future__ statement to get to use explicit relative imports and prevent
|
||||
implicit ones. In Python 2.6 explicit relative imports are available without
|
||||
the statement, but you still want the __future__ statement to prevent implicit
|
||||
relative imports. In Python 2.7 the __future__ statement is not needed. In
|
||||
other words, unless you are only supporting Python 2.7 or a version earlier
|
||||
than Python 2.5, use this __future__ statement.
|
||||
|
||||
|
||||
Mark all Unicode strings with a ``u`` prefix
|
||||
'''''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
While Python 2.6 has a ``__future__`` statement to automatically cause Python 2
|
||||
to treat all string literals as Unicode, Python 2.5 does not have that shortcut.
|
||||
This means you should go through and mark all string literals with a ``u``
|
||||
prefix to turn them explicitly into text strings where appropriate and only
|
||||
support Python 3.3 or newer. Otherwise use a project like six_ which provides a
|
||||
function to pass all text string literals through.
|
||||
|
||||
|
||||
Capturing the Currently Raised Exception
|
||||
''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
In Python 2.5 and earlier the syntax to access the current exception is::
|
||||
|
||||
    try:
        raise Exception()
    except Exception, exc:
        # Current exception is 'exc'.
        pass
|
||||
|
||||
This syntax changed in Python 3 (and was backported to Python 2.6 and later)
|
||||
to::
|
||||
|
||||
    try:
        raise Exception()
    except Exception as exc:
        # Current exception is 'exc'.
        # In Python 3, 'exc' is restricted to the block; in Python 2.6/2.7 it will "leak".
        pass
|
||||
|
||||
Because of this syntax change you must change how you capture the current
|
||||
exception in Python 2.5 and earlier to::
|
||||
|
||||
    try:
        raise Exception()
    except Exception:
        import sys
        exc = sys.exc_info()[1]
        # Current exception is 'exc'.
        pass
|
||||
|
||||
You can get more information about the raised exception from
|
||||
:func:`sys.exc_info` than simply the current exception instance, but you most
|
||||
likely don't need it.
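Wrapped in a function, the version-neutral form might look like the sketch below. The ``capture`` helper and the ``ValueError`` it raises are illustrative only.

```python
import sys


def capture():
    """Return the exception instance raised inside the ``try`` block."""
    try:
        raise ValueError("boom")
    except ValueError:
        # No ``as`` or comma syntax is needed, so this form is valid in
        # Python 2.5 and earlier as well as in Python 3.
        exc = sys.exc_info()[1]
        return exc


caught = capture()
```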
|
||||
|
||||
.. note::
|
||||
   In Python 3, the traceback is attached to the exception instance
   through the ``__traceback__`` attribute. If the instance is saved in
   a local variable that persists outside of the ``except`` block, the
   traceback will create a reference cycle with the current frame and its
   dictionary of local variables. This will delay reclaiming dead
   resources until the next cyclic :term:`garbage collection` pass.

   In Python 2, this problem only occurs if you save the traceback itself
   (e.g. the third element of the tuple returned by :func:`sys.exc_info`)
   in a variable.
|
||||
|
||||
|
||||
Handle Common "Gotchas"
|
||||
///////////////////////
|
||||
|
||||
These are things to watch out for no matter what version of Python 2 you are
|
||||
supporting which are not syntactic considerations.
|
||||
|
||||
|
||||
``from __future__ import division``
|
||||
'''''''''''''''''''''''''''''''''''
|
||||
|
||||
While the exact same outcome can be had by using the ``-Qnew`` argument to
|
||||
Python, using this future statement lifts the requirement that your users use
|
||||
the flag to get the expected behavior of division in Python 3
|
||||
(e.g., ``1/2 == 0.5; 1//2 == 0``).
|
||||
|
||||
|
||||
|
||||
Specify when opening a file as binary
|
||||
'''''''''''''''''''''''''''''''''''''
|
||||
|
||||
The Short Explanation
|
||||
=====================
|
||||
|
||||
To make your project be single-source Python 2/3 compatible, the basic steps
|
||||
are:
|
||||
|
||||
#. Update your code to drop support for Python 2.5 or older (supporting only
|
||||
Python 2.7 is ideal)
|
||||
#. Make sure you have good test coverage (coverage.py_ can help)
|
||||
#. Learn the differences between Python 2 & 3
|
||||
#. Use Modernize_ or Futurize_ to update your code
|
||||
#. Use Pylint_ to help make sure you don't regress on your Python 3 support
|
||||
(if only supporting Python 2.7/3.4 or newer)
|
||||
#. Use caniusepython3_ to find out which of your dependencies are blocking your
|
||||
use of Python 3
|
||||
#. Once your dependencies are no longer blocking you, use continuous integration
|
||||
to make sure you stay compatible with Python 2 & 3 (tox_ can help test
|
||||
against multiple versions of Python)
|
||||
|
||||
If you are dropping support for Python 2 entirely, then after you learn the
|
||||
differences between Python 2 & 3 you can run 2to3_ over your code and skip the
|
||||
rest of the steps outlined above.
|
||||
|
||||
|
||||
Details
|
||||
=======
|
||||
|
||||
A key point about supporting Python 2 & 3 simultaneously is that you can start
|
||||
**today**! Even if your dependencies are not supporting Python 3 yet that does
|
||||
not mean you can't modernize your code **now** to support Python 3. Most changes
|
||||
required to support Python 3 lead to cleaner code using newer practices even in
|
||||
Python 2.
|
||||
|
||||
Another key point is that modernizing your Python 2 code to also support
|
||||
Python 3 is largely automated for you. While you might have to make some API
|
||||
decisions thanks to Python 3 clarifying text data versus binary data, the
|
||||
lower-level work is now mostly done for you and thus can at least benefit from
|
||||
the automated changes immediately.
|
||||
|
||||
Keep those key points in mind while you read on about the details of porting
|
||||
your code to support Python 2 & 3 simultaneously.
|
||||
|
||||
|
||||
Drop support for Python 2.5 and older (at least)
|
||||
------------------------------------------------
|
||||
|
||||
While you can make Python 2.5 work with Python 3, it is **much** easier if you
|
||||
only have to work with Python 2.6 or newer (and easier still if you only have
|
||||
to work with Python 2.7). If dropping Python 2.5 is not an option then the six_
|
||||
project can help you support Python 2.5 & 3 simultaneously. Do realize, though,
|
||||
that nearly all the projects listed in this HOWTO will not be available to you.
|
||||
|
||||
If you are able to only support Python 2.6 or newer, then the required changes
|
||||
to your code should continue to look and feel like idiomatic Python code. At
|
||||
worst you will have to use a function instead of a method in some instances or
|
||||
have to import a function instead of using a built-in one, but otherwise the
|
||||
overall transformation should not feel foreign to you.
|
||||
|
||||
But please aim for Python 2.7. Bugfixes for that version of Python will continue
|
||||
until 2020 while Python 2.6 is no longer supported. There are also some tools
|
||||
mentioned in this HOWTO which do not support Python 2.6 (e.g., Pylint_), and
|
||||
this will become more commonplace as time goes on.
|
||||
|
||||
Make sure you specify the proper version support in your ``setup.py`` file
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
In your ``setup.py`` file you should have the proper `trove classifier`_
|
||||
specifying what versions of Python you support. As your project does not support
|
||||
Python 3 yet you should at least have
|
||||
``Programming Language :: Python :: 2 :: Only`` specified. Ideally you should
|
||||
also specify each major/minor version of Python that you do support, e.g.
|
||||
``Programming Language :: Python :: 2.7``.
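A sketch of how those classifiers could be declared. The project name and version are placeholders, and the keyword arguments are collected in a dictionary here only so the snippet runs without invoking a build.

```python
# Classifiers for a project that currently supports Python 2.7 only.
CLASSIFIERS = [
    "Programming Language :: Python :: 2 :: Only",
    "Programming Language :: Python :: 2.7",
]

# Hypothetical metadata; in a real setup.py these would be passed as
# keyword arguments, e.g. ``setup(**SETUP_KWARGS)``.
SETUP_KWARGS = {
    "name": "example-project",
    "version": "1.0",
    "classifiers": CLASSIFIERS,
}
```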
|
||||
|
||||
Have good test coverage
|
||||
-----------------------
|
||||
|
||||
Once you have your code supporting the oldest version of Python 2 you want it
|
||||
to, you will want to make sure your test suite has good coverage. A good rule of
|
||||
thumb is that you want to be confident enough in your test suite that any
|
||||
failures that appear after having tools rewrite your code are actual bugs in the
|
||||
tools and not in your code. If you want a number to aim for, try to get over 80%
|
||||
coverage (and don't feel bad if you can't easily get past 90%). If you
|
||||
don't already have a tool to measure test coverage then coverage.py_ is
|
||||
recommended.
|
||||
|
||||
Learn the differences between Python 2 & 3
|
||||
-------------------------------------------
|
||||
|
||||
Once you have your code well-tested you are ready to begin porting your code to
|
||||
Python 3! But to fully understand how your code is going to change and what
|
||||
you want to look out for while you code, you will want to learn what changes
|
||||
Python 3 makes in terms of Python 2. Typically the two best ways of doing that
|
||||
are reading the `"What's New"`_ doc for each release of Python 3 and the
|
||||
`Porting to Python 3`_ book (which is free online).
|
||||
|
||||
|
||||
Update your code
|
||||
----------------
|
||||
|
||||
Once you feel like you know what is different in Python 3 compared to Python 2,
|
||||
it's time to update your code! You have a choice between two tools in porting
|
||||
your code automatically: Modernize_ and Futurize_. Which tool you choose will
|
||||
depend on how much like Python 3 you want your code to be. Futurize_ does its
|
||||
best to make Python 3 idioms and practices exist in Python 2, e.g. backporting
|
||||
the ``bytes`` type from Python 3 so that you have semantic parity between the
|
||||
major versions of Python. Modernize_,
|
||||
on the other hand, is more conservative and targets a Python 2/3 subset of
|
||||
Python, relying on six_ to help provide compatibility.
|
||||
|
||||
Regardless of which tool you choose, they will update your code to run under
|
||||
Python 3 while staying compatible with the version of Python 2 you started with.
|
||||
Depending on how conservative you want to be, you may want to run the tool over
|
||||
your test suite first and visually inspect the diff to make sure the
|
||||
transformation is accurate. After you have transformed your test suite and
|
||||
verified that all the tests still pass as expected, then you can transform your
|
||||
application code knowing that any test which fails is a translation failure.
|
||||
|
||||
Unfortunately the tools can't automate everything to make your code work under
|
||||
Python 3 and so there are a handful of things you will need to update manually
|
||||
to get full Python 3 support (which of these steps are necessary varies between
|
||||
the tools). Read the documentation for the tool you choose to use to see what it
|
||||
fixes by default and what it can do optionally to know what will (not) be fixed
|
||||
for you and what you may have to fix on your own (e.g. using ``io.open()`` over
|
||||
the built-in ``open()`` function is off by default in Modernize). Luckily,
|
||||
though, there are only a couple of things to watch out for which can be
|
||||
considered large issues that may be hard to debug if not watched for.
|
||||
|
||||
Division
|
||||
++++++++
|
||||
|
||||
In Python 3, ``5 / 2 == 2.5`` and not ``2``; all division between ``int`` values
|
||||
results in a ``float``. This change has actually been planned since Python 2.2
|
||||
which was released in 2002. Since then users have been encouraged to add
|
||||
``from __future__ import division`` to any and all files which use the ``/`` and
|
||||
``//`` operators or to be running the interpreter with the ``-Q`` flag. If you
|
||||
have not been doing this then you will need to go through your code and do two
|
||||
things:
|
||||
|
||||
#. Add ``from __future__ import division`` to your files
|
||||
#. Update any division operator as necessary to either use ``//`` to use floor
|
||||
division or continue using ``/`` and expect a float
|
||||
|
||||
The reason that ``/`` isn't simply translated to ``//`` automatically is that if
|
||||
an object defines its own ``__div__`` method but not ``__floordiv__`` then your
|
||||
code would begin to fail.
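The sketch below illustrates both points: the two operators on ``int`` values, and a class that supports ``/`` but not ``//``. The ``Meters`` class is hypothetical and exists only for this example.

```python
from __future__ import division


class Meters(object):
    """Toy value type that defines true division but not floor division."""

    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        return Meters(self.value / other)

    # Python 2 without the future statement looks up ``__div__`` instead.
    __div__ = __truediv__


true_result = 5 / 2    # float result
floor_result = 5 // 2  # floor division stays an int

halved = Meters(5) / 2

# Blindly rewriting ``/`` to ``//`` would break this type.
try:
    Meters(5) // 2
    floor_supported = True
except TypeError:
    floor_supported = False
```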
|
||||
|
||||
Text versus binary data
|
||||
+++++++++++++++++++++++
|
||||
|
||||
In Python 2 you could use the ``str`` type for both text and binary data.
|
||||
Unfortunately this conflation of two different concepts could lead to brittle
|
||||
code which sometimes worked for either kind of data, sometimes not. It also
|
||||
could lead to confusing APIs if people didn't explicitly state that something
|
||||
that accepted ``str`` accepted either text or binary data instead of one
|
||||
specific type. This complicated the situation especially for anyone supporting
|
||||
multiple languages as APIs wouldn't bother explicitly supporting ``unicode``
|
||||
when they claimed text data support.
|
||||
|
||||
To make the distinction between text and binary data clearer and more
|
||||
pronounced, Python 3 did what most languages created in the age of the internet
|
||||
have done and made text and binary data distinct types that cannot blindly be
|
||||
mixed together (Python predates widespread access to the internet). For any code
|
||||
that only deals with text or only binary data, this separation doesn't pose an
|
||||
issue. But for code that has to deal with both, it does mean you might have to
|
||||
now care about when you are using text compared to binary data, which is why
|
||||
this cannot be entirely automated.
|
||||
|
||||
To start, you will need to decide which APIs take text and which take binary
|
||||
(it is **highly** recommended you don't design APIs that can take both due to
|
||||
the difficulty of keeping the code working; as stated earlier it is difficult to
|
||||
do well). In Python 2 this means making sure the APIs that take text can work
|
||||
with ``unicode`` in Python 2 and those that work with binary data work with the
|
||||
``bytes`` type from Python 3 and thus a subset of ``str`` in Python 2 (which the
|
||||
``bytes`` type in Python 2 is an alias for). Usually the biggest issue is
|
||||
realizing which methods exist for which types in Python 2 & 3 simultaneously
|
||||
(for text that's ``unicode`` in Python 2 and ``str`` in Python 3, for binary
|
||||
that's ``str``/``bytes`` in Python 2 and ``bytes`` in Python 3). The following
|
||||
table lists the **unique** methods of each data type across Python 2 & 3
|
||||
(e.g., the ``decode()`` method is usable on the equivalent binary data type in
|
||||
either Python 2 or 3, but it can't be used by the text data type consistently
|
||||
between Python 2 and 3 because ``str`` in Python 3 doesn't have the method).
|
||||
|
||||
======================== =====================
**Text data**            **Binary data**
------------------------ ---------------------
__mod__ (``%`` operator)
------------------------ ---------------------
\                        decode
------------------------ ---------------------
encode
------------------------ ---------------------
format
------------------------ ---------------------
isdecimal
------------------------ ---------------------
isnumeric
======================== =====================
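One way to check which of these methods your interpreter actually provides is a quick ``hasattr()`` probe. This is only a sketch: the results depend on the Python version running it, and the ``%`` operator is left out because its availability on binary data has changed between Python 3 releases.

```python
text = u"42"
binary = b"42"

methods = ["decode", "encode", "format", "isdecimal", "isnumeric"]

# Methods available on the text type but not on the binary type, and vice versa,
# for the interpreter running this snippet.
text_only = [name for name in methods
             if hasattr(text, name) and not hasattr(binary, name)]
binary_only = [name for name in methods
               if hasattr(binary, name) and not hasattr(text, name)]
```

Under Python 3 only ``decode`` ends up in ``binary_only``; under Python 2 the lists are shorter because ``str`` offers most of these methods too.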
|
||||
|
||||
Making the distinction easier to handle can be accomplished by encoding and
|
||||
decoding between binary data and text at the edge of your code. This means that
|
||||
when you receive text in binary data, you should immediately decode it. And if
|
||||
your code needs to send text as binary data then encode it as late as possible.
|
||||
This allows your code to work with only text internally and thus eliminates
|
||||
having to keep track of what type of data you are working with.
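A minimal sketch of that pattern, assuming the data is UTF-8 encoded. The ``shout`` function is a hypothetical stand-in for whatever your code does with the text.

```python
def shout(raw, encoding="utf-8"):
    """Accept binary data at the edge, work on text, return binary data."""
    # Decode as soon as the data enters your code.
    text = raw.decode(encoding)

    # All internal processing deals with text only.
    result = text.upper()

    # Encode as late as possible, right before the data leaves.
    return result.encode(encoding)


reply = shout(b"caf\xc3\xa9")
```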
|
||||
|
||||
The next issue is making sure you know whether the string literals in your code
|
||||
represent text or binary data. At minimum you should add a ``b`` prefix to any
|
||||
literal that represents binary data. For text you should either use the
|
||||
``from __future__ import unicode_literals`` statement or add a ``u`` prefix to
|
||||
the text literal.
|
||||
|
||||
As part of this dichotomy you also need to be careful about opening files.
|
||||
Unless you have been working on Windows, there is a chance you have not always
|
||||
bothered to add the ``b`` mode when opening a binary file (e.g., ``rb`` for
|
||||
binary reading). Under Python 3, binary files and text files are clearly
|
||||
distinct and mutually incompatible; see the :mod:`io` module for details.
|
||||
Therefore, you **must** make a decision of whether a file will be used for
|
||||
binary access (allowing binary data to be read and/or written) or text access
|
||||
(allowing text data to be read and/or written).
|
||||
|
||||
Text files
|
||||
''''''''''
|
||||
|
||||
Text files opened using ``open()`` under Python 2 return byte strings when read,
|
||||
while under Python 3 they return unicode strings. Depending on your porting
|
||||
strategy, this can be an issue.
|
||||
|
||||
If you want text files to return unicode strings in Python 2, you have two
|
||||
possibilities:
|
||||
|
||||
* Under Python 2.6 and higher, use :func:`io.open`. Since :func:`io.open`
|
||||
is essentially the same function in both Python 2 and Python 3, it will
|
||||
help iron out any issues that might arise.
|
||||
|
||||
* If pre-2.6 compatibility is needed, then you should use :func:`codecs.open`
|
||||
instead. This will make sure that you get back unicode strings in Python 2.
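The following sketch shows :func:`io.open` used for both kinds of access on the same file. The temporary path and the sample text are illustrative, and UTF-8 is an assumed encoding.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Text access: pass text in and name the encoding explicitly.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"caf\u00e9\n")

# Text access on read returns text (``unicode`` in Python 2, ``str`` in 3).
with io.open(path, "r", encoding="utf-8") as handle:
    text = handle.read()

# Binary access returns bytes and takes no encoding.
with io.open(path, "rb") as handle:
    raw = handle.read()
```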
|
||||
|
||||
Subclass ``object``
|
||||
'''''''''''''''''''
|
||||
|
||||
New-style classes have been around since Python 2.2. You need to make sure
|
||||
you are subclassing from ``object`` to avoid odd edge cases involving method
|
||||
resolution order, etc. This continues to be totally valid in Python 3 (although
|
||||
unneeded as all classes implicitly inherit from ``object``).
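As an illustration, the hypothetical class hierarchy below relies on the new-style method resolution order, which behaves the same way in Python 2 and 3 once ``Base`` subclasses ``object``.

```python
class Base(object):
    def name(self):
        return "Base"


class Left(Base):
    def name(self):
        return "Left"


class Right(Base):
    def name(self):
        return "Right"


class Child(Left, Right):
    pass


# New-style classes expose their resolution order through ``__mro__``.
resolution_order = [cls.__name__ for cls in Child.__mro__]
chosen = Child().name()
```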
|
||||
|
||||
|
||||
Deal With the Bytes/String Dichotomy
|
||||
''''''''''''''''''''''''''''''''''''
|
||||
|
||||
One of the biggest issues people have when porting code to Python 3 is handling
|
||||
the bytes/string dichotomy. Because Python 2 allowed the ``str`` type to hold
|
||||
textual data, people have over the years been rather loose in their delineation
|
||||
of what ``str`` instances held text compared to bytes. In Python 3 you cannot
|
||||
be so care-free anymore and need to properly handle the difference. The key to
|
||||
handling this issue is to make sure that **every** string literal in your
|
||||
Python 2 code is either syntactically or functionally marked as either bytes or
|
||||
text data. After this is done you then need to make sure your APIs are designed
|
||||
to either handle a specific type or made to be properly polymorphic.
|
||||
|
||||
|
||||
Mark Up Python 2 String Literals
|
||||
********************************
|
||||
|
||||
The first thing you must do is designate every single string literal in Python 2
|
||||
as either textual or bytes data. If you are only supporting Python 2.6 or
|
||||
newer, this can be accomplished by marking bytes literals with a ``b`` prefix
|
||||
and then designating textual data with a ``u`` prefix or using the
|
||||
``unicode_literals`` future statement.
|
||||
|
||||
If your project supports versions of Python predating 2.6, then you should use
|
||||
the six_ project and its ``b()`` function to denote bytes literals. For text
|
||||
literals you can either use six's ``u()`` function or use a ``u`` prefix.
|
||||
|
||||
|
||||
Decide what APIs Will Accept
|
||||
****************************
|
||||
|
||||
In Python 2 it was very easy to accidentally create an API that accepted both
|
||||
bytes and textual data. But in Python 3, thanks to the more strict handling of
|
||||
disparate types, this loose usage of bytes and text together tends to fail.
|
||||
|
||||
Take the dict ``{b'a': 'bytes', u'a': 'text'}`` in Python 2.6. It creates the
|
||||
dict ``{'a': 'text'}`` (the first key is kept and its value overwritten) since ``b'a' == u'a'``. But in Python 3 the equivalent
|
||||
dict creates ``{b'a': 'bytes', 'a': 'text'}``, i.e., no lost data. Similar
|
||||
issues can crop up when transitioning Python 2 code to Python 3.
|
||||
|
||||
This means you need to choose what an API is going to accept and create and
|
||||
consistently stick to that API in both Python 2 and 3.
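One way to make such a decision explicit is to check the argument type at the API boundary. The sketch below assumes a text-only API; ``text_type`` and ``lookup`` are hypothetical names, not part of any library.

```python
import sys

# The text type is ``str`` in Python 3 and ``unicode`` in Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def lookup(table, key):
    """Look up ``key`` in ``table``, accepting text keys only."""
    if not isinstance(key, text_type):
        raise TypeError("key must be text, not %s" % type(key).__name__)
    return table[key]


found = lookup({u"a": "text"}, u"a")
```

Passing ``b"a"`` raises ``TypeError`` in both major versions instead of silently matching in one and missing in the other.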
|
||||
|
||||
|
||||
Bytes / Unicode Comparison
|
||||
**************************
|
||||
|
||||
In Python 3, mixing bytes and unicode is forbidden in most situations; it
|
||||
will raise a :class:`TypeError` where Python 2 would have attempted an implicit
|
||||
coercion between types. However, there is one case where it doesn't and
|
||||
it can be very misleading::
|
||||
|
||||
>>> b"" == ""
|
||||
False
|
||||
|
||||
This is because an equality comparison is required by the language to always
|
||||
succeed (and return ``False`` for incompatible types). However, this also
|
||||
means that code incorrectly ported to Python 3 can display buggy behaviour
|
||||
if such comparisons are silently executed. To detect such situations,
|
||||
Python 3 has a ``-b`` flag that will display a warning::
|
||||
|
||||
$ python3 -b
|
||||
>>> b"" == ""
|
||||
__main__:1: BytesWarning: Comparison between bytes and string
|
||||
False
|
||||
|
||||
To turn the warning into an exception, use the ``-bb`` flag instead::
|
||||
|
||||
$ python3 -bb
|
||||
>>> b"" == ""
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
BytesWarning: Comparison between bytes and string
|
||||
|
||||
|
||||
Indexing bytes objects
|
||||
''''''''''''''''''''''
|
||||
|
||||
Another potentially surprising change is the indexing behaviour of bytes
|
||||
objects in Python 3::
|
||||
|
||||
>>> b"xyz"[0]
|
||||
120
|
||||
|
||||
Indeed, Python 3 bytes objects (as well as :class:`bytearray` objects)
|
||||
are sequences of integers. But code converted from Python 2 will often
|
||||
assume that indexing a bytestring produces another bytestring, not an
|
||||
integer. To reconcile both behaviours, use slicing::
|
||||
|
||||
>>> b"xyz"[0:1]
|
||||
b'x'
|
||||
>>> n = 1
|
||||
>>> b"xyz"[n:n+1]
|
||||
b'y'
|
||||
|
||||
The only remaining gotcha is that an out-of-bounds slice returns an empty
bytes object instead of raising ``IndexError``::
|
||||
|
||||
>>> b"xyz"[3]
|
||||
Traceback (most recent call last):
|
||||
File "<stdin>", line 1, in <module>
|
||||
IndexError: index out of range
|
||||
>>> b"xyz"[3:4]
|
||||
b''
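
If the ``IndexError`` matters to your code, a small wrapper can combine slicing with an explicit bounds check. This is a sketch under that assumption; ``byte_at`` is a hypothetical helper, not part of any library:

```python
def byte_at(data, index):
    """Return the byte at ``index`` as a length-1 bytes object.

    Hypothetical helper: behaves the same on Python 2 and 3 and raises
    IndexError for out-of-range indexes, like plain indexing does.
    """
    if index < 0:
        index += len(data)  # normalise negative indexes first
    if not 0 <= index < len(data):
        raise IndexError("index out of range")
    return data[index:index + 1]
```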
|
||||
|
||||
|
||||
``__str__()``/``__unicode__()``
|
||||
'''''''''''''''''''''''''''''''
|
||||
|
||||
In Python 2, objects can specify both a string and unicode representation of
|
||||
themselves. In Python 3, though, there is only a string representation. This
|
||||
becomes an issue as people can inadvertently do things in their ``__str__()``
|
||||
methods which have unpredictable results (e.g., infinite recursion if you
|
||||
happen to use the ``unicode(self).encode('utf8')`` idiom as the body of your
|
||||
``__str__()`` method).
|
||||
|
||||
You can use a mixin class to work around this. This allows you to only define a
|
||||
``__unicode__()`` method for your class and let the mixin derive
|
||||
``__str__()`` for you (code from
|
||||
http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/)::
|
||||
|
||||
    import sys

    class UnicodeMixin(object):

        """Mixin class to handle defining the proper __str__/__unicode__
        methods in Python 2 or 3."""

        if sys.version_info[0] >= 3:  # Python 3
            def __str__(self):
                return self.__unicode__()
        else:  # Python 2
            def __str__(self):
                return self.__unicode__().encode('utf8')


    class Spam(UnicodeMixin):

        def __unicode__(self):
            return u'spam-spam-bacon-spam'  # 2to3 will remove the 'u' prefix
|
||||
|
||||
|
||||
Don't Index on Exceptions
|
||||
'''''''''''''''''''''''''
|
||||
|
||||
In Python 2, the following worked::
|
||||
|
||||
>>> exc = Exception(1, 2, 3)
|
||||
>>> exc.args[1]
|
||||
2
|
||||
>>> exc[1] # Python 2 only!
|
||||
2
|
||||
|
||||
But in Python 3, indexing directly on an exception is an error. You need to
|
||||
make sure to only index on the :attr:`BaseException.args` attribute which is a
|
||||
sequence containing all arguments passed to the :meth:`__init__` method.
|
||||
|
||||
Even better is to use the documented attributes the exception provides.
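
As a short illustration, using :class:`OSError` as an example of an exception with documented attributes, both approaches work unchanged on Python 2 and 3:

```python
import errno

try:
    raise OSError(errno.ENOENT, "No such file or directory")
except OSError as exc:
    # Portable: index the args tuple, never the exception itself.
    code_from_args = exc.args[0]
    # Better: use the attributes OSError documents.
    code = exc.errno
    message = exc.strerror
```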
|
||||
|
||||
|
||||
Don't use ``__getslice__`` & Friends
|
||||
''''''''''''''''''''''''''''''''''''
|
||||
|
||||
Although they have been deprecated for a while, Python 3 finally drops support
for ``__getslice__()`` and its related methods. Move completely over to
:meth:`__getitem__` and friends.
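
A minimal sketch of a class that handles both integer indexes and slices in ``__getitem__`` might look like this (``Deck`` is a hypothetical example class):

```python
class Deck(object):
    """Hypothetical sequence wrapper that supports indexing and slicing."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        # Slicing passes a slice object to __getitem__ on Python 3, and
        # also on Python 2 when __getslice__ is not defined.
        if isinstance(index, slice):
            return Deck(self._cards[index])
        return self._cards[index]
```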
|
||||
|
||||
|
||||
Updating doctests
|
||||
'''''''''''''''''
|
||||
|
||||
Don't forget to make them Python 2/3 compatible as well. If you wrote a
|
||||
monolithic set of doctests (e.g., a single docstring containing all of your
|
||||
doctests), you should at least consider breaking the doctests up into smaller
|
||||
pieces to make it more manageable to fix. Otherwise it might very well be worth
|
||||
your time and effort to port your tests to :mod:`unittest`.
|
||||
|
||||
|
||||
Update ``map`` for imbalanced input sequences
|
||||
'''''''''''''''''''''''''''''''''''''''''''''
|
||||
|
||||
With Python 2, when ``map`` was given more than one input sequence it would pad
|
||||
the shorter sequences with ``None`` values, returning a sequence as long as the
|
||||
longest input sequence.
|
||||
|
||||
With Python 3, if the input sequences to ``map`` are of unequal length, ``map``
will stop at the termination of the shortest of the sequences. For full
compatibility with ``map`` from Python 2.x, pad the sequence arguments with
:func:`itertools.zip_longest` and unpack each padded tuple with
:func:`itertools.starmap` so that ``func`` still receives one argument per
sequence, e.g. ``map(func, *sequences)`` becomes
``list(itertools.starmap(func, itertools.zip_longest(*sequences)))``.
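
The padding behaviour can be sketched as follows. ``padded_map`` and ``pair`` are hypothetical names; :func:`itertools.starmap` is used here so that the function still receives one argument per input sequence, as it did with Python 2's ``map``:

```python
import itertools

try:
    zip_longest = itertools.zip_longest    # Python 3
except AttributeError:
    zip_longest = itertools.izip_longest   # Python 2


def padded_map(func, *sequences):
    """Emulate Python 2's map() padding of shorter sequences with None."""
    return list(itertools.starmap(func, zip_longest(*sequences)))


def pair(a, b):
    return (a, b)


result = padded_map(pair, [1, 2, 3], ["x"])
```

Here ``result`` has three items, one per element of the longest input, with ``None`` standing in for the missing values of the shorter one.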
|
||||
|
||||
Eliminate ``-3`` Warnings
|
||||
-------------------------
|
||||
|
||||
When you run your application's test suite, run it using the ``-3`` flag passed
|
||||
to Python. This will cause various warnings to be raised during execution about
|
||||
things that are semantic changes between Python 2 and 3. Try to eliminate those
|
||||
warnings to make your code even more portable to Python 3.
|
||||
|
||||
|
||||
Alternative Approaches
|
||||
======================
|
||||
|
||||
While supporting Python 2 & 3 simultaneously is typically the preferred choice,
since it lets people continue to improve code and have it work for the
largest number of users, your life may be easier if you only have to support one
major version of Python going forward.
|
||||
|
||||
Supporting Only Python 3 Going Forward From Python 2 Code
|
||||
---------------------------------------------------------
|
||||
|
||||
If you have Python 2 code but going forward only want to improve it as Python 3
|
||||
code, then you can use :ref:`2to3 <2to3-reference>` to translate your Python 2
|
||||
code to Python 3 code. This is only recommended, though, if your current
|
||||
version of your project is going into maintenance mode and you want all new
|
||||
features to be exclusive to Python 3.
|
||||
|
||||
|
||||
Backporting Python 3 code to Python 2
|
||||
-------------------------------------
|
||||
|
||||
If you have Python 3 code and have little interest in supporting Python 2 you
|
||||
can use 3to2_ to translate from Python 3 code to Python 2 code. This is only
|
||||
recommended if you don't plan to heavily support Python 2 users. Otherwise
|
||||
write your code for Python 3 and then backport as far back as you want. This
|
||||
is typically easier than going from Python 2 to 3 as you will have worked out
|
||||
any difficulties with bytes/strings and similar issues.
|
||||
|
||||
|
||||
Other Resources
|
||||
===============
|
||||
|
||||
The authors of the following blog posts, wiki pages, and books deserve special
|
||||
thanks for making public their tips for porting Python 2 code to Python 3 (and
|
||||
thus helping provide information for this document and its various revisions
|
||||
over the years):
|
||||
|
||||
* https://wiki.python.org/moin/PortingPythonToPy3k
|
||||
* http://python3porting.com/
|
||||
* http://docs.pythonsprints.com/python3_porting/py-porting.html
|
||||
* http://techspot.zzzeek.org/2011/01/24/zzzeek-s-guide-to-python-3-porting/
|
||||
* http://dabeaz.blogspot.com/2011/01/porting-py65-and-my-superboard-to.html
|
||||
* http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
|
||||
* http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/
|
||||
* https://wiki.ubuntu.com/Python/3
|
||||
|
||||
If you feel there is something missing from this document that should be added,
|
||||
please email the python-porting_ mailing list.
|
||||
|
||||
|
||||
.. _3to2: https://pypi.python.org/pypi/3to2
|
||||
.. _Cheeseshop: PyPI_
|
||||
.. _coverage: https://pypi.python.org/pypi/coverage
|
||||
.. _future: http://python-future.org/
|
||||
.. _modernize: https://github.com/mitsuhiko/python-modernize
|
||||
binary access (allowing to read and/or write binary data) or text access
|
||||
(allowing to read and/or write text data). You should also use :func:`io.open`
|
||||
for opening files instead of the built-in :func:`open` function as the :mod:`io`
|
||||
module is consistent from Python 2 to 3 while the built-in :func:`open` function
|
||||
is not (in Python 3 it's actually :func:`io.open`).
|
||||
|
||||
Finally, the indexing of binary data requires careful handling (slicing does
|
||||
**not** require any special handling). In Python 2,
|
||||
``b'123'[1] == b'2'`` while in Python 3 ``b'123'[1] == 50``. Because binary data
|
||||
is simply a collection of binary numbers, Python 3 returns the integer value for
|
||||
the byte you index on. But in Python 2 because ``bytes == str``, indexing
|
||||
returns a one-item slice of bytes. The six_ project has a function
|
||||
named ``six.indexbytes()`` which will return an integer like in Python 3:
|
||||
``six.indexbytes(b'123', 1)``.
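
If you would rather not depend on six for this, one alternative sketch relies only on slicing and on :class:`bytearray`, whose items are integers on both Python 2 and 3 (the variable names are illustrative):

```python
data = b"123"

# Slicing gives a length-1 bytes object on both Python 2 and 3.
second_as_bytes = data[1:2]

# A bytearray yields integers when indexed or iterated on both versions.
second_as_int = bytearray(data)[1]
all_ints = list(bytearray(data))
```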
|
||||
|
||||
To summarize:
|
||||
|
||||
#. Decide which of your APIs take text and which take binary data
|
||||
#. Make sure that your code that works with text also works with ``unicode`` and
|
||||
code for binary data works with ``bytes`` in Python 2 (see the table above
|
||||
for what methods you cannot use for each type)
|
||||
#. Mark all binary literals with a ``b`` prefix, use a ``u`` prefix or
|
||||
:mod:`__future__` import statement for text literals
|
||||
#. Decode binary data to text as soon as possible, encode text as binary data as
|
||||
late as possible
|
||||
#. Open files using :func:`io.open` and make sure to specify the ``b`` mode when
|
||||
appropriate
|
||||
#. Be careful when indexing binary data
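
Several of these points can be sketched together in a small example. The helpers ``write_text`` and ``read_text`` are hypothetical and UTF-8 is an assumed encoding; the sketch encodes text only at the moment of writing and lets :func:`io.open` decode it again when reading:

```python
import io
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    """Encode as late as possible: only when the bytes hit the file."""
    with io.open(path, "wb") as handle:   # explicit binary mode
        handle.write(text.encode(encoding))


def read_text(path, encoding="utf-8"):
    """Decode as early as possible: io.open returns text in text mode."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Usage: round-trip some text through a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
write_text(demo_path, u"caf\u00e9")
round_tripped = read_text(demo_path)
```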
|
||||
|
||||
Prevent compatibility regressions
|
||||
---------------------------------
|
||||
|
||||
Once you have fully translated your code to be compatible with Python 3, you
|
||||
will want to make sure your code doesn't regress and stop working under
|
||||
Python 3. This is especially true if you have a dependency which is blocking you
|
||||
from actually running under Python 3 at the moment.
|
||||
|
||||
To help with staying compatible, any new modules you create should have
|
||||
at least the following block of code at the top of it::
|
||||
|
||||
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    from __future__ import unicode_literals
|
||||
|
||||
You can also run Python 2 with the ``-3`` flag to be warned about various
|
||||
compatibility issues your code triggers during execution. If you turn warnings
|
||||
into errors with ``-Werror`` then you can make sure that you don't accidentally
|
||||
miss a warning.
|
||||
|
||||
|
||||
You can also use the Pylint_ project and its ``--py3k`` flag to lint your code
|
||||
to receive warnings when your code begins to deviate from Python 3
|
||||
compatibility. This also prevents you from having to run Modernize_ or Futurize_
|
||||
over your code regularly to catch compatibility regressions. This does require
that you only support Python 2.7 and Python 3.4 or newer, as those are the
minimum Python versions that Pylint supports.
|
||||
|
||||
|
||||
Check which dependencies block your transition
|
||||
----------------------------------------------
|
||||
|
||||
**After** you have made your code compatible with Python 3 you should begin to
|
||||
care about whether your dependencies have also been ported. The caniusepython3_
|
||||
project was created to help you determine which projects
|
||||
-- directly or indirectly -- are blocking you from supporting Python 3. There
is both a command-line tool and a web interface at
https://caniusepython3.com .
|
||||
|
||||
The project also provides code which you can integrate into your test suite so
|
||||
that you will have a failing test when you no longer have dependencies blocking
|
||||
you from using Python 3. This allows you to avoid having to manually check your
|
||||
dependencies and to be notified quickly when you can start running on Python 3.
|
||||
|
||||
Update your ``setup.py`` file to denote Python 3 compatibility
|
||||
--------------------------------------------------------------
|
||||
|
||||
Once your code works under Python 3, you should update the classifiers in
|
||||
your ``setup.py`` to contain ``Programming Language :: Python :: 3`` and to not
|
||||
specify sole Python 2 support. This will tell
|
||||
anyone using your code that you support Python 2 **and** 3. Ideally you will
|
||||
also want to add classifiers for each major/minor version of Python you now
|
||||
support.
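
For illustration, the classifiers for a project supporting both major versions might look like the following excerpt. The exact minor versions listed are an assumption made for the example; list whichever versions you actually test against:

```python
# Hypothetical excerpt from a setup.py; pass the list to setup() via its
# ``classifiers`` keyword argument.
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 2.7",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.4",
]
```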
|
||||
|
||||
Use continuous integration to stay compatible
|
||||
---------------------------------------------
|
||||
|
||||
Once you are able to fully run under Python 3 you will want to make sure your
|
||||
code always works under both Python 2 & 3. Probably the best tool for running
|
||||
your tests under multiple Python interpreters is tox_. You can then integrate
|
||||
tox with your continuous integration system so that you never accidentally break
|
||||
Python 2 or 3 support.
|
||||
|
||||
You may also want to use the ``-bb`` flag with the Python 3 interpreter to
|
||||
trigger an exception when you are comparing bytes to strings. Usually it's
|
||||
simply ``False``, but if you made a mistake in your separation of text/binary
|
||||
data handling you may be accidentally comparing text and binary data. This flag
|
||||
will raise an exception when that occurs to help track down such cases.
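
The effect of the flag can be observed with a short script. This is a sketch that assumes ``sys.executable`` is a Python 3 interpreter; it launches a child interpreter with ``-bb`` and checks that a bytes/text comparison makes it fail:

```python
import subprocess
import sys

# Run a bytes-to-text comparison in a child interpreter started with -bb.
proc = subprocess.Popen(
    [sys.executable, "-bb", "-c", 'b"" == ""'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
_, stderr = proc.communicate()

# With -bb the comparison raises BytesWarning, so the child exits non-zero.
failed = proc.returncode != 0
```

In a real project you would more likely pass ``-bb`` to the interpreter that runs your test suite rather than spawn a child process like this.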
|
||||
|
||||
And that's mostly it! At this point your code base is compatible with both
|
||||
Python 2 and 3 simultaneously. Your testing will also be set up so that you
|
||||
don't accidentally break Python 2 or 3 compatibility regardless of which version
|
||||
you typically run your tests under while developing.
|
||||
|
||||
|
||||
Dropping Python 2 support completely
|
||||
====================================
|
||||
|
||||
If you are able to fully drop support for Python 2, then the steps required
|
||||
to transition to Python 3 simplify greatly.
|
||||
|
||||
#. Update your code to only support Python 2.7
|
||||
#. Make sure you have good test coverage (coverage.py_ can help)
|
||||
#. Learn the differences between Python 2 & 3
|
||||
#. Use 2to3_ to rewrite your code to run only under Python 3
|
||||
|
||||
After this your code will be fully Python 3 compliant but in a way that is not
|
||||
supported by Python 2. You should also update the classifiers in your
|
||||
``setup.py`` to contain ``Programming Language :: Python :: 3 :: Only``.
|
||||
|
||||
|
||||
.. _2to3: https://docs.python.org/3/library/2to3.html
|
||||
.. _caniusepython3:
|
||||
.. _coverage.py: https://pypi.python.org/pypi/coverage
|
||||
.. _Futurize: http://python-future.org/automatic_conversion.html
|
||||
.. _Modernize:
|
||||
.. _Porting to Python 3: http://python3porting.com/
|
||||
.. _PyPI: https://pypi.python.org/pypi
|
||||
.. _Python 3 Packages: https://pypi.python.org/pypi?:action=browse&c=533&show=all
|
||||
.. _Pylint: https://pypi.python.org/pypi/pylint
|
||||
.. _Python 3 Q & A: http://ncoghlan-devs-python-notes.readthedocs.org/en/latest/python3/questions_and_answers.html
|
||||
|
||||
.. _python-future: http://python-future.org/
|
||||
.. _python-porting: https://mail.python.org/mailman/listinfo/python-porting
|
||||
.. _six: https://pypi.python.org/pypi/six
|
||||
.. _tox: https://pypi.python.org/pypi/tox
|
||||
.. _trove classifiers: https://pypi.python.org/pypi?%3Aaction=list_classifiers
|
||||
.. _trove classifier: https://pypi.python.org/pypi?%3Aaction=list_classifiers
|
||||
.. _"What's New": https://docs.python.org/3/whatsnew/index.html
|
||||