Issue #8188: Introduce a new scheme for computing hashes of numbers
(instances of int, float, complex, decimal.Decimal and fractions.Fraction) that makes it easy to maintain the invariant that hash(x) == hash(y) whenever x and y have equal value.

parent 03721133a6
commit dc787d2055

@@ -595,6 +595,109 @@ hexadecimal string representing the same number::
   '0x1.d380000000000p+11'


.. _numeric-hash:

Hashing of numeric types
------------------------

For numbers ``x`` and ``y``, possibly of different types, it's a requirement
that ``hash(x) == hash(y)`` whenever ``x == y`` (see the :meth:`__hash__`
method documentation for more details). For ease of implementation and
efficiency across a variety of numeric types (including :class:`int`,
:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`),
Python's hash for numeric types is based on a single mathematical function
that's defined for any rational number, and hence applies to all instances of
:class:`int` and :class:`fractions.Fraction`, and all finite instances of
:class:`float` and :class:`decimal.Decimal`. Essentially, this function is
given by reduction modulo ``P`` for a fixed prime ``P``. The value of ``P`` is
made available to Python as the :attr:`modulus` attribute of
:data:`sys.hash_info`.

.. impl-detail::

   Currently, the prime used is ``P = 2**31 - 1`` on machines with 32-bit C
   longs and ``P = 2**61 - 1`` on machines with 64-bit C longs.

Here are the rules in detail:

- If ``x = m / n`` is a nonnegative rational number and ``n`` is not divisible
  by ``P``, define ``hash(x)`` as ``m * invmod(n, P) % P``, where ``invmod(n,
  P)`` gives the inverse of ``n`` modulo ``P``.

- If ``x = m / n`` is a nonnegative rational number and ``n`` is
  divisible by ``P`` (but ``m`` is not) then ``n`` has no inverse
  modulo ``P`` and the rule above doesn't apply; in this case define
  ``hash(x)`` to be the constant value ``sys.hash_info.inf``.

- If ``x = m / n`` is a negative rational number, define ``hash(x)``
  as ``-hash(-x)``. If the resulting hash is ``-1``, replace it with
  ``-2``.

- The particular values ``sys.hash_info.inf``, ``-sys.hash_info.inf``
  and ``sys.hash_info.nan`` are used as hash values for positive
  infinity, negative infinity, or nans (respectively). (All hashable
  nans have the same hash value.)

- For a :class:`complex` number ``z``, the hash values of the real
  and imaginary parts are combined by computing ``hash(z.real) +
  sys.hash_info.imag * hash(z.imag)``, reduced modulo
  ``2**sys.hash_info.width`` so that it lies in
  ``range(-2**(sys.hash_info.width - 1), 2**(sys.hash_info.width -
  1))``. Again, if the result is ``-1``, it's replaced with ``-2``.


To clarify the above rules, here's some example Python code,
equivalent to the builtin hash, for computing the hash of a rational
number, :class:`float`, or :class:`complex`::


   import sys, math

   def hash_fraction(m, n):
       """Compute the hash of a rational number m / n.

       Assumes m and n are integers, with n positive.
       Equivalent to hash(fractions.Fraction(m, n)).

       """
       P = sys.hash_info.modulus
       # Remove common factors of P. (Unnecessary if m and n already coprime.)
       while m % P == n % P == 0:
           m, n = m // P, n // P

       if n % P == 0:
           hash_ = sys.hash_info.inf
       else:
           # Fermat's Little Theorem: pow(n, P-1, P) is 1, so
           # pow(n, P-2, P) gives the inverse of n modulo P.
           hash_ = (abs(m) % P) * pow(n, P - 2, P) % P
       if m < 0:
           hash_ = -hash_
       if hash_ == -1:
           hash_ = -2
       return hash_

   def hash_float(x):
       """Compute the hash of a float x."""

       if math.isnan(x):
           return sys.hash_info.nan
       elif math.isinf(x):
           return sys.hash_info.inf if x > 0 else -sys.hash_info.inf
       else:
           return hash_fraction(*x.as_integer_ratio())

   def hash_complex(z):
       """Compute the hash of a complex number z."""

       hash_ = hash_float(z.real) + sys.hash_info.imag * hash_float(z.imag)
       # do a signed reduction modulo 2**sys.hash_info.width
       M = 2**(sys.hash_info.width - 1)
       hash_ = (hash_ & (M - 1)) - (hash_ & M)
       if hash_ == -1:
           hash_ = -2
       return hash_

.. _typeiter:

Iterator Types
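Here is a minimal sketch of the integer case of the rules in the "Hashing of numeric types" section. It assumes a build where `sys.hash_info.modulus` has the Mersenne form `2**n - 1` described in the impl-detail note. The helpers `is_mersenne` and `int_hash` are hypothetical names used only for illustration. `int_hash` reduces `abs(n)` modulo `P`, restores the sign, and maps `-1` to `-2`.

```python
import sys

P = sys.hash_info.modulus  # 2**61 - 1 on typical 64-bit builds
BITS = P.bit_length()      # n such that P == 2**n - 1, if P is a Mersenne number


def is_mersenne(p):
    """Return True if p has the form 2**k - 1, i.e. all of its low bits are set."""
    return p > 0 and p & (p + 1) == 0


def int_hash(n):
    """Restate the documented numeric hash rule for an int n."""
    h = abs(n) % P               # reduction of |n| modulo P
    if n < 0:
        h = -h                   # hash(x) == -hash(-x) for negative x
    return -2 if h == -1 else h  # -1 is reserved for errors at the C level


# 2**BITS is congruent to 1 modulo P, so it hashes to 1.
print(is_mersenne(P), int_hash(2**BITS), int_hash(-1), int_hash(P))  # True 1 -2 0
```

To check the sketch against a given interpreter, compare `int_hash(n)` with the builtin `hash(n)` for a few values, including multiples of `P`.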

@@ -446,6 +446,30 @@ always available.
      Changed to a named tuple and added *service_pack_minor*,
      *service_pack_major*, *suite_mask*, and *product_type*.


.. data:: hash_info

   A structseq giving parameters of the numeric hash implementation. For
   more details about hashing of numeric types, see :ref:`numeric-hash`.

   +---------------------+--------------------------------------------------+
   | attribute           | explanation                                      |
   +=====================+==================================================+
   | :const:`width`      | width in bits used for hash values               |
   +---------------------+--------------------------------------------------+
   | :const:`modulus`    | prime modulus P used for numeric hash scheme     |
   +---------------------+--------------------------------------------------+
   | :const:`inf`        | hash value returned for a positive infinity      |
   +---------------------+--------------------------------------------------+
   | :const:`nan`        | hash value returned for a nan                    |
   +---------------------+--------------------------------------------------+
   | :const:`imag`       | multiplier used for the imaginary part of a      |
   |                     | complex number                                   |
   +---------------------+--------------------------------------------------+

   .. versionadded:: 3.2


.. data:: hexversion

   The version number encoded as a single integer. This is guaranteed to increase

@@ -126,6 +126,20 @@ Used in: PY_LONG_LONG
#endif
#endif

/* Parameters used for the numeric hash implementation. See notes for
   _Py_HashDouble in Objects/object.c. Numeric hashes are based on
   reduction modulo the prime 2**_PyHASH_BITS - 1. */

#if SIZEOF_LONG >= 8
#define _PyHASH_BITS 61
#else
#define _PyHASH_BITS 31
#endif
#define _PyHASH_MODULUS ((1UL << _PyHASH_BITS) - 1)
#define _PyHASH_INF 314159
#define _PyHASH_NAN 0
#define _PyHASH_IMAG 1000003UL

/* uintptr_t is the C9X name for an unsigned integral type such that a
 * legitimate void* can be cast to uintptr_t and then back to void* again
 * without loss of information. Similarly for intptr_t, wrt a signed

@@ -862,7 +862,7 @@ class Decimal(object):
    # that specified by IEEE 754.

    def __eq__(self, other, context=None):
        other = _convert_other(other, allow_float=True)
        other = _convert_other(other, allow_float = True)
        if other is NotImplemented:
            return other
        if self._check_nans(other, context):

@@ -870,7 +870,7 @@ class Decimal(object):
        return self._cmp(other) == 0

    def __ne__(self, other, context=None):
        other = _convert_other(other, allow_float=True)
        other = _convert_other(other, allow_float = True)
        if other is NotImplemented:
            return other
        if self._check_nans(other, context):

@@ -879,7 +879,7 @@ class Decimal(object):


    def __lt__(self, other, context=None):
        other = _convert_other(other, allow_float=True)
        other = _convert_other(other, allow_float = True)
        if other is NotImplemented:
            return other
        ans = self._compare_check_nans(other, context)

@@ -888,7 +888,7 @@ class Decimal(object):
        return self._cmp(other) < 0

    def __le__(self, other, context=None):
        other = _convert_other(other, allow_float=True)
        other = _convert_other(other, allow_float = True)
        if other is NotImplemented:
            return other
        ans = self._compare_check_nans(other, context)

@@ -897,7 +897,7 @@ class Decimal(object):
        return self._cmp(other) <= 0

    def __gt__(self, other, context=None):
        other = _convert_other(other, allow_float=True)
        other = _convert_other(other, allow_float = True)
        if other is NotImplemented:
            return other
        ans = self._compare_check_nans(other, context)

@@ -906,7 +906,7 @@ class Decimal(object):
        return self._cmp(other) > 0

    def __ge__(self, other, context=None):
        other = _convert_other(other, allow_float=True)
        other = _convert_other(other, allow_float = True)
        if other is NotImplemented:
            return other
        ans = self._compare_check_nans(other, context)

@@ -935,55 +935,28 @@ class Decimal(object):

    def __hash__(self):
        """x.__hash__() <==> hash(x)"""
        # Decimal integers must hash the same as the ints
        #
        # The hash of a nonspecial noninteger Decimal must depend only
        # on the value of that Decimal, and not on its representation.
        # For example: hash(Decimal('100E-1')) == hash(Decimal('10')).

        # Equality comparisons involving signaling nans can raise an
        # exception; since equality checks are implicitly and
        # unpredictably used when checking set and dict membership, we
        # prevent signaling nans from being used as set elements or
        # dict keys by making __hash__ raise an exception.
        # In order to make sure that the hash of a Decimal instance
        # agrees with the hash of a numerically equal integer, float
        # or Fraction, we follow the rules for numeric hashes outlined
        # in the documentation. (See library docs, 'Built-in Types').
        if self._is_special:
            if self.is_snan():
                raise TypeError('Cannot hash a signaling NaN value.')
            elif self.is_nan():
                # 0 to match hash(float('nan'))
                return 0
                return _PyHASH_NAN
            else:
                # values chosen to match hash(float('inf')) and
                # hash(float('-inf')).
                if self._sign:
                    return -271828
                    return -_PyHASH_INF
                else:
                    return 314159
                    return _PyHASH_INF

        # In Python 2.7, we're allowing comparisons (but not
        # arithmetic operations) between floats and Decimals; so if
        # a Decimal instance is exactly representable as a float then
        # its hash should match that of the float.
        self_as_float = float(self)
        if Decimal.from_float(self_as_float) == self:
            return hash(self_as_float)

        if self._isinteger():
            op = _WorkRep(self.to_integral_value())
            # to make computation feasible for Decimals with large
            # exponent, we use the fact that hash(n) == hash(m) for
            # any two nonzero integers n and m such that (i) n and m
            # have the same sign, and (ii) n is congruent to m modulo
            # 2**64-1. So we can replace hash((-1)**s*c*10**e) with
            # hash((-1)**s*c*pow(10, e, 2**64-1).
            return hash((-1)**op.sign*op.int*pow(10, op.exp, 2**64-1))
        # The value of a nonzero nonspecial Decimal instance is
        # faithfully represented by the triple consisting of its sign,
        # its adjusted exponent, and its coefficient with trailing
        # zeros removed.
        return hash((self._sign,
                     self._exp+len(self._int),
                     self._int.rstrip('0')))
        if self._exp >= 0:
            exp_hash = pow(10, self._exp, _PyHASH_MODULUS)
        else:
            exp_hash = pow(_PyHASH_10INV, -self._exp, _PyHASH_MODULUS)
        hash_ = int(self._int) * exp_hash % _PyHASH_MODULUS
        return hash_ if self >= 0 else -hash_

    def as_tuple(self):
        """Represents the number as a triple tuple.

@@ -6218,6 +6191,17 @@ _NegativeOne = Decimal(-1)
# _SignedInfinity[sign] is infinity w/ that sign
_SignedInfinity = (_Infinity, _NegativeInfinity)

# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo _PyHASH_MODULUS
import sys
_PyHASH_MODULUS = sys.hash_info.modulus
# hash values to use for positive and negative infinities, and nans
_PyHASH_INF = sys.hash_info.inf
_PyHASH_NAN = sys.hash_info.nan
del sys

# _PyHASH_10INV is the inverse of 10 modulo the prime _PyHASH_MODULUS
_PyHASH_10INV = pow(10, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)


if __name__ == '__main__':

@@ -8,6 +8,7 @@ import math
import numbers
import operator
import re
import sys

__all__ = ['Fraction', 'gcd']

@@ -23,6 +24,12 @@ def gcd(a, b):
        a, b = b, a%b
    return a

# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo the prime _PyHASH_MODULUS.
_PyHASH_MODULUS = sys.hash_info.modulus
# Value to be used for rationals that reduce to infinity modulo
# _PyHASH_MODULUS.
_PyHASH_INF = sys.hash_info.inf

_RATIONAL_FORMAT = re.compile(r"""
    \A\s*                      # optional whitespace at the start, then

@@ -528,16 +535,22 @@ class Fraction(numbers.Rational):

        """
        # XXX since this method is expensive, consider caching the result
        if self._denominator == 1:
            # Get integers right.
            return hash(self._numerator)
        # Expensive check, but definitely correct.
        if self == float(self):
            return hash(float(self))

        # In order to make sure that the hash of a Fraction agrees
        # with the hash of a numerically equal integer, float or
        # Decimal instance, we follow the rules for numeric hashes
        # outlined in the documentation. (See library docs, 'Built-in
        # Types').

        # dinv is the inverse of self._denominator modulo the prime
        # _PyHASH_MODULUS, or 0 if self._denominator is divisible by
        # _PyHASH_MODULUS.
        dinv = pow(self._denominator, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
        if not dinv:
            hash_ = _PyHASH_INF
        else:
            # Use tuple's hash to avoid a high collision rate on
            # simple fractions.
            return hash((self._numerator, self._denominator))
            hash_ = abs(self._numerator) * dinv % _PyHASH_MODULUS
        return hash_ if self >= 0 else -hash_

    def __eq__(a, b):
        """a == b"""

@@ -914,15 +914,6 @@ class InfNanTest(unittest.TestCase):
        self.assertFalse(NAN.is_inf())
        self.assertFalse((0.).is_inf())

    def test_hash_inf(self):
        # the actual values here should be regarded as an
        # implementation detail, but they need to be
        # identical to those used in the Decimal module.
        self.assertEqual(hash(float('inf')), 314159)
        self.assertEqual(hash(float('-inf')), -271828)
        self.assertEqual(hash(float('nan')), 0)


fromHex = float.fromhex
toHex = float.hex
class HexFloatTestCase(unittest.TestCase):

@@ -0,0 +1,151 @@
# test interactions between int, float, Decimal and Fraction

import unittest
import random
import math
import sys
import operator
from test.support import run_unittest

from decimal import Decimal as D
from fractions import Fraction as F

# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo the prime _PyHASH_MODULUS.
_PyHASH_MODULUS = sys.hash_info.modulus
_PyHASH_INF = sys.hash_info.inf

class HashTest(unittest.TestCase):
    def check_equal_hash(self, x, y):
        # check both that x and y are equal and that their hashes are equal
        self.assertEqual(hash(x), hash(y),
                         "got different hashes for {!r} and {!r}".format(x, y))
        self.assertEqual(x, y)

    def test_bools(self):
        self.check_equal_hash(False, 0)
        self.check_equal_hash(True, 1)

    def test_integers(self):
        # check that equal values hash equal

        # exact integers
        for i in range(-1000, 1000):
            self.check_equal_hash(i, float(i))
            self.check_equal_hash(i, D(i))
            self.check_equal_hash(i, F(i))

        # the current hash is based on reduction modulo 2**n-1 for some
        # n, so pay special attention to numbers of the form 2**n and 2**n-1.
        for i in range(100):
            n = 2**i - 1
            if n == int(float(n)):
                self.check_equal_hash(n, float(n))
                self.check_equal_hash(-n, -float(n))
            self.check_equal_hash(n, D(n))
            self.check_equal_hash(n, F(n))
            self.check_equal_hash(-n, D(-n))
            self.check_equal_hash(-n, F(-n))

            n = 2**i
            self.check_equal_hash(n, float(n))
            self.check_equal_hash(-n, -float(n))
            self.check_equal_hash(n, D(n))
            self.check_equal_hash(n, F(n))
            self.check_equal_hash(-n, D(-n))
            self.check_equal_hash(-n, F(-n))

        # random values of various sizes
        for _ in range(1000):
            e = random.randrange(300)
            n = random.randrange(-10**e, 10**e)
            self.check_equal_hash(n, D(n))
            self.check_equal_hash(n, F(n))
            if n == int(float(n)):
                self.check_equal_hash(n, float(n))

    def test_binary_floats(self):
        # check that floats hash equal to corresponding Fractions and Decimals

        # floats that are distinct but numerically equal should hash the same
        self.check_equal_hash(0.0, -0.0)

        # zeros
        self.check_equal_hash(0.0, D(0))
        self.check_equal_hash(-0.0, D(0))
        self.check_equal_hash(-0.0, D('-0.0'))
        self.check_equal_hash(0.0, F(0))

        # infinities and nans
        self.check_equal_hash(float('inf'), D('inf'))
        self.check_equal_hash(float('-inf'), D('-inf'))

        for _ in range(1000):
            x = random.random() * math.exp(random.random()*200.0 - 100.0)
            self.check_equal_hash(x, D.from_float(x))
            self.check_equal_hash(x, F.from_float(x))

    def test_complex(self):
        # complex numbers with zero imaginary part should hash equal to
        # the corresponding float

        test_values = [0.0, -0.0, 1.0, -1.0, 0.40625, -5136.5,
                       float('inf'), float('-inf')]

        for zero in -0.0, 0.0:
            for value in test_values:
                self.check_equal_hash(value, complex(value, zero))

    def test_decimals(self):
        # check that Decimal instances that have different representations
        # but equal values give the same hash
        zeros = ['0', '-0', '0.0', '-0.0e10', '000e-10']
        for zero in zeros:
            self.check_equal_hash(D(zero), D(0))

        self.check_equal_hash(D('1.00'), D(1))
        self.check_equal_hash(D('1.00000'), D(1))
        self.check_equal_hash(D('-1.00'), D(-1))
        self.check_equal_hash(D('-1.00000'), D(-1))
        self.check_equal_hash(D('123e2'), D(12300))
        self.check_equal_hash(D('1230e1'), D(12300))
        self.check_equal_hash(D('12300'), D(12300))
        self.check_equal_hash(D('12300.0'), D(12300))
        self.check_equal_hash(D('12300.00'), D(12300))
        self.check_equal_hash(D('12300.000'), D(12300))

    def test_fractions(self):
        # check special case for fractions where either the numerator
        # or the denominator is a multiple of _PyHASH_MODULUS
        self.assertEqual(hash(F(1, _PyHASH_MODULUS)), _PyHASH_INF)
        self.assertEqual(hash(F(-1, 3*_PyHASH_MODULUS)), -_PyHASH_INF)
        self.assertEqual(hash(F(7*_PyHASH_MODULUS, 1)), 0)
        self.assertEqual(hash(F(-_PyHASH_MODULUS, 1)), 0)

    def test_hash_normalization(self):
        # Test for a bug encountered while changing long_hash.
        #
        # Given objects x and y, it should be possible for y's
        # __hash__ method to return hash(x) in order to ensure that
        # hash(x) == hash(y). But hash(x) is not exactly equal to the
        # result of x.__hash__(): there's some internal normalization
        # to make sure that the result fits in a C long, and is not
        # equal to the invalid hash value -1. This internal
        # normalization must therefore not change the result of
        # hash(x) for any x.

        class HalibutProxy:
            def __hash__(self):
                return hash('halibut')
            def __eq__(self, other):
                return other == 'halibut'

        x = {'halibut', HalibutProxy()}
        self.assertEqual(len(x), 1)


def test_main():
    run_unittest(HashTest)

if __name__ == '__main__':
    test_main()

@@ -426,6 +426,23 @@ class SysModuleTest(unittest.TestCase):
        self.assertEqual(type(sys.int_info.bits_per_digit), int)
        self.assertEqual(type(sys.int_info.sizeof_digit), int)
        self.assertIsInstance(sys.hexversion, int)

        self.assertEqual(len(sys.hash_info), 5)
        self.assertLess(sys.hash_info.modulus, 2**sys.hash_info.width)
        # sys.hash_info.modulus should be a prime; we do a quick
        # probable primality test (doesn't exclude the possibility of
        # a Carmichael number)
        for x in range(1, 100):
            self.assertEqual(
                pow(x, sys.hash_info.modulus-1, sys.hash_info.modulus),
                1,
                "sys.hash_info.modulus {} is a non-prime".format(
                    sys.hash_info.modulus)
                )
        self.assertIsInstance(sys.hash_info.inf, int)
        self.assertIsInstance(sys.hash_info.nan, int)
        self.assertIsInstance(sys.hash_info.imag, int)

        self.assertIsInstance(sys.maxsize, int)
        self.assertIsInstance(sys.maxunicode, int)
        self.assertIsInstance(sys.platform, str)

@@ -12,6 +12,11 @@ What's New in Python 3.2 Alpha 1?
Core and Builtins
-----------------

- Issue #8188: Introduce a new scheme for computing hashes of numbers
  (instances of int, float, complex, decimal.Decimal and
  fractions.Fraction) that makes it easy to maintain the invariant
  that hash(x) == hash(y) whenever x and y have equal value.

- Issue #8748: Fix two issues with comparisons between complex and integer
  objects. (1) The comparison could incorrectly return True in some cases
  (2**53+1 == complex(2**53) == 2**53), breaking transitivity of equality.

@@ -403,12 +403,12 @@ complex_str(PyComplexObject *v)
static long
complex_hash(PyComplexObject *v)
{
    long hashreal, hashimag, combined;
    hashreal = _Py_HashDouble(v->cval.real);
    if (hashreal == -1)
    unsigned long hashreal, hashimag, combined;
    hashreal = (unsigned long)_Py_HashDouble(v->cval.real);
    if (hashreal == (unsigned long)-1)
        return -1;
    hashimag = _Py_HashDouble(v->cval.imag);
    if (hashimag == -1)
    hashimag = (unsigned long)_Py_HashDouble(v->cval.imag);
    if (hashimag == (unsigned long)-1)
        return -1;
    /* Note: if the imaginary part is 0, hashimag is 0 now,
     * so the following returns hashreal unchanged. This is

@@ -416,10 +416,10 @@ complex_hash(PyComplexObject *v)
     * compare equal must have the same hash value, so that
     * hash(x + 0*j) must equal hash(x).
     */
    combined = hashreal + 1000003 * hashimag;
    if (combined == -1)
        combined = -2;
    return combined;
    combined = hashreal + _PyHASH_IMAG * hashimag;
    if (combined == (unsigned long)-1)
        combined = (unsigned long)-2;
    return (long)combined;
}

/* This macro may return! */
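The rewritten `complex_hash` combines the two hashes as `unsigned long` values. The sum therefore wraps modulo `2**width` instead of overflowing a signed type, and only the final result is cast back to `long`. The Python sketch below emulates those casts. It assumes `sys.hash_info.width` equals the width of the C hash type, and the helper names are hypothetical.

```python
import sys

W = sys.hash_info.width    # width of the C hash type in bits
MASK = (1 << W) - 1
IMAG = sys.hash_info.imag  # 1000003 in CPython


def to_unsigned(h):
    """Emulate casting a signed W-bit C integer to unsigned."""
    return h & MASK


def to_signed(u):
    """Emulate casting an unsigned W-bit C integer back to signed."""
    return u - (1 << W) if u >> (W - 1) else u


def complex_hash(z):
    hashreal = to_unsigned(hash(z.real))
    hashimag = to_unsigned(hash(z.imag))
    # Unsigned arithmetic wraps modulo 2**W instead of overflowing.
    combined = (hashreal + IMAG * hashimag) & MASK
    if combined == MASK:     # (unsigned long)-1 is reserved for errors,
        combined = MASK - 1  # so use (unsigned long)-2 instead
    return to_signed(combined)


print(complex_hash(1 + 2j) == hash(1 + 2j))
```

A zero imaginary part gives `hashimag == 0`, so `complex_hash(complex(x, 0.0))` reduces to `hash(x)`, as the C comment requires.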

@@ -2571,18 +2571,37 @@ long_hash(PyLongObject *v)
        sign = -1;
        i = -(i);
    }
    /* The following loop produces a C unsigned long x such that x is
       congruent to the absolute value of v modulo ULONG_MAX. The
       resulting x is nonzero if and only if v is. */
    while (--i >= 0) {
        /* Force a native long #-bits (32 or 64) circular shift */
        x = (x >> (8*SIZEOF_LONG-PyLong_SHIFT)) | (x << PyLong_SHIFT);
        /* Here x is a quantity in the range [0, _PyHASH_MODULUS); we
           want to compute x * 2**PyLong_SHIFT + v->ob_digit[i] modulo
           _PyHASH_MODULUS.

           The computation of x * 2**PyLong_SHIFT % _PyHASH_MODULUS
           amounts to a rotation of the bits of x. To see this, write

             x * 2**PyLong_SHIFT = y * 2**_PyHASH_BITS + z

           where y = x >> (_PyHASH_BITS - PyLong_SHIFT) gives the top
           PyLong_SHIFT bits of x (those that are shifted out of the
           original _PyHASH_BITS bits), and z = (x << PyLong_SHIFT) &
           _PyHASH_MODULUS gives the bottom _PyHASH_BITS - PyLong_SHIFT
           bits of x, shifted up. Then since 2**_PyHASH_BITS is
           congruent to 1 modulo _PyHASH_MODULUS, y*2**_PyHASH_BITS is
           congruent to y modulo _PyHASH_MODULUS. So

             x * 2**PyLong_SHIFT = y + z (mod _PyHASH_MODULUS).

           The right-hand side is just the result of rotating the
           _PyHASH_BITS bits of x left by PyLong_SHIFT places; since
           not all _PyHASH_BITS bits of x are 1s, the same is true
           after rotation, so 0 <= y+z < _PyHASH_MODULUS and y + z is
           the reduction of x*2**PyLong_SHIFT modulo
           _PyHASH_MODULUS. */
        x = ((x << PyLong_SHIFT) & _PyHASH_MODULUS) |
            (x >> (_PyHASH_BITS - PyLong_SHIFT));
        x += v->ob_digit[i];
        /* If the addition above overflowed we compensate by
           incrementing. This preserves the value modulo
           ULONG_MAX. */
        if (x < v->ob_digit[i])
            x++;
        if (x >= _PyHASH_MODULUS)
            x -= _PyHASH_MODULUS;
    }
    x = x * sign;
    if (x == (unsigned long)-1)
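The rotation argument in the `long_hash` comment can be checked in Python. This sketch folds in base-`2**SHIFT` digits starting from the most significant end. At each step it multiplies the running value by `2**SHIFT` modulo `P` with a bit rotation. `SHIFT = 30` is an assumption standing in for `PyLong_SHIFT`, and the helper names are hypothetical. Because the sketch uses Python's unbounded ints, there is no `ULONG_MAX` overflow to compensate for, unlike the C code.

```python
import sys

P = sys.hash_info.modulus  # assumed to be 2**BITS - 1
BITS = P.bit_length()
SHIFT = 30                 # assumed digit width, standing in for PyLong_SHIFT


def rotate_left(x, k):
    """Return x * 2**k modulo P, for 0 <= x < P and 0 <= k < BITS.

    Since 2**BITS == 1 (mod P), the bits shifted out at the top wrap
    around to the bottom: a rotation of the BITS-bit value x.
    """
    return ((x << k) & P) | (x >> (BITS - k))


def int_hash_by_digits(v):
    """Hash an int by folding in base-2**SHIFT digits, most significant first."""
    sign = -1 if v < 0 else 1
    v = abs(v)
    digits = []
    while v:
        digits.append(v & ((1 << SHIFT) - 1))
        v >>= SHIFT
    x = 0
    for d in reversed(digits):
        x = rotate_left(x, SHIFT)  # x * 2**SHIFT modulo P, still < P
        x += d                     # now x < 2*P, so one subtraction suffices
        if x >= P:
            x -= P
    x *= sign
    return -2 if x == -1 else x


print(int_hash_by_digits(10**50) == hash(10**50))
```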

Objects/object.c

@@ -647,63 +647,101 @@ PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
   All the utility functions (_Py_Hash*()) return "-1" to signify an error.
*/

/* For numeric types, the hash of a number x is based on the reduction
   of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
   hash(x) == hash(y) whenever x and y are numerically equal, even if
   x and y have different types.

   A quick summary of the hashing strategy:

   (1) First define the 'reduction of x modulo P' for any rational
   number x; this is a standard extension of the usual notion of
   reduction modulo P for integers. If x == p/q (written in lowest
   terms), the reduction is interpreted as the reduction of p times
   the inverse of the reduction of q, all modulo P; if q is exactly
   divisible by P then define the reduction to be infinity. So we've
   got a well-defined map

      reduce : { rational numbers } -> { 0, 1, 2, ..., P-1, infinity }.

   (2) Now for a rational number x, define hash(x) by:

      reduce(x)   if x >= 0
      -reduce(-x) if x < 0

   If the result of the reduction is infinity (this is impossible for
   integers, floats and Decimals) then use the predefined hash value
   _PyHASH_INF for x >= 0, or -_PyHASH_INF for x < 0, instead.
   _PyHASH_INF, -_PyHASH_INF and _PyHASH_NAN are also used for the
   hashes of float and Decimal infinities and nans.

   A selling point for the above strategy is that it makes it possible
   to compute hashes of decimal and binary floating-point numbers
   efficiently, even if the exponent of the binary or decimal number
   is large. The key point is that

      reduce(x * y) == reduce(x) * reduce(y) (modulo _PyHASH_MODULUS)

   provided that {reduce(x), reduce(y)} != {0, infinity}. The reduction of a
   binary or decimal float is never infinity, since the denominator is a power
   of 2 (for binary) or a divisor of a power of 10 (for decimal). So we have,
   for nonnegative x,

      reduce(x * 2**e) == reduce(x) * reduce(2**e) % _PyHASH_MODULUS

      reduce(x * 10**e) == reduce(x) * reduce(10**e) % _PyHASH_MODULUS

   and reduce(10**e) can be computed efficiently by the usual modular
   exponentiation algorithm. For reduce(2**e) it's even better: since
   P is of the form 2**n-1, reduce(2**e) is 2**(e mod n), and multiplication
   by 2**(e mod n) modulo 2**n-1 just amounts to a rotation of bits.

*/

long
_Py_HashDouble(double v)
{
    double intpart, fractpart;
    int expo;
    long hipart;
    long x; /* the final hash value */
    /* This is designed so that Python numbers of different types
     * that compare equal hash to the same value; otherwise comparisons
     * of mapping keys will turn out weird.
     */
    int e, sign;
    double m;
    unsigned long x, y;

    if (!Py_IS_FINITE(v)) {
        if (Py_IS_INFINITY(v))
            return v < 0 ? -271828 : 314159;
            return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
        else
            return 0;
            return _PyHASH_NAN;
    }
    fractpart = modf(v, &intpart);
    if (fractpart == 0.0) {
        /* This must return the same hash as an equal int or long. */
        if (intpart > LONG_MAX/2 || -intpart > LONG_MAX/2) {
            /* Convert to long and use its hash. */
            PyObject *plong; /* converted to Python long */
            plong = PyLong_FromDouble(v);
            if (plong == NULL)
                return -1;
            x = PyObject_Hash(plong);
            Py_DECREF(plong);
            return x;
        }
        /* Fits in a C long == a Python int, so is its own hash. */
        x = (long)intpart;
        if (x == -1)
            x = -2;
        return x;

    m = frexp(v, &e);

    sign = 1;
    if (m < 0) {
        sign = -1;
        m = -m;
    }
    /* The fractional part is non-zero, so we don't have to worry about
     * making this match the hash of some other type.
     * Use frexp to get at the bits in the double.
     * Since the VAX D double format has 56 mantissa bits, which is the
     * most of any double format in use, each of these parts may have as
     * many as (but no more than) 56 significant bits.
     * So, assuming sizeof(long) >= 4, each part can be broken into two
     * longs; frexp and multiplication are used to do that.
     * Also, since the Cray double format has 15 exponent bits, which is
     * the most of any double format in use, shifting the exponent field
     * left by 15 won't overflow a long (again assuming sizeof(long) >= 4).
     */
    v = frexp(v, &expo);
    v *= 2147483648.0; /* 2**31 */
    hipart = (long)v; /* take the top 32 bits */
    v = (v - (double)hipart) * 2147483648.0; /* get the next 32 bits */
    x = hipart + (long)v + (expo << 15);
    if (x == -1)
        x = -2;
    return x;

    /* process 28 bits at a time; this should work well both for binary
       and hexadecimal floating point. */
    x = 0;
    while (m) {
        x = ((x << 28) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - 28);
        m *= 268435456.0; /* 2**28 */
        e -= 28;
        y = (unsigned long)m; /* pull out integer part */
        m -= y;
        x += y;
        if (x >= _PyHASH_MODULUS)
            x -= _PyHASH_MODULUS;
    }

    /* adjust for the exponent; first reduce it modulo _PyHASH_BITS */
    e = e >= 0 ? e % _PyHASH_BITS : _PyHASH_BITS-1-((-1-e) % _PyHASH_BITS);
    x = ((x << e) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - e);

    x = x * sign;
    if (x == (unsigned long)-1)
        x = (unsigned long)-2;
    return (long)x;
}

long
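The strategy in the comment above `_Py_HashDouble` can be mirrored in Python. The sketch reads the mantissa 28 bits at a time with `frexp` and folds each chunk in with a rotation. It then applies `2**e` as one more rotation by `e mod n`, since `2**n` is congruent to 1 modulo `P`. The helper names are hypothetical, and the sketch covers finite values and infinities only.

```python
import math
import sys

P = sys.hash_info.modulus
BITS = P.bit_length()


def rotate_left(x, k):
    """x * 2**k modulo P = 2**BITS - 1, valid for 0 <= x < P, 0 <= k < BITS."""
    return ((x << k) & P) | (x >> (BITS - k))


def float_hash(v):
    """Sketch of the new _Py_HashDouble for finite floats and infinities."""
    if math.isnan(v):
        raise ValueError("NaNs are outside this sketch")
    if math.isinf(v):
        return sys.hash_info.inf if v > 0 else -sys.hash_info.inf
    m, e = math.frexp(v)  # v == m * 2**e with 0.5 <= abs(m) < 1, or m == 0
    sign = 1
    if m < 0:
        sign, m = -1, -m
    x = 0
    while m:  # consume the mantissa 28 bits at a time
        x = rotate_left(x, 28)
        m *= 268435456.0  # 2**28
        e -= 28
        y = int(m)        # next 28 bits, as an integer
        m -= y
        x += y
        if x >= P:
            x -= P
    # Now abs(v) == N * 2**e, where N is the mantissa integer and
    # x == N mod P; multiplying by 2**e modulo P is a rotation by e mod BITS
    # (Python's % already returns a nonnegative result here).
    x = rotate_left(x, e % BITS)
    x *= sign
    return -2 if x == -1 else x


print(float_hash(0.5) == hash(0.5) == (P + 1) // 2)
```

For `0.5` the result is the inverse of 2 modulo `P`, which is `(P + 1) // 2`, as the reduction rule for rationals predicts.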

@@ -4921,6 +4921,7 @@ slot_tp_hash(PyObject *self)
    PyObject *func, *res;
    static PyObject *hash_str;
    long h;
    int overflow;

    func = lookup_method(self, "__hash__", &hash_str);


@@ -4937,14 +4938,27 @@ slot_tp_hash(PyObject *self)
    Py_DECREF(func);
    if (res == NULL)
        return -1;
    if (PyLong_Check(res))

    if (!PyLong_Check(res)) {
        PyErr_SetString(PyExc_TypeError,
                        "__hash__ method should return an integer");
        return -1;
    }
    /* Transform the PyLong `res` to a C long `h`. For an existing
       hashable Python object x, hash(x) will always lie within the range
       of a C long. Therefore our transformation must preserve values
       that already lie within this range, to ensure that if x.__hash__()
       returns hash(y) then hash(x) == hash(y). */
    h = PyLong_AsLongAndOverflow(res, &overflow);
    if (overflow)
        /* res was not within the range of a C long, so we're free to
           use any sufficiently bit-mixing transformation;
           long.__hash__ will do nicely. */
        h = PyLong_Type.tp_hash(res);
    else
        h = PyLong_AsLong(res);
    Py_DECREF(res);
    if (h == -1 && !PyErr_Occurred())
        h = -2;
    return h;
    if (h == -1 && !PyErr_Occurred())
        h = -2;
    return h;
}

static PyObject *
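The effect of the new `slot_tp_hash` is visible from Python on any class whose `__hash__` returns an arbitrary int. Values in range come back unchanged, except `-1`, which becomes `-2`. Out-of-range values are rehashed with `int.__hash__`, and non-integers raise `TypeError`. The demonstration below uses a hypothetical `FixedHash` class. It describes CPython's behaviour, not a language guarantee.

```python
class FixedHash:
    """Hypothetical helper whose __hash__ returns a caller-chosen value."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return self.value


for value in (12345, -1, 2**100):
    # In-range values are kept (with -1 mapped to -2); 2**100 is rehashed.
    print(value, hash(FixedHash(value)), hash(value))

try:
    hash(FixedHash(1.5))
except TypeError as exc:
    print("TypeError:", exc)
```

This is what lets the `HalibutProxy` test in `test_numeric_tower` return `hash('halibut')` and still collide with the string in a set.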

@@ -570,6 +570,57 @@ sys_setrecursionlimit(PyObject *self, PyObject *args)
    return Py_None;
}

static PyTypeObject Hash_InfoType;

PyDoc_STRVAR(hash_info_doc,
"hash_info\n\
\n\
A struct sequence providing parameters used for computing\n\
numeric hashes. The attributes are read only.");

static PyStructSequence_Field hash_info_fields[] = {
    {"width", "width of the type used for hashing, in bits"},
    {"modulus", "prime number giving the modulus on which the hash "
                "function is based"},
    {"inf", "value to be used for hash of a positive infinity"},
    {"nan", "value to be used for hash of a nan"},
    {"imag", "multiplier used for the imaginary part of a complex number"},
    {NULL, NULL}
};

static PyStructSequence_Desc hash_info_desc = {
    "sys.hash_info",
    hash_info_doc,
    hash_info_fields,
    5,
};

PyObject *
get_hash_info(void)
{
    PyObject *hash_info;
    int field = 0;
    hash_info = PyStructSequence_New(&Hash_InfoType);
    if (hash_info == NULL)
        return NULL;
    PyStructSequence_SET_ITEM(hash_info, field++,
                              PyLong_FromLong(8*sizeof(long)));
    PyStructSequence_SET_ITEM(hash_info, field++,
                              PyLong_FromLong(_PyHASH_MODULUS));
    PyStructSequence_SET_ITEM(hash_info, field++,
                              PyLong_FromLong(_PyHASH_INF));
    PyStructSequence_SET_ITEM(hash_info, field++,
                              PyLong_FromLong(_PyHASH_NAN));
    PyStructSequence_SET_ITEM(hash_info, field++,
                              PyLong_FromLong(_PyHASH_IMAG));
    if (PyErr_Occurred()) {
        Py_CLEAR(hash_info);
        return NULL;
    }
    return hash_info;
}


PyDoc_STRVAR(setrecursionlimit_doc,
"setrecursionlimit(n)\n\
\n\

@@ -1482,6 +1533,11 @@ _PySys_Init(void)
                        PyFloat_GetInfo());
    SET_SYS_FROM_STRING("int_info",
                        PyLong_GetInfo());
    /* initialize hash_info */
    if (Hash_InfoType.tp_name == 0)
        PyStructSequence_InitType(&Hash_InfoType, &hash_info_desc);
    SET_SYS_FROM_STRING("hash_info",
                        get_hash_info());
    SET_SYS_FROM_STRING("maxunicode",
                        PyLong_FromLong(PyUnicode_GetMax()));
    SET_SYS_FROM_STRING("builtin_module_names",