Issue #8188: Introduce a new scheme for computing hashes of numbers

(instances of int, float, complex, decimal.Decimal and
fractions.Fraction) that makes it easy to maintain the invariant that
hash(x) == hash(y) whenever x and y have equal value.
Mark Dickinson 2010-05-23 13:33:13 +00:00
parent 03721133a6
commit dc787d2055
14 changed files with 566 additions and 137 deletions

View File

@ -595,6 +595,109 @@ hexadecimal string representing the same number::
'0x1.d380000000000p+11' '0x1.d380000000000p+11'
.. _numeric-hash:
Hashing of numeric types
------------------------
For numbers ``x`` and ``y``, possibly of different types, it's a requirement
that ``hash(x) == hash(y)`` whenever ``x == y`` (see the :meth:`__hash__`
method documentation for more details). For ease of implementation and
efficiency across a variety of numeric types (including :class:`int`,
:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`)
Python's hash for numeric types is based on a single mathematical function
that's defined for any rational number, and hence applies to all instances of
:class:`int` and :class:`fractions.Fraction`, and all finite instances of
:class:`float` and :class:`decimal.Decimal`. Essentially, this function is
given by reduction modulo ``P`` for a fixed prime ``P``. The value of ``P`` is
made available to Python as the :attr:`modulus` attribute of
:data:`sys.hash_info`.
.. impl-detail::
Currently, the prime used is ``P = 2**31 - 1`` on machines with 32-bit C
longs and ``P = 2**61 - 1`` on machines with 64-bit C longs.
Here are the rules in detail:
- If ``x = m / n`` is a nonnegative rational number and ``n`` is not divisible
by ``P``, define ``hash(x)`` as ``m * invmod(n, P) % P``, where ``invmod(n,
P)`` gives the inverse of ``n`` modulo ``P``.
- If ``x = m / n`` is a nonnegative rational number and ``n`` is
divisible by ``P`` (but ``m`` is not) then ``n`` has no inverse
modulo ``P`` and the rule above doesn't apply; in this case define
``hash(x)`` to be the constant value ``sys.hash_info.inf``.
- If ``x = m / n`` is a negative rational number define ``hash(x)``
as ``-hash(-x)``. If the resulting hash is ``-1``, replace it with
``-2``.
- The particular values ``sys.hash_info.inf``, ``-sys.hash_info.inf``
and ``sys.hash_info.nan`` are used as hash values for positive
infinity, negative infinity, or nans (respectively). (All hashable
nans have the same hash value.)
- For a :class:`complex` number ``z``, the hash values of the real
and imaginary parts are combined by computing ``hash(z.real) +
sys.hash_info.imag * hash(z.imag)``, reduced modulo
``2**sys.hash_info.width`` so that it lies in
``range(-2**(sys.hash_info.width - 1), 2**(sys.hash_info.width -
1))``. Again, if the result is ``-1``, it's replaced with ``-2``.
To clarify the above rules, here's some example Python code,
equivalent to the builtin hash, for computing the hash of a rational
number, :class:`float`, or :class:`complex`::
   import sys, math

   def hash_fraction(m, n):
       """Compute the hash of a rational number m / n.

       Assumes m and n are integers, with n positive.
       Equivalent to hash(fractions.Fraction(m, n)).

       """
       P = sys.hash_info.modulus
       # Remove common factors of P.  (Unnecessary if m and n already coprime.)
       while m % P == n % P == 0:
           m, n = m // P, n // P

       if n % P == 0:
           hash_ = sys.hash_info.inf
       else:
           # Fermat's Little Theorem: pow(n, P-1, P) is 1, so
           # pow(n, P-2, P) gives the inverse of n modulo P.
           hash_ = (abs(m) % P) * pow(n, P - 2, P) % P
       if m < 0:
           hash_ = -hash_
       if hash_ == -1:
           hash_ = -2
       return hash_

   def hash_float(x):
       """Compute the hash of a float x."""

       if math.isnan(x):
           return sys.hash_info.nan
       elif math.isinf(x):
           return sys.hash_info.inf if x > 0 else -sys.hash_info.inf
       else:
           return hash_fraction(*x.as_integer_ratio())

   def hash_complex(z):
       """Compute the hash of a complex number z."""

       hash_ = hash_float(z.real) + sys.hash_info.imag * hash_float(z.imag)
       # do a signed reduction modulo 2**sys.hash_info.width
       M = 2**(sys.hash_info.width - 1)
       hash_ = (hash_ & (M - 1)) - (hash_ & M)
       if hash_ == -1:
           hash_ = -2
       return hash_
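As a quick sanity check of the rules listed above, the following sketch (not part of the commit) compares built-in hashes across numeric types. It assumes an interpreter that provides ``sys.hash_info``; the variable names are illustrative only.

```python
import sys
from decimal import Decimal
from fractions import Fraction

P = sys.hash_info.modulus

# Equal values hash equal, whatever their type.
same_half = {
    hash(0.5),
    hash(Fraction(1, 2)),
    hash(Decimal("0.5")),
    hash(complex(0.5, 0.0)),
}

# Nonnegative rational m/n with n not divisible by P:
# the hash is m * invmod(n, P) % P.
expected_third = 1 * pow(3, P - 2, P) % P

# Denominator divisible by P: the hash is the constant sys.hash_info.inf.
inf_like = hash(Fraction(1, P))

# Negative values: hash(x) == -hash(-x), with -1 replaced by -2.
neg_third = hash(Fraction(-1, 3))
minus_one = hash(-1)
```

If the scheme behaves as documented, ``same_half`` holds a single value and ``neg_third`` is the negation of ``expected_third``.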
.. _typeiter: .. _typeiter:
Iterator Types Iterator Types

View File

@ -446,6 +446,30 @@ always available.
Changed to a named tuple and added *service_pack_minor*, Changed to a named tuple and added *service_pack_minor*,
*service_pack_major*, *suite_mask*, and *product_type*. *service_pack_major*, *suite_mask*, and *product_type*.
.. data:: hash_info
A structseq giving parameters of the numeric hash implementation. For
more details about hashing of numeric types, see :ref:`numeric-hash`.
+---------------------+--------------------------------------------------+
| attribute | explanation |
+=====================+==================================================+
| :const:`width` | width in bits used for hash values |
+---------------------+--------------------------------------------------+
| :const:`modulus` | prime modulus P used for numeric hash scheme |
+---------------------+--------------------------------------------------+
| :const:`inf` | hash value returned for a positive infinity |
+---------------------+--------------------------------------------------+
| :const:`nan` | hash value returned for a nan |
+---------------------+--------------------------------------------------+
| :const:`imag` | multiplier used for the imaginary part of a |
| | complex number |
+---------------------+--------------------------------------------------+
.. versionadded:: 3.2
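A minimal sketch of reading these parameters follows. It accesses the attributes by name rather than by position, and the check that the modulus is a Mersenne number reflects the implementation detail described in this commit, not a documented guarantee.

```python
import sys

info = sys.hash_info

# Number of bits in the modulus (31 or 61 for the primes used by this commit).
bits = info.modulus.bit_length()

# The modulus has the form 2**n - 1, which is what makes reduction cheap.
is_mersenne = info.modulus == 2**bits - 1

print(info.width, bits, info.inf, info.imag)
```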
.. data:: hexversion .. data:: hexversion
The version number encoded as a single integer. This is guaranteed to increase The version number encoded as a single integer. This is guaranteed to increase

View File

@ -126,6 +126,20 @@ Used in: PY_LONG_LONG
#endif #endif
#endif #endif
/* Parameters used for the numeric hash implementation. See notes for
_Py_HashDouble in Objects/object.c. Numeric hashes are based on
reduction modulo the prime 2**_PyHASH_BITS - 1. */
#if SIZEOF_LONG >= 8
#define _PyHASH_BITS 61
#else
#define _PyHASH_BITS 31
#endif
#define _PyHASH_MODULUS ((1UL << _PyHASH_BITS) - 1)
#define _PyHASH_INF 314159
#define _PyHASH_NAN 0
#define _PyHASH_IMAG 1000003UL
/* uintptr_t is the C9X name for an unsigned integral type such that a /* uintptr_t is the C9X name for an unsigned integral type such that a
* legitimate void* can be cast to uintptr_t and then back to void* again * legitimate void* can be cast to uintptr_t and then back to void* again
* without loss of information. Similarly for intptr_t, wrt a signed * without loss of information. Similarly for intptr_t, wrt a signed

View File

@ -862,7 +862,7 @@ class Decimal(object):
# that specified by IEEE 754. # that specified by IEEE 754.
def __eq__(self, other, context=None): def __eq__(self, other, context=None):
other = _convert_other(other, allow_float=True) other = _convert_other(other, allow_float = True)
if other is NotImplemented: if other is NotImplemented:
return other return other
if self._check_nans(other, context): if self._check_nans(other, context):
@ -870,7 +870,7 @@ class Decimal(object):
return self._cmp(other) == 0 return self._cmp(other) == 0
def __ne__(self, other, context=None): def __ne__(self, other, context=None):
other = _convert_other(other, allow_float=True) other = _convert_other(other, allow_float = True)
if other is NotImplemented: if other is NotImplemented:
return other return other
if self._check_nans(other, context): if self._check_nans(other, context):
@ -879,7 +879,7 @@ class Decimal(object):
def __lt__(self, other, context=None): def __lt__(self, other, context=None):
other = _convert_other(other, allow_float=True) other = _convert_other(other, allow_float = True)
if other is NotImplemented: if other is NotImplemented:
return other return other
ans = self._compare_check_nans(other, context) ans = self._compare_check_nans(other, context)
@ -888,7 +888,7 @@ class Decimal(object):
return self._cmp(other) < 0 return self._cmp(other) < 0
def __le__(self, other, context=None): def __le__(self, other, context=None):
other = _convert_other(other, allow_float=True) other = _convert_other(other, allow_float = True)
if other is NotImplemented: if other is NotImplemented:
return other return other
ans = self._compare_check_nans(other, context) ans = self._compare_check_nans(other, context)
@ -897,7 +897,7 @@ class Decimal(object):
return self._cmp(other) <= 0 return self._cmp(other) <= 0
def __gt__(self, other, context=None): def __gt__(self, other, context=None):
other = _convert_other(other, allow_float=True) other = _convert_other(other, allow_float = True)
if other is NotImplemented: if other is NotImplemented:
return other return other
ans = self._compare_check_nans(other, context) ans = self._compare_check_nans(other, context)
@ -906,7 +906,7 @@ class Decimal(object):
return self._cmp(other) > 0 return self._cmp(other) > 0
def __ge__(self, other, context=None): def __ge__(self, other, context=None):
other = _convert_other(other, allow_float=True) other = _convert_other(other, allow_float = True)
if other is NotImplemented: if other is NotImplemented:
return other return other
ans = self._compare_check_nans(other, context) ans = self._compare_check_nans(other, context)
@ -935,55 +935,28 @@ class Decimal(object):
def __hash__(self): def __hash__(self):
"""x.__hash__() <==> hash(x)""" """x.__hash__() <==> hash(x)"""
# Decimal integers must hash the same as the ints
#
# The hash of a nonspecial noninteger Decimal must depend only
# on the value of that Decimal, and not on its representation.
# For example: hash(Decimal('100E-1')) == hash(Decimal('10')).
# Equality comparisons involving signaling nans can raise an # In order to make sure that the hash of a Decimal instance
# exception; since equality checks are implicitly and # agrees with the hash of a numerically equal integer, float
# unpredictably used when checking set and dict membership, we # or Fraction, we follow the rules for numeric hashes outlined
# prevent signaling nans from being used as set elements or # in the documentation. (See library docs, 'Built-in Types').
# dict keys by making __hash__ raise an exception.
if self._is_special: if self._is_special:
if self.is_snan(): if self.is_snan():
raise TypeError('Cannot hash a signaling NaN value.') raise TypeError('Cannot hash a signaling NaN value.')
elif self.is_nan(): elif self.is_nan():
# 0 to match hash(float('nan')) return _PyHASH_NAN
return 0
else: else:
# values chosen to match hash(float('inf')) and
# hash(float('-inf')).
if self._sign: if self._sign:
return -271828 return -_PyHASH_INF
else: else:
return 314159 return _PyHASH_INF
# In Python 2.7, we're allowing comparisons (but not if self._exp >= 0:
# arithmetic operations) between floats and Decimals; so if exp_hash = pow(10, self._exp, _PyHASH_MODULUS)
# a Decimal instance is exactly representable as a float then else:
# its hash should match that of the float. exp_hash = pow(_PyHASH_10INV, -self._exp, _PyHASH_MODULUS)
self_as_float = float(self) hash_ = int(self._int) * exp_hash % _PyHASH_MODULUS
if Decimal.from_float(self_as_float) == self: return hash_ if self >= 0 else -hash_
return hash(self_as_float)
if self._isinteger():
op = _WorkRep(self.to_integral_value())
# to make computation feasible for Decimals with large
# exponent, we use the fact that hash(n) == hash(m) for
# any two nonzero integers n and m such that (i) n and m
# have the same sign, and (ii) n is congruent to m modulo
# 2**64-1. So we can replace hash((-1)**s*c*10**e) with
# hash((-1)**s*c*pow(10, e, 2**64-1).
return hash((-1)**op.sign*op.int*pow(10, op.exp, 2**64-1))
# The value of a nonzero nonspecial Decimal instance is
# faithfully represented by the triple consisting of its sign,
# its adjusted exponent, and its coefficient with trailing
# zeros removed.
return hash((self._sign,
self._exp+len(self._int),
self._int.rstrip('0')))
def as_tuple(self): def as_tuple(self):
"""Represents the number as a triple tuple. """Represents the number as a triple tuple.
@ -6218,6 +6191,17 @@ _NegativeOne = Decimal(-1)
# _SignedInfinity[sign] is infinity w/ that sign # _SignedInfinity[sign] is infinity w/ that sign
_SignedInfinity = (_Infinity, _NegativeInfinity) _SignedInfinity = (_Infinity, _NegativeInfinity)
# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo _PyHASH_MODULUS
import sys
_PyHASH_MODULUS = sys.hash_info.modulus
# hash values to use for positive and negative infinities, and nans
_PyHASH_INF = sys.hash_info.inf
_PyHASH_NAN = sys.hash_info.nan
del sys
# _PyHASH_10INV is the inverse of 10 modulo the prime _PyHASH_MODULUS
_PyHASH_10INV = pow(10, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
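The new `Decimal.__hash__` can be illustrated with a standalone sketch built on the public `as_tuple()` method instead of the private `_int` and `_exp` attributes. The helper name `decimal_hash_sketch` is hypothetical, it handles finite values only, and the explicit `-1` to `-2` step mirrors the normalization that `hash()` applies to any `__hash__` result.

```python
import sys
from decimal import Decimal

P = sys.hash_info.modulus
TEN_INV = pow(10, P - 2, P)  # inverse of 10 modulo the prime P


def decimal_hash_sketch(d):
    """Hash of a finite Decimal from its sign, digits and exponent."""
    sign, digits, exp = d.as_tuple()
    coefficient = int("".join(map(str, digits)))
    if exp >= 0:
        exp_hash = pow(10, exp, P)
    else:
        # 10**exp with negative exp is a power of the inverse of 10.
        exp_hash = pow(TEN_INV, -exp, P)
    h = coefficient * exp_hash % P
    h = -h if sign else h
    return -2 if h == -1 else h
```

Because only modular exponentiation is involved, the cost stays small even for exponents in the hundreds, which is the efficiency argument made in the Objects/object.c notes.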
if __name__ == '__main__': if __name__ == '__main__':

View File

@ -8,6 +8,7 @@ import math
import numbers import numbers
import operator import operator
import re import re
import sys
__all__ = ['Fraction', 'gcd'] __all__ = ['Fraction', 'gcd']
@ -23,6 +24,12 @@ def gcd(a, b):
a, b = b, a%b a, b = b, a%b
return a return a
# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo the prime _PyHASH_MODULUS.
_PyHASH_MODULUS = sys.hash_info.modulus
# Value to be used for rationals that reduce to infinity modulo
# _PyHASH_MODULUS.
_PyHASH_INF = sys.hash_info.inf
_RATIONAL_FORMAT = re.compile(r""" _RATIONAL_FORMAT = re.compile(r"""
\A\s* # optional whitespace at the start, then \A\s* # optional whitespace at the start, then
@ -528,16 +535,22 @@ class Fraction(numbers.Rational):
""" """
# XXX since this method is expensive, consider caching the result # XXX since this method is expensive, consider caching the result
if self._denominator == 1:
# Get integers right. # In order to make sure that the hash of a Fraction agrees
return hash(self._numerator) # with the hash of a numerically equal integer, float or
# Expensive check, but definitely correct. # Decimal instance, we follow the rules for numeric hashes
if self == float(self): # outlined in the documentation. (See library docs, 'Built-in
return hash(float(self)) # Types').
# dinv is the inverse of self._denominator modulo the prime
# _PyHASH_MODULUS, or 0 if self._denominator is divisible by
# _PyHASH_MODULUS.
dinv = pow(self._denominator, _PyHASH_MODULUS - 2, _PyHASH_MODULUS)
if not dinv:
hash_ = _PyHASH_INF
else: else:
# Use tuple's hash to avoid a high collision rate on hash_ = abs(self._numerator) * dinv % _PyHASH_MODULUS
# simple fractions. return hash_ if self >= 0 else -hash_
return hash((self._numerator, self._denominator))
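The new `Fraction.__hash__` relies on `pow(d, P - 2, P)` returning 0 exactly when `d` is divisible by the prime `P`. A hedged standalone sketch of that rule is shown here; `fraction_hash_sketch` is a hypothetical helper that assumes its arguments are already in lowest terms with a positive denominator.

```python
import sys
from fractions import Fraction

P = sys.hash_info.modulus
INF = sys.hash_info.inf


def fraction_hash_sketch(numerator, denominator):
    """Hash of numerator/denominator (lowest terms, denominator > 0)."""
    # dinv is the inverse of denominator modulo P, or 0 if P divides it.
    dinv = pow(denominator, P - 2, P)
    if not dinv:
        h = INF
    else:
        h = abs(numerator) * dinv % P
    h = h if numerator >= 0 else -h
    # hash() never reports -1, so mirror that normalization here.
    return -2 if h == -1 else h
```

For example, `fraction_hash_sketch(1, P)` takes the `INF` branch, matching the special case exercised by `test_fractions`.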
def __eq__(a, b): def __eq__(a, b):
"""a == b""" """a == b"""

View File

@ -914,15 +914,6 @@ class InfNanTest(unittest.TestCase):
self.assertFalse(NAN.is_inf()) self.assertFalse(NAN.is_inf())
self.assertFalse((0.).is_inf()) self.assertFalse((0.).is_inf())
def test_hash_inf(self):
# the actual values here should be regarded as an
# implementation detail, but they need to be
# identical to those used in the Decimal module.
self.assertEqual(hash(float('inf')), 314159)
self.assertEqual(hash(float('-inf')), -271828)
self.assertEqual(hash(float('nan')), 0)
fromHex = float.fromhex fromHex = float.fromhex
toHex = float.hex toHex = float.hex
class HexFloatTestCase(unittest.TestCase): class HexFloatTestCase(unittest.TestCase):

View File

@ -0,0 +1,151 @@
# test interactions between int, float, Decimal and Fraction
import unittest
import random
import math
import sys
import operator
from test.support import run_unittest
from decimal import Decimal as D
from fractions import Fraction as F
# Constants related to the hash implementation; hash(x) is based
# on the reduction of x modulo the prime _PyHASH_MODULUS.
_PyHASH_MODULUS = sys.hash_info.modulus
_PyHASH_INF = sys.hash_info.inf
class HashTest(unittest.TestCase):
def check_equal_hash(self, x, y):
# check both that x and y are equal and that their hashes are equal
self.assertEqual(hash(x), hash(y),
"got different hashes for {!r} and {!r}".format(x, y))
self.assertEqual(x, y)
def test_bools(self):
self.check_equal_hash(False, 0)
self.check_equal_hash(True, 1)
def test_integers(self):
# check that equal values hash equal
# exact integers
for i in range(-1000, 1000):
self.check_equal_hash(i, float(i))
self.check_equal_hash(i, D(i))
self.check_equal_hash(i, F(i))
# the current hash is based on reduction modulo 2**n-1 for some
# n, so pay special attention to numbers of the form 2**n and 2**n-1.
for i in range(100):
n = 2**i - 1
if n == int(float(n)):
self.check_equal_hash(n, float(n))
self.check_equal_hash(-n, -float(n))
self.check_equal_hash(n, D(n))
self.check_equal_hash(n, F(n))
self.check_equal_hash(-n, D(-n))
self.check_equal_hash(-n, F(-n))
n = 2**i
self.check_equal_hash(n, float(n))
self.check_equal_hash(-n, -float(n))
self.check_equal_hash(n, D(n))
self.check_equal_hash(n, F(n))
self.check_equal_hash(-n, D(-n))
self.check_equal_hash(-n, F(-n))
# random values of various sizes
for _ in range(1000):
e = random.randrange(300)
n = random.randrange(-10**e, 10**e)
self.check_equal_hash(n, D(n))
self.check_equal_hash(n, F(n))
if n == int(float(n)):
self.check_equal_hash(n, float(n))
def test_binary_floats(self):
# check that floats hash equal to corresponding Fractions and Decimals
# floats that are distinct but numerically equal should hash the same
self.check_equal_hash(0.0, -0.0)
# zeros
self.check_equal_hash(0.0, D(0))
self.check_equal_hash(-0.0, D(0))
self.check_equal_hash(-0.0, D('-0.0'))
self.check_equal_hash(0.0, F(0))
# infinities and nans
self.check_equal_hash(float('inf'), D('inf'))
self.check_equal_hash(float('-inf'), D('-inf'))
for _ in range(1000):
x = random.random() * math.exp(random.random()*200.0 - 100.0)
self.check_equal_hash(x, D.from_float(x))
self.check_equal_hash(x, F.from_float(x))
def test_complex(self):
# complex numbers with zero imaginary part should hash equal to
# the corresponding float
test_values = [0.0, -0.0, 1.0, -1.0, 0.40625, -5136.5,
float('inf'), float('-inf')]
for zero in -0.0, 0.0:
for value in test_values:
self.check_equal_hash(value, complex(value, zero))
def test_decimals(self):
# check that Decimal instances that have different representations
# but equal values give the same hash
zeros = ['0', '-0', '0.0', '-0.0e10', '000e-10']
for zero in zeros:
self.check_equal_hash(D(zero), D(0))
self.check_equal_hash(D('1.00'), D(1))
self.check_equal_hash(D('1.00000'), D(1))
self.check_equal_hash(D('-1.00'), D(-1))
self.check_equal_hash(D('-1.00000'), D(-1))
self.check_equal_hash(D('123e2'), D(12300))
self.check_equal_hash(D('1230e1'), D(12300))
self.check_equal_hash(D('12300'), D(12300))
self.check_equal_hash(D('12300.0'), D(12300))
self.check_equal_hash(D('12300.00'), D(12300))
self.check_equal_hash(D('12300.000'), D(12300))
def test_fractions(self):
# check special case for fractions where either the numerator
# or the denominator is a multiple of _PyHASH_MODULUS
self.assertEqual(hash(F(1, _PyHASH_MODULUS)), _PyHASH_INF)
self.assertEqual(hash(F(-1, 3*_PyHASH_MODULUS)), -_PyHASH_INF)
self.assertEqual(hash(F(7*_PyHASH_MODULUS, 1)), 0)
self.assertEqual(hash(F(-_PyHASH_MODULUS, 1)), 0)
def test_hash_normalization(self):
# Test for a bug encountered while changing long_hash.
#
# Given objects x and y, it should be possible for y's
# __hash__ method to return hash(x) in order to ensure that
# hash(x) == hash(y). But hash(x) is not exactly equal to the
# result of x.__hash__(): there's some internal normalization
# to make sure that the result fits in a C long, and is not
# equal to the invalid hash value -1. This internal
# normalization must therefore not change the result of
# hash(x) for any x.
class HalibutProxy:
def __hash__(self):
return hash('halibut')
def __eq__(self, other):
return other == 'halibut'
x = {'halibut', HalibutProxy()}
self.assertEqual(len(x), 1)
def test_main():
run_unittest(HashTest)
if __name__ == '__main__':
test_main()

View File

@ -426,6 +426,23 @@ class SysModuleTest(unittest.TestCase):
self.assertEqual(type(sys.int_info.bits_per_digit), int) self.assertEqual(type(sys.int_info.bits_per_digit), int)
self.assertEqual(type(sys.int_info.sizeof_digit), int) self.assertEqual(type(sys.int_info.sizeof_digit), int)
self.assertIsInstance(sys.hexversion, int) self.assertIsInstance(sys.hexversion, int)
self.assertEqual(len(sys.hash_info), 5)
self.assertLess(sys.hash_info.modulus, 2**sys.hash_info.width)
# sys.hash_info.modulus should be a prime; we do a quick
# probable primality test (doesn't exclude the possibility of
# a Carmichael number)
for x in range(1, 100):
self.assertEqual(
pow(x, sys.hash_info.modulus-1, sys.hash_info.modulus),
1,
"sys.hash_info.modulus {} is a non-prime".format(
sys.hash_info.modulus)
)
self.assertIsInstance(sys.hash_info.inf, int)
self.assertIsInstance(sys.hash_info.nan, int)
self.assertIsInstance(sys.hash_info.imag, int)
self.assertIsInstance(sys.maxsize, int) self.assertIsInstance(sys.maxsize, int)
self.assertIsInstance(sys.maxunicode, int) self.assertIsInstance(sys.maxunicode, int)
self.assertIsInstance(sys.platform, str) self.assertIsInstance(sys.platform, str)

View File

@ -12,6 +12,11 @@ What's New in Python 3.2 Alpha 1?
Core and Builtins Core and Builtins
----------------- -----------------
- Issue #8188: Introduce a new scheme for computing hashes of numbers
(instances of int, float, complex, decimal.Decimal and
fractions.Fraction) that makes it easy to maintain the invariant
that hash(x) == hash(y) whenever x and y have equal value.
- Issue #8748: Fix two issues with comparisons between complex and integer - Issue #8748: Fix two issues with comparisons between complex and integer
objects. (1) The comparison could incorrectly return True in some cases objects. (1) The comparison could incorrectly return True in some cases
(2**53+1 == complex(2**53) == 2**53), breaking transitivity of equality. (2**53+1 == complex(2**53) == 2**53), breaking transitivity of equality.

View File

@ -403,12 +403,12 @@ complex_str(PyComplexObject *v)
static long static long
complex_hash(PyComplexObject *v) complex_hash(PyComplexObject *v)
{ {
long hashreal, hashimag, combined; unsigned long hashreal, hashimag, combined;
hashreal = _Py_HashDouble(v->cval.real); hashreal = (unsigned long)_Py_HashDouble(v->cval.real);
if (hashreal == -1) if (hashreal == (unsigned long)-1)
return -1; return -1;
hashimag = _Py_HashDouble(v->cval.imag); hashimag = (unsigned long)_Py_HashDouble(v->cval.imag);
if (hashimag == -1) if (hashimag == (unsigned long)-1)
return -1; return -1;
/* Note: if the imaginary part is 0, hashimag is 0 now, /* Note: if the imaginary part is 0, hashimag is 0 now,
* so the following returns hashreal unchanged. This is * so the following returns hashreal unchanged. This is
@ -416,10 +416,10 @@ complex_hash(PyComplexObject *v)
* compare equal must have the same hash value, so that * compare equal must have the same hash value, so that
* hash(x + 0*j) must equal hash(x). * hash(x + 0*j) must equal hash(x).
*/ */
combined = hashreal + 1000003 * hashimag; combined = hashreal + _PyHASH_IMAG * hashimag;
if (combined == -1) if (combined == (unsigned long)-1)
combined = -2; combined = (unsigned long)-2;
return combined; return (long)combined;
} }
/* This macro may return! */ /* This macro may return! */

View File

@ -2571,18 +2571,37 @@ long_hash(PyLongObject *v)
sign = -1; sign = -1;
i = -(i); i = -(i);
} }
/* The following loop produces a C unsigned long x such that x is
congruent to the absolute value of v modulo ULONG_MAX. The
resulting x is nonzero if and only if v is. */
while (--i >= 0) { while (--i >= 0) {
/* Force a native long #-bits (32 or 64) circular shift */ /* Here x is a quantity in the range [0, _PyHASH_MODULUS); we
x = (x >> (8*SIZEOF_LONG-PyLong_SHIFT)) | (x << PyLong_SHIFT); want to compute x * 2**PyLong_SHIFT + v->ob_digit[i] modulo
_PyHASH_MODULUS.
The computation of x * 2**PyLong_SHIFT % _PyHASH_MODULUS
amounts to a rotation of the bits of x. To see this, write
x * 2**PyLong_SHIFT = y * 2**_PyHASH_BITS + z
where y = x >> (_PyHASH_BITS - PyLong_SHIFT) gives the top
PyLong_SHIFT bits of x (those that are shifted out of the
original _PyHASH_BITS bits), and z = (x << PyLong_SHIFT) &
_PyHASH_MODULUS gives the bottom _PyHASH_BITS - PyLong_SHIFT
bits of x, shifted up. Then since 2**_PyHASH_BITS is
congruent to 1 modulo _PyHASH_MODULUS, y*2**_PyHASH_BITS is
congruent to y modulo _PyHASH_MODULUS. So
x * 2**PyLong_SHIFT = y + z (mod _PyHASH_MODULUS).
The right-hand side is just the result of rotating the
_PyHASH_BITS bits of x left by PyLong_SHIFT places; since
not all _PyHASH_BITS bits of x are 1s, the same is true
after rotation, so 0 <= y+z < _PyHASH_MODULUS and y + z is
the reduction of x*2**PyLong_SHIFT modulo
_PyHASH_MODULUS. */
x = ((x << PyLong_SHIFT) & _PyHASH_MODULUS) |
(x >> (_PyHASH_BITS - PyLong_SHIFT));
x += v->ob_digit[i]; x += v->ob_digit[i];
/* If the addition above overflowed we compensate by if (x >= _PyHASH_MODULUS)
incrementing. This preserves the value modulo x -= _PyHASH_MODULUS;
ULONG_MAX. */
if (x < v->ob_digit[i])
x++;
} }
x = x * sign; x = x * sign;
if (x == (unsigned long)-1) if (x == (unsigned long)-1)
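The rotation argument in the comment above can be checked numerically. The sketch below is a Python model, not the C implementation: `SHIFT = 15` stands in for `PyLong_SHIFT` (an assumption, since the real value depends on the build), and `rotate_left` and `int_hash_sketch` are hypothetical names.

```python
import sys

P = sys.hash_info.modulus
BITS = P.bit_length()  # plays the role of _PyHASH_BITS
SHIFT = 15             # assumed digit size, standing in for PyLong_SHIFT


def rotate_left(x, shift):
    """Rotate the BITS-bit value x (0 <= x < P) left by shift places."""
    return ((x << shift) & P) | (x >> (BITS - shift))


def int_hash_sketch(n, shift=SHIFT):
    """Reduce abs(n) modulo P digit by digit, most significant digit first."""
    digits = []
    a = abs(n)
    while a:
        digits.append(a & ((1 << shift) - 1))
        a >>= shift
    x = 0
    for digit in reversed(digits):
        x = rotate_left(x, shift)  # same as x * 2**shift % P
        x += digit
        if x >= P:
            x -= P
    x = -x if n < 0 else x
    return -2 if x == -1 else x
```

The key identity is `rotate_left(x, s) == x * 2**s % P` for `0 <= x < P`, which holds because `2**BITS` is congruent to 1 modulo `P`.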

View File

@ -647,63 +647,101 @@ PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
All the utility functions (_Py_Hash*()) return "-1" to signify an error. All the utility functions (_Py_Hash*()) return "-1" to signify an error.
*/ */
/* For numeric types, the hash of a number x is based on the reduction
of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
hash(x) == hash(y) whenever x and y are numerically equal, even if
x and y have different types.
A quick summary of the hashing strategy:
(1) First define the 'reduction of x modulo P' for any rational
number x; this is a standard extension of the usual notion of
reduction modulo P for integers. If x == p/q (written in lowest
terms), the reduction is interpreted as the reduction of p times
the inverse of the reduction of q, all modulo P; if q is exactly
divisible by P then define the reduction to be infinity. So we've
got a well-defined map
reduce : { rational numbers } -> { 0, 1, 2, ..., P-1, infinity }.
(2) Now for a rational number x, define hash(x) by:
reduce(x) if x >= 0
-reduce(-x) if x < 0
If the result of the reduction is infinity (this is impossible for
integers, floats and Decimals) then use the predefined hash value
_PyHASH_INF for x >= 0, or -_PyHASH_INF for x < 0, instead.
_PyHASH_INF, -_PyHASH_INF and _PyHASH_NAN are also used for the
hashes of float and Decimal infinities and nans.
A selling point for the above strategy is that it makes it possible
to compute hashes of decimal and binary floating-point numbers
efficiently, even if the exponent of the binary or decimal number
is large. The key point is that
reduce(x * y) == reduce(x) * reduce(y) (modulo _PyHASH_MODULUS)
provided that {reduce(x), reduce(y)} != {0, infinity}. The reduction of a
binary or decimal float is never infinity, since the denominator is a power
of 2 (for binary) or a divisor of a power of 10 (for decimal). So we have,
for nonnegative x,
reduce(x * 2**e) == reduce(x) * reduce(2**e) % _PyHASH_MODULUS
reduce(x * 10**e) == reduce(x) * reduce(10**e) % _PyHASH_MODULUS
and reduce(10**e) can be computed efficiently by the usual modular
exponentiation algorithm. For reduce(2**e) it's even better: since
P is of the form 2**n-1, reduce(2**e) is 2**(e mod n), and multiplication
by 2**(e mod n) modulo 2**n-1 just amounts to a rotation of bits.
*/
long long
_Py_HashDouble(double v) _Py_HashDouble(double v)
{ {
double intpart, fractpart; int e, sign;
int expo; double m;
long hipart; unsigned long x, y;
long x; /* the final hash value */
/* This is designed so that Python numbers of different types
* that compare equal hash to the same value; otherwise comparisons
* of mapping keys will turn out weird.
*/
if (!Py_IS_FINITE(v)) { if (!Py_IS_FINITE(v)) {
if (Py_IS_INFINITY(v)) if (Py_IS_INFINITY(v))
return v < 0 ? -271828 : 314159; return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
else else
return 0; return _PyHASH_NAN;
} }
fractpart = modf(v, &intpart);
if (fractpart == 0.0) { m = frexp(v, &e);
/* This must return the same hash as an equal int or long. */
if (intpart > LONG_MAX/2 || -intpart > LONG_MAX/2) { sign = 1;
/* Convert to long and use its hash. */ if (m < 0) {
PyObject *plong; /* converted to Python long */ sign = -1;
plong = PyLong_FromDouble(v); m = -m;
if (plong == NULL)
return -1;
x = PyObject_Hash(plong);
Py_DECREF(plong);
return x;
}
/* Fits in a C long == a Python int, so is its own hash. */
x = (long)intpart;
if (x == -1)
x = -2;
return x;
} }
/* The fractional part is non-zero, so we don't have to worry about
* making this match the hash of some other type. /* process 28 bits at a time; this should work well both for binary
* Use frexp to get at the bits in the double. and hexadecimal floating point. */
* Since the VAX D double format has 56 mantissa bits, which is the x = 0;
* most of any double format in use, each of these parts may have as while (m) {
* many as (but no more than) 56 significant bits. x = ((x << 28) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - 28);
* So, assuming sizeof(long) >= 4, each part can be broken into two m *= 268435456.0; /* 2**28 */
* longs; frexp and multiplication are used to do that. e -= 28;
* Also, since the Cray double format has 15 exponent bits, which is y = (unsigned long)m; /* pull out integer part */
* the most of any double format in use, shifting the exponent field m -= y;
* left by 15 won't overflow a long (again assuming sizeof(long) >= 4). x += y;
*/ if (x >= _PyHASH_MODULUS)
v = frexp(v, &expo); x -= _PyHASH_MODULUS;
v *= 2147483648.0; /* 2**31 */ }
hipart = (long)v; /* take the top 32 bits */
v = (v - (double)hipart) * 2147483648.0; /* get the next 32 bits */ /* adjust for the exponent; first reduce it modulo _PyHASH_BITS */
x = hipart + (long)v + (expo << 15); e = e >= 0 ? e % _PyHASH_BITS : _PyHASH_BITS-1-((-1-e) % _PyHASH_BITS);
if (x == -1) x = ((x << e) & _PyHASH_MODULUS) | x >> (_PyHASH_BITS - e);
x = -2;
return x; x = x * sign;
if (x == (unsigned long)-1)
x = (unsigned long)-2;
return (long)x;
} }
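For readers who prefer Python, the new `_Py_HashDouble` loop can be modelled as follows. This is a sketch under stated assumptions: `hash_double_sketch` and `rotate_left` are hypothetical names, only finite floats are handled, and Python's `%` already returns a nonnegative remainder, so the two-branch exponent reduction in the C code collapses to a single expression.

```python
import math
import sys

P = sys.hash_info.modulus
BITS = P.bit_length()  # plays the role of _PyHASH_BITS


def rotate_left(x, k):
    """Rotate the BITS-bit value x (0 <= x < P) left by k places."""
    return ((x << k) & P) | (x >> (BITS - k))


def hash_double_sketch(v):
    """Model of the finite-float branch of _Py_HashDouble."""
    if math.isinf(v) or math.isnan(v):
        raise ValueError("sketch handles finite floats only")
    m, e = math.frexp(v)
    sign = 1
    if m < 0:
        sign, m = -1, -m
    x = 0
    while m:
        # Consume 28 bits of the mantissa per iteration.
        x = rotate_left(x, 28)
        m *= 2.0**28
        e -= 28
        y = int(m)  # pull out the integer part
        m -= y
        x += y
        if x >= P:
            x -= P
    # Multiplying by 2**e modulo P is a rotation by e mod BITS.
    x = rotate_left(x, e % BITS)
    x *= sign
    return -2 if x == -1 else x
```

Since every step works on a value below `P`, no arbitrary-precision arithmetic is needed in the C version, even for floats with very large or very small exponents.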
long long

View File

@ -4921,6 +4921,7 @@ slot_tp_hash(PyObject *self)
PyObject *func, *res; PyObject *func, *res;
static PyObject *hash_str; static PyObject *hash_str;
long h; long h;
int overflow;
func = lookup_method(self, "__hash__", &hash_str); func = lookup_method(self, "__hash__", &hash_str);
@ -4937,14 +4938,27 @@ slot_tp_hash(PyObject *self)
Py_DECREF(func); Py_DECREF(func);
if (res == NULL) if (res == NULL)
return -1; return -1;
if (PyLong_Check(res))
if (!PyLong_Check(res)) {
PyErr_SetString(PyExc_TypeError,
"__hash__ method should return an integer");
return -1;
}
/* Transform the PyLong `res` to a C long `h`. For an existing
hashable Python object x, hash(x) will always lie within the range
of a C long. Therefore our transformation must preserve values
that already lie within this range, to ensure that if x.__hash__()
returns hash(y) then hash(x) == hash(y). */
h = PyLong_AsLongAndOverflow(res, &overflow);
if (overflow)
/* res was not within the range of a C long, so we're free to
use any sufficiently bit-mixing transformation;
long.__hash__ will do nicely. */
h = PyLong_Type.tp_hash(res); h = PyLong_Type.tp_hash(res);
else
h = PyLong_AsLong(res);
Py_DECREF(res); Py_DECREF(res);
if (h == -1 && !PyErr_Occurred()) if (h == -1 && !PyErr_Occurred())
h = -2; h = -2;
return h; return h;
} }
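The effect of the revised `slot_tp_hash` is visible from pure Python. The classes below are hypothetical illustrations: a `__hash__` result that fits in the native hash type passes through unchanged, an out-of-range result is reduced with the integer hash, and `-1` is mapped to `-2`.

```python
class InRange:
    def __hash__(self):
        return 12345  # already fits, so it must be preserved


class Huge:
    def __hash__(self):
        return 2**100  # too large, so it is reduced via the int hash


class MinusOne:
    def __hash__(self):
        return -1  # reserved as the error value at the C level


in_range_hash = hash(InRange())
huge_hash = hash(Huge())
minus_one_hash = hash(MinusOne())
```

Preserving in-range values is what lets a proxy object return `hash(other)` from its own `__hash__` and land in the same bucket, as `test_hash_normalization` checks.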
static PyObject * static PyObject *

View File

@ -570,6 +570,57 @@ sys_setrecursionlimit(PyObject *self, PyObject *args)
return Py_None; return Py_None;
} }
static PyTypeObject Hash_InfoType;
PyDoc_STRVAR(hash_info_doc,
"hash_info\n\
\n\
A struct sequence providing parameters used for computing\n\
numeric hashes. The attributes are read only.");
static PyStructSequence_Field hash_info_fields[] = {
{"width", "width of the type used for hashing, in bits"},
{"modulus", "prime number giving the modulus on which the hash "
"function is based"},
{"inf", "value to be used for hash of a positive infinity"},
{"nan", "value to be used for hash of a nan"},
{"imag", "multiplier used for the imaginary part of a complex number"},
{NULL, NULL}
};
static PyStructSequence_Desc hash_info_desc = {
"sys.hash_info",
hash_info_doc,
hash_info_fields,
5,
};
PyObject *
get_hash_info(void)
{
PyObject *hash_info;
int field = 0;
hash_info = PyStructSequence_New(&Hash_InfoType);
if (hash_info == NULL)
return NULL;
PyStructSequence_SET_ITEM(hash_info, field++,
PyLong_FromLong(8*sizeof(long)));
PyStructSequence_SET_ITEM(hash_info, field++,
PyLong_FromLong(_PyHASH_MODULUS));
PyStructSequence_SET_ITEM(hash_info, field++,
PyLong_FromLong(_PyHASH_INF));
PyStructSequence_SET_ITEM(hash_info, field++,
PyLong_FromLong(_PyHASH_NAN));
PyStructSequence_SET_ITEM(hash_info, field++,
PyLong_FromLong(_PyHASH_IMAG));
if (PyErr_Occurred()) {
Py_CLEAR(hash_info);
return NULL;
}
return hash_info;
}
PyDoc_STRVAR(setrecursionlimit_doc, PyDoc_STRVAR(setrecursionlimit_doc,
"setrecursionlimit(n)\n\ "setrecursionlimit(n)\n\
\n\ \n\
@ -1482,6 +1533,11 @@ _PySys_Init(void)
PyFloat_GetInfo()); PyFloat_GetInfo());
SET_SYS_FROM_STRING("int_info", SET_SYS_FROM_STRING("int_info",
PyLong_GetInfo()); PyLong_GetInfo());
/* initialize hash_info */
if (Hash_InfoType.tp_name == 0)
PyStructSequence_InitType(&Hash_InfoType, &hash_info_desc);
SET_SYS_FROM_STRING("hash_info",
get_hash_info());
SET_SYS_FROM_STRING("maxunicode", SET_SYS_FROM_STRING("maxunicode",
PyLong_FromLong(PyUnicode_GetMax())); PyLong_FromLong(PyUnicode_GetMax()));
SET_SYS_FROM_STRING("builtin_module_names", SET_SYS_FROM_STRING("builtin_module_names",