:mod:`packaging.pypi.simple` --- Crawler using the PyPI "simple" interface
==========================================================================

.. module:: packaging.pypi.simple
   :synopsis: Crawler using the screen-scraping "simple" interface to fetch
              info and distributions.


`packaging.pypi.simple` can process Python Package Indexes and provides
useful information about distributions. It can also crawl local indexes.

You should use `packaging.pypi.simple` for:

* Searching for distributions by name and version.
* Processing external index pages.
* Downloading distributions by name and version.

It should not be used for:

* Tasks that require overly long index processing (like "finding all
  distributions with a specific version, no matter the name").

API
---

.. class:: Crawler


Usage Examples
--------------

To help you understand how to use the `Crawler` class, here are some basic
examples.

Request the simple index to get a specific distribution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose you want to scan an index to get a list of distributions for the
"foobar" project. You can use the `get_releases` method for that; it will
browse the project page and return a :class:`ReleaseInfo` object for each
download link found::

   >>> from packaging.pypi.simple import Crawler
   >>> client = Crawler()
   >>> client.get_releases("FooBar")
   [<ReleaseInfo "FooBar 1.1">, <ReleaseInfo "FooBar 1.2">]

Note that you can also ask the client for specific versions, using version
specifiers (described in :PEP:`345`)::

   >>> client.get_releases("FooBar < 1.2")
   [<ReleaseInfo "FooBar 1.1">]

`get_releases` returns a list of :class:`ReleaseInfo` objects, but you can
also get the single best distribution that fulfills your requirements, using
`get_release`::

   >>> client.get_release("FooBar < 1.2")
   <ReleaseInfo "FooBar 1.1">

Download distributions
^^^^^^^^^^^^^^^^^^^^^^

As it can get the URLs of the distributions provided by PyPI, the `Crawler`
client can also download those distributions and put them in a temporary
destination for you::

   >>> client.download("foobar")
   /tmp/temp_dir/foobar-1.2.tar.gz

You can also specify the directory you want to download to::

   >>> client.download("foobar", "/path/to/my/dir")
   /path/to/my/dir/foobar-1.2.tar.gz

While downloading, the MD5 checksum of the archive is checked; if it does not
match, the download is retried once, and if it fails again,
`MD5HashDoesNotMatchError` is raised.

Internally, it is not the `Crawler` that downloads the distributions but the
`DistributionInfo` class; please refer to its documentation for more details.

Following PyPI external links
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default behavior for packaging is to *not* follow the links provided by
HTML pages in the "simple index" when looking for distribution downloads.

It's possible to tell the `Crawler` to follow external links by setting the
`follow_externals` attribute, at instantiation time or afterwards::

   >>> client = Crawler(follow_externals=True)

or ::

   >>> client = Crawler()
   >>> client.follow_externals = True

Working with external indexes, and mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default `Crawler` behavior is to rely on the Python Package Index stored
on PyPI (http://pypi.python.org/simple).

Since you may need to work with a local index, or with private indexes, you
can specify one using the `index_url` parameter::

   >>> client = Crawler(index_url="file://filesystem/path/")

or ::

   >>> client = Crawler(index_url="http://some.specific.url/")
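
Once pointed at a custom index, the crawler works just as it does against the
main PyPI index; for example, here is a quick sanity check against your own
index (the project name and output below are illustrative, not taken from a
real index)::

   >>> client.get_releases("foobar")
   [<ReleaseInfo "foobar 0.1">]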
You can also specify mirrors to fall back on in case the first `index_url`
you provide doesn't respond, or doesn't respond correctly.

The default behavior for `Crawler` is to use the mirror list provided by the
Python.org DNS records, as described in :PEP:`381` about the mirroring
infrastructure.

If you don't want to rely on these, you can specify the list of mirrors to
try via the `mirrors` attribute. It's a simple iterable::

   >>> mirrors = ["http://first.mirror", "http://second.mirror"]
   >>> client = Crawler(mirrors=mirrors)

Searching in the simple index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's possible to search for projects with specific names in the package
index. Say you want to find all projects whose name contains the "distutils"
keyword::

   >>> client.search_projects("distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">,
    <Project "Distutils2">, <Project "distutilscross">,
    <Project "lpdistutils">, <Project "taras.recipe.distutils">,
    <Project "zerokspot.recipe.distutils">]

You can also search for projects whose names start or end with a specific
text, using a wildcard::

   >>> client.search_projects("distutils*")
   [<Project "Distutils">, <Project "Distutils2">, <Project "distutilscross">]

   >>> client.search_projects("*distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">,
    <Project "lpdistutils">, <Project "taras.recipe.distutils">,
    <Project "zerokspot.recipe.distutils">]
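
Putting it all together
^^^^^^^^^^^^^^^^^^^^^^^

As a recap, here is a short illustrative session combining the features
described above; the URL, project name, versions and outputs are hypothetical,
not taken from a real index::

   >>> from packaging.pypi.simple import Crawler
   >>> client = Crawler(index_url="http://some.specific.url/",
   ...                  follow_externals=True,
   ...                  mirrors=["http://first.mirror"])
   >>> client.search_projects("foobar")
   [<Project "foobar">]
   >>> client.get_release("foobar < 1.2")
   <ReleaseInfo "foobar 1.1">
   >>> client.download("foobar", "/path/to/my/dir")
   /path/to/my/dir/foobar-1.2.tar.gz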