
Fetching with Retries

This article is Part 7 of the AsyncIO series.

Here’s a practical example that shows how to fetch multiple pages of JSON from an API.

It retries when there is an HTTP request (network) error or when the API's rate limit is hit.
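For context, httpx separates the two failure modes this script handles: httpx.RequestError covers transport problems (DNS failures, timeouts, dropped connections), while raise_for_status() raises httpx.HTTPStatusError when the server answers with a 4xx or 5xx status such as 429. A minimal synchronous sketch of the distinction (the endpoint URL is just a placeholder):

import httpx

try:
    resp = httpx.get('https://api.example.com/items') # placeholder URL
    resp.raise_for_status() # raises httpx.HTTPStatusError on 4xx/5xx
except httpx.RequestError as err:
    print(f'Network problem while requesting {err.request.url}') # no response at all
except httpx.HTTPStatusError as err:
    print(f'Server replied with status {err.response.status_code}')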

Pseudo Code

The code snippets here combine the following:

  • Fetches pages asynchronously with asyncio.
  • Uses HTTP/2 instead of HTTP/1.1 via the httpx module (see the protocol check after this list).
  • Uses HTTP connection pooling (a single shared httpx.AsyncClient) for more efficient use of network resources: the underlying TCP connection is reused instead of recreated for every request.
  • Uses a semaphore to limit the number of concurrent connections at a time.
  • Retries the fetch (via a recursive call) on a network error (httpx.RequestError).
  • Delays and retries the fetch (via a recursive call) on a rate-limit error (httpx.HTTPStatusError with status 429).
  • Tracks the progress and the percentage of successfully fetched pages.
  • Once completed, checks the status code of each HTTP response to ensure every page was returned correctly (HTTP 200).
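A practical note on the HTTP/2 point above: httpx needs the optional h2 package (pip install 'httpx[http2]') before http2=True works, and the server must also agree to HTTP/2 during connection negotiation, otherwise the request falls back to HTTP/1.1. A minimal sketch to confirm which protocol was actually negotiated (the URL is a placeholder):

import asyncio
import httpx

async def check_protocol():
    async with httpx.AsyncClient(http2=True) as client:
        resp = await client.get('https://example.com/') # placeholder URL
        print(resp.http_version) # e.g. 'HTTP/2', or 'HTTP/1.1' if not negotiated

asyncio.run(check_protocol())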

Code Snippets

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from rich import print as rprint
import httpx
import asyncio

http200 = 0 # Count of successful (HTTP 200) fetches.
retries = 0 # Count of rate-limit retries.
pages = 0   # Total number of pages; set in main().

def try_again(headers):
    # Retry-After usually holds delta-seconds, but it may also be an HTTP-date,
    # which int() cannot parse; fall back to None in that case.
    try:
        return int(headers.get('Retry-After',0))
    except ValueError:
        return None

async def fetching(client,url,sem,backoff=0.2): # backoff is a parameter so it survives recursive retries
    global http200
    global retries
    total = pages

    try:
        async with sem:
            resp = await client.get(url)
            resp.raise_for_status()

    except httpx.RequestError:
        # Network error: wait briefly, then retry with a doubled backoff.
        await asyncio.sleep(backoff)
        return await fetching(client,url,sem,backoff*2) # Return the result of the retry

    except httpx.HTTPStatusError as err:
        if err.response.status_code == 429:
            retry_after = try_again(err.response.headers)
            if retry_after:
                rprint(f' [-] Rate limit reached for {url}. Waiting {retry_after}s before retry...')
                await asyncio.sleep(retry_after + backoff)
            else:
                rprint(f' [-] Waiting {backoff}s before retry...')
                await asyncio.sleep(backoff)
            retries += 1
            return await fetching(client,url,sem,backoff*2) # Return the result of the retry, with a doubled backoff
        else:
            rprint(f' [!] [{err.response.status_code}] {url}')

    else:
        if resp and resp.status_code == 200:
            http200 += 1
            rprint(f' [[green]{resp.status_code}[/green]]  [ [magenta]{http200}/{total}[/magenta] ({http200/total*100:.2f}%) ]')
            return resp


async def main(links):
    global pages
    pages = len(links) # Total page count, used for the progress percentage.
    limit = 5
    sem = asyncio.Semaphore(limit)
    hdrs = { 'accept': 'application/json', 'content-type': 'application/json' }

    async with httpx.AsyncClient(headers=hdrs, http2=True) as client:
        tasks = [ fetching(client,link,sem) for link in links ]
        responses = await asyncio.gather(*tasks)

    print()
    fail = any(r is None or r.status_code != 200 for r in responses)

    if not fail:
        print(f'All [{len(responses)} responses] are OK.')
        print(responses)
    else:
        print(f'Errors: Not all {len(responses)} responses are OK.')



links = [] # The (long) list of URLs to be fetched.
asyncio.run(main(links))
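
One caveat with the recursive retry above: there is no cap on attempts, so a URL that keeps failing will keep recursing, and every retry adds a Python stack frame. Below is a bounded, iterative sketch of the same retry policy; the max_attempts cap is my own addition rather than part of the original script, and it reuses try_again() from above.

async def fetch_with_cap(client,url,sem,max_attempts=5,backoff=0.2):
    # Same retry policy as fetching(), but iterative and bounded (hypothetical variant).
    for _ in range(max_attempts):
        try:
            async with sem:
                resp = await client.get(url)
                resp.raise_for_status()
            return resp
        except httpx.RequestError:
            delay = backoff # Network error: plain backoff.
        except httpx.HTTPStatusError as err:
            if err.response.status_code != 429:
                return err.response # Non-retryable status: give up immediately.
            delay = (try_again(err.response.headers) or 0) + backoff # Honour Retry-After if present.
        await asyncio.sleep(delay)
        backoff *= 2 # The doubled backoff persists across attempts here.
    return None # All attempts exhausted.

Because this variant also yields a response object or None per URL, the final status check in main() works with it unchanged.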
