Here’s a practical example showing how to fetch multiple pages of JSON from an API.
It retries when an HTTP request (network) error occurs or when a rate limit is hit.
Pseudo Code#
The code snippet below combines the following:
- Fetches pages asynchronously with asyncio.
- Uses HTTP/2 instead of HTTP/1.1 via the httpx module.
- Uses HTTP connection pooling (a single shared httpx.AsyncClient) for more efficient use of network resources: the underlying TCP connection is reused instead of being recreated for every request.
- Uses a semaphore to limit the number of concurrent connections at any one time (a minimal sketch of this pattern follows the list).
- Retries the fetch (via a recursive call) with exponential backoff when a network error (httpx.RequestError) occurs.
- Delays and retries the fetch (via a recursive call) when a rate limit error (httpx.HTTPStatusError with status 429) occurs, honouring the Retry-After header if present.
- Tracks progress and the percentage of successfully fetched pages.
- Once completed, it checks the status code of each HTTP response to ensure every page was returned correctly (HTTP 200).
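Before the full listing, here is a minimal sketch of the pooling-plus-semaphore pattern on its own, without the retry logic. The URLs are placeholders for illustration, and http2=True assumes httpx was installed with its HTTP/2 extra:

import asyncio
import httpx

async def fetch_one(client, url, sem):
    # The semaphore caps how many requests are in flight at any moment.
    async with sem:
        return await client.get(url)

async def fetch_all(urls, limit=5):
    sem = asyncio.Semaphore(limit)
    # A single AsyncClient acts as the connection pool; with HTTP/2 the
    # requests are multiplexed over a reused TCP connection.
    async with httpx.AsyncClient(http2=True) as client:
        return await asyncio.gather(*(fetch_one(client, u, sem) for u in urls))

# Placeholder URLs, for illustration only.
urls = [f'https://api.example.com/items?page={n}' for n in range(1, 11)]
responses = asyncio.run(fetch_all(urls))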
Code Snippets#
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from rich import print as rprint
import httpx
import asyncio
http200 = 0  # count of successful (HTTP 200) responses
retries = 0  # count of retries caused by rate limiting
pages = 0    # total number of pages to fetch (set in main)
def try_again(headers):
    # Return the Retry-After value in seconds (0 if the header is absent),
    # or None if it is not numeric (e.g. an HTTP-date).
    try:
        return int(headers.get('Retry-After', 0))
    except ValueError:
        return None
async def fetching(client, url, sem, backoff=0.2):
    global http200
    global retries
    global pages
    total = pages
    try:
        async with sem:  # limit the number of concurrent requests
            resp = await client.get(url)
            resp.raise_for_status()
    except httpx.RequestError:
        # Network error: wait, then retry with a doubled backoff.
        await asyncio.sleep(backoff)
        return await fetching(client, url, sem, backoff * 2)  # Return the result of the retry
    except httpx.HTTPStatusError as err:
        if err.response.status_code == 429:
            # Rate limited: honour Retry-After when present, otherwise back off exponentially.
            retry_after = try_again(err.response.headers)
            if retry_after:
                rprint(f' [-] Rate limit reached for {url}. Waiting {retry_after}s before retry...')
                await asyncio.sleep(retry_after + backoff)
            else:
                rprint(f' [-] Waiting {backoff}s before retry...')
                await asyncio.sleep(backoff)
            retries += 1
            return await fetching(client, url, sem, backoff * 2)  # Return the result of the retry
        else:
            rprint(f' [!] [{err.response.status_code}] {url}')
            return None
    else:
        if resp and resp.status_code == 200:
            http200 += 1
            rprint(f' [[green]{resp.status_code}[/green]] [ [magenta]{http200}/{total}[/magenta] ({http200/total*100:.2f}%) ]')
        return resp
async def main(links):
    global pages
    pages = len(links)  # total number of pages, used for progress reporting
    limit = 5           # maximum number of concurrent connections
    sem = asyncio.Semaphore(limit)
    hdrs = { 'accept': 'application/json', 'content-type': 'application/json' }
    async with httpx.AsyncClient(headers=hdrs, http2=True) as client:
        tasks = [ fetching(client, link, sem) for link in links ]
        responses = await asyncio.gather(*tasks)
    print()
    failed = any(r is None or r.status_code != 200 for r in responses)
    if not failed:
        print(f'All [{len(responses)} responses] are OK.')
        print(responses)
    else:
        print(f'Errors: not all {len(responses)} responses are OK.')

links = []  # A long list of URLs to be fetched.
asyncio.run(main(links))
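As written, links is empty, so the script does nothing until the list is filled with the page URLs of the API being fetched. A hypothetical example (the URL pattern is made up for illustration):

links = [f'https://api.example.com/v1/records?page={n}' for n in range(1, 201)]

Note that HTTP/2 support in httpx is an optional extra, so the dependencies would be installed with something like pip install rich 'httpx[http2]'.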