Introduction

Application profiling is the process of analyzing a program to determine its characteristics: how long different parts of the code take to execute and how much of each resource they consume.

The main stages of profiling are always more or less the same:

  1. Measuring the execution time. How much time is required to execute different code parts?
  2. Analyzing memory usage. How much memory is consumed by different parts of the program?
  3. Identifying bottlenecks. What parts of the code slow down the program or use too many resources?
  4. Performance optimization. Taking measures to improve execution speed and resource utilization efficiency based on the obtained data.

Asynchronous code has a fairly small set of typical bottlenecks, so it is worth listing them up front.

Let’s match each type with a code example.

--

The main types of bottlenecks in asynchronous Python

Blocking operations

import asyncio
import time

async def main():
    print('Start')
    # Blocking call
    time.sleep(3)  # This blocks the entire event loop
    print('End')

asyncio.run(main())

Calling asynchronous tasks sequentially

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ["http://medium.com"] * 10
    async with aiohttp.ClientSession() as session:
        # Inefficient: Sequential requests
        for url in urls:
            await fetch(session, url)

asyncio.run(main())

Excessive Context Switching

import asyncio

async def tiny_task():
    await asyncio.sleep(0.0001)

async def main():
    # Excessive context switching due to many small tasks
    await asyncio.gather(*(tiny_task() for _ in range(100000)))

asyncio.run(main())

Resource Starvation

import asyncio

async def long_running_task():
    await asyncio.sleep(10)
    print("Long task executed")

async def quick_task():
    await asyncio.sleep(1)
    print("Quick task executed")

async def main():
    await asyncio.gather(
        long_running_task(),
        quick_task()  # May be delayed excessively
    )

asyncio.run(main())

Memory Overhead

import asyncio

async def large_data_task():
    data = "lorep ipsum" * 10**8  # Large memory usage
    await asyncio.sleep(1)

async def main():
    tasks = [large_data_task() for _ in range(100)]  # High memory consumption
    await asyncio.gather(*tasks)

asyncio.run(main())

--

By the way, how do profilers work in general?

A detailed review deserves a separate article; for now, a basic classification is enough:

  • Deterministic profilers. The main representative is the built-in cProfile. This profiler counts the number of calls to each function and the time spent in each of them. The problem is that the time asynchronous code spends waiting on awaits is not attributed correctly (see the sketch after this list).
  • Statistical profilers. Common representatives are scalene, py-spy, yappi, pyinstrument, austin. Such profilers take a “snapshot” of the process at a regular sampling interval and analyze the collected samples statistically to find bottlenecks.
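
As a quick illustration of the first point (a minimal sketch of my own, not taken from the article), here is what feeding a coroutine to cProfile looks like; almost all of the wall-clock time is spent awaiting, and the deterministic profiler attributes it to event-loop internals rather than to the coroutine itself:

import asyncio
import cProfile

async def waiting_coroutine():
    # Nearly all of this coroutine's time is spent waiting on the await,
    # which cProfile attributes to event-loop internals, not to this function
    await asyncio.sleep(1)

cProfile.run("asyncio.run(waiting_coroutine())")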

--

Using scalene for profiling

Why scalene? Because this tool can profile both CPU and memory, has 10k+ stars on GitHub, and is actively developed.

Let’s see what scalene says for each “problematic” code from the list above.

We will run scalene like this:

scalene --cpu --memory --cli script_name.py

Blocking operations

The problem line with the blocking call is visible immediately: 2% of the time is spent in Python and 98% in system calls.
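
One common way to fix this (a sketch under my own assumptions, not shown in the original) is to move the blocking call off the event loop, for example with asyncio.to_thread, available since Python 3.9:

import asyncio
import time

async def main():
    print('Start')
    # The blocking sleep runs in a worker thread, so the event loop stays responsive
    await asyncio.to_thread(time.sleep, 3)
    print('End')

asyncio.run(main())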

Calling asynchronous tasks sequentially

Here it is a bit trickier. Again, about 90% of the time is spent in system calls, but the reported line has changed: now it is the asyncio.run() call itself. This output pattern is worth memorizing.
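
For comparison, a minimal sketch of the concurrent variant (my illustration, not from the article): the same requests are launched through asyncio.gather, so their awaits overlap instead of running one after another:

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = ["http://medium.com"] * 10
    async with aiohttp.ClientSession() as session:
        # Launch all requests concurrently instead of awaiting them one by one
        await asyncio.gather(*(fetch(session, url) for url in urls))

asyncio.run(main())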

Excessive Context Switching

We can see memory consumption growing inside asyncio.gather(): splitting the work into so many tiny tasks is too greedy.
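
One possible mitigation, sketched here as an assumption rather than taken from the original, is to batch the tiny awaits so that far fewer task objects are created and scheduled:

import asyncio

async def batched_task(iterations):
    # One coroutine handles a whole batch instead of one task per tiny await
    for _ in range(iterations):
        await asyncio.sleep(0.0001)

async def main():
    # 100 tasks of 1000 iterations each instead of 100000 separate tasks
    await asyncio.gather(*(batched_task(1000) for _ in range(100)))

asyncio.run(main())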

Resource Starvation

Once again, the ratio of system time to Python time is heavily skewed toward system calls.
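
One way to soften this (a hypothetical sketch, not part of the original text) is to consume results as they become available with asyncio.as_completed, so the quick task is not held back by the long-running one:

import asyncio

async def long_running_task():
    await asyncio.sleep(10)
    return "Long task executed"

async def quick_task():
    await asyncio.sleep(1)
    return "Quick task executed"

async def main():
    # as_completed yields results in completion order, so the quick task
    # is reported after about 1 second instead of waiting for the 10-second task
    for finished in asyncio.as_completed([long_running_task(), quick_task()]):
        print(await finished)

asyncio.run(main())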

Memory Overhead

Here scalene did everything for us and showed us the problematic code immediately.
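
A minimal sketch of one possible mitigation (my assumption, not shown in the article): cap the number of simultaneously running tasks with a semaphore so that only a few large buffers live in memory at the same time:

import asyncio

async def large_data_task(semaphore):
    async with semaphore:
        data = "lorem ipsum" * 10**8  # The large buffer exists only while the slot is held
        await asyncio.sleep(1)
        del data

async def main():
    semaphore = asyncio.Semaphore(5)  # At most 5 large buffers at a time
    await asyncio.gather(*(large_data_task(semaphore) for _ in range(100)))

asyncio.run(main())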

--

Conclusion

It should be noted that for three of the cases ("blocking operations", "calling asynchronous tasks sequentially" and "resource starvation") the profiler showed essentially the same picture: system % >> python %. Identifying the actual cause is still up to the developer.

Profiling Python is not difficult, and it can even be a pleasant task, if you know the main types of bottlenecks and are prepared to read the profiler output carefully.

P.S.

This post was originally published on my Medium blog more than a year ago.

Author: Maksim Smirnov