Computational Biology 2025
Ghent University
2025-02-21
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
– Donald Knuth in Structured Programming with go to Statements (1974)
dict instead of list
multiprocessing
xargs or GNU parallel can be faster
BLAS, numba, cython, pycuda
numpy views, pandas Copy-on-Write
time.perf_counter_ns before and after a function
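A minimal sketch of timing with time.perf_counter_ns before and after an operation, here also illustrating the dict-instead-of-list point (the names needle, timed_lookup and the container sizes are illustrative choices, not from the slides):

```python
import time

N = 100_000
needle = N - 1  # worst case for a linear list scan
as_list = list(range(N))
as_dict = dict.fromkeys(range(N))

def timed_lookup(container):
    # Read the clock before and after the operation we care about
    start = time.perf_counter_ns()
    found = needle in container
    elapsed_ns = time.perf_counter_ns() - start
    return found, elapsed_ns

found_list, t_list = timed_lookup(as_list)
found_dict, t_dict = timed_lookup(as_dict)
# dict membership is a hash lookup (O(1)); list membership scans (O(n))
print(f"list: {t_list} ns, dict: {t_dict} ns")
```

A single before/after measurement like this is noisy; it only shows large differences reliably.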
Prefer timeit for small code snippets
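The same measurement can be scripted through timeit's Python API; a sketch mirroring the command-line call below (number=10_000 is a manual choice here, whereas the CLI picks the loop count automatically):

```python
import timeit

# Time the snippet over 10_000 loops and report the per-loop average
total = timeit.timeit('"-".join(str(n) for n in range(100))', number=10_000)
print(f"{total / 10_000 * 1e6:.2f} usec per loop")
```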
python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 5: 30.2 usec per loop
Visualize with snakeviz
python -m cProfile -o results.prof myscript.py
snakeviz results.prof
Icicle example graph of profiled function calls. Functions are sorted alphabetically along X; a wider box means the function was sampled more and uses more resources. Y is the depth in the call stack: the highest-level calling function is at the top and the lowest-level functions are at the bottom.
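If snakeviz is not at hand, the same .prof data can be inspected with the stdlib pstats module; a sketch (the function busy and the file name results.prof are illustrative):

```python
import cProfile
import io
import pstats

def busy():
    # Something worth profiling
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()
profiler.dump_stats("results.prof")  # same format as `python -m cProfile -o`

# Print the 5 most expensive calls by cumulative time
stream = io.StringIO()
stats = pstats.Stats("results.prof", stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```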
python -m memory_profiler example.py
Line #    Mem usage    Increment  Occurrences   Line Contents
============================================================
     3   38.816 MiB   38.816 MiB           1   @profile
     4                                         def my_func():
     5   46.492 MiB    7.676 MiB           1       a = [1] * (10 ** 6)
     6  199.117 MiB  152.625 MiB           1       b = [2] * (2 * 10 ** 7)
     7   46.629 MiB -152.488 MiB           1       del b
     8   46.629 MiB    0.000 MiB           1       return a
Visualize in Excel, matplotlib or kcachegrind (Linux)
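A stdlib-only alternative sketch of the same experiment with tracemalloc (no @profile decorator or extra install needed; the numbers will differ slightly from memory_profiler's, which reports process-level memory):

```python
import tracemalloc

tracemalloc.start()
a = [1] * (10 ** 6)
b = [2] * (2 * 10 ** 7)
_, peak = tracemalloc.get_traced_memory()   # peak covers both lists (~160 MiB)
del b
current, _ = tracemalloc.get_traced_memory()  # b's ~152 MiB has been released
tracemalloc.stop()

print(f"peak: {peak / 2**20:.1f} MiB, after del b: {current / 2**20:.1f} MiB")
```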
Memray features:
🕵️♀️ Traces every function call so it can accurately represent the call stack, unlike sampling profilers.
ℭ Also handles native calls in C/C++ libraries so the entire call stack is present in the results.
🏎 Blazing fast! Profiling slows the application only slightly. Tracking native code is somewhat slower, but this can be enabled or disabled on demand.
📈 It can generate various reports about the collected memory usage data, like flame graphs.
🧵 Works with Python threads.
👽🧵 Works with native-threads (e.g. C++ threads in C extensions).
import yappi
import time
import threading
_NTHREAD = 3
def _work(n):
    time.sleep(n * 0.1)
yappi.start()
threads = []
# generate _NTHREAD threads
for i in range(_NTHREAD):
    t = threading.Thread(target=_work, args=(i + 1, ))
    t.start()
    threads.append(t)
# wait for all threads to finish
for t in threads:
    t.join()
yappi.stop()
# retrieve thread stats by their thread id (given by yappi)
threads = yappi.get_thread_stats()
for thread in threads:
    print(
        "Function stats for (%s) (%d)" % (thread.name, thread.id)
    )  # it is the Thread.__class__.__name__
    yappi.get_func_stats(ctx_id=thread.id).print_all()
Function stats for (Thread) (3)
name                                  ncall  tsub      ttot      tavg
..hon3.7/threading.py:859 Thread.run  1      0.000017  0.000062  0.000062
doc3.py:8 _work                       1      0.000012  0.000045  0.000045
Function stats for (Thread) (2)
name                                  ncall  tsub      ttot      tavg
..hon3.7/threading.py:859 Thread.run  1      0.000017  0.000065  0.000065
doc3.py:8 _work                       1      0.000010  0.000048  0.000048
Function stats for (Thread) (1)
name                                  ncall  tsub      ttot      tavg
..hon3.7/threading.py:859 Thread.run  1      0.000010  0.000043  0.000043
doc3.py:8 _work                       1      0.000006  0.000033  0.000033
time.perf_counter_ns before and after a function
Example of a funkyheatmap. Other metrics, such as scalability, stability and usability, are also taken into account…
Using the time builtin is very limited
sleep 0.01  0.00s user 0.00s system 11% cpu 0.020 total
Keeping track of time in a bash script has overhead and limited precision
start=`date +%s.%N`
sleep 0.01
end=`date +%s.%N`
runtime=$( echo "$end - $start" | bc -l )
echo $runtime
.025655000
Good benchmark frameworks run the code multiple times and report statistics
Benchmark 1: sleep 0.01
  Time (mean ± σ):      16.0 ms ±   1.8 ms    [User: 0.3 ms, System: 0.8 ms]
  Range (min … max):    13.7 ms …  18.3 ms    5 runs
Some older example code is available at https://github.com/saeyslab/hydra_hpc_example:
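The run-it-several-times-and-report-statistics idea that hyperfine applies to whole shell commands can be sketched in pure Python with timeit.repeat (repeat=5 mirrors the 5 runs above; number=1_000 is an arbitrary per-run loop count):

```python
import statistics
import timeit

# 5 independent runs of 1_000 loops each
runs = timeit.repeat('"-".join(str(n) for n in range(100))', repeat=5, number=1_000)
per_loop = [r / 1_000 for r in runs]

mean = statistics.mean(per_loop)
sigma = statistics.stdev(per_loop)
print(f"Time (mean ± σ): {mean * 1e6:.2f} usec ± {sigma * 1e6:.2f} usec")
print(f"Range (min … max): {min(per_loop) * 1e6:.2f} … {max(per_loop) * 1e6:.2f} usec, {len(runs)} runs")
```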