Computational Biology 2024
Ghent University
2024-02-21
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
– Donald Knuth in Structured Programming with go to Statements (1974)
- dict instead of list
- multiprocessing
- xargs or GNU parallel can be faster
- BLAS, numba, cython, pycuda
- numpy views, pandas Copy-on-Write
- time.perf_counter_ns before and after a function
- Prefer timeit for small code snippets
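The time.perf_counter_ns approach can be sketched as follows; work() is a hypothetical stand-in for the function being measured:

```python
import time

def work():
    # hypothetical workload: sum of squares
    return sum(i * i for i in range(100_000))

start = time.perf_counter_ns()
work()
elapsed_ns = time.perf_counter_ns() - start
print(f"work() took {elapsed_ns / 1e6:.2f} ms")
```

Note that a single measurement like this is noisy; for small snippets, timeit (below) repeats the measurement for you.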
python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 5: 30.2 usec per loop
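The same measurement can be driven from Python code; this sketch uses the snippet from the command line above, with an arbitrarily chosen repeat count:

```python
import timeit

# same snippet as the command-line example; 10_000 repetitions
total = timeit.timeit('"-".join(str(n) for n in range(100))', number=10_000)
print(f"{total / 10_000 * 1e6:.1f} usec per loop")
```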
Example of a funkyheatmap. Other metrics are also taken into account, such as scalability, stability, usability…
Visualize with snakeviz:
python -m cProfile -o results.prof myscript.py
snakeviz results.prof
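cProfile can also be driven from Python code instead of the command line; in this sketch fib() is a hypothetical workload standing in for myscript.py:

```python
import cProfile
import pstats

def fib(n):
    # hypothetical workload standing in for myscript.py
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# profile only the code inside the with-block (Python 3.8+)
with cProfile.Profile() as pr:
    fib(20)

# print the five most expensive calls by cumulative time
pstats.Stats(pr).sort_stats("cumulative").print_stats(5)
```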
Icicle graph of profiled function calls. Functions are sorted alphabetically along X; a wider box means the function accumulated more time and used more resources. Y is the depth in the call stack: the top-level caller is at the top and the lowest-level functions are at the bottom.
Using the time shell builtin is very limited:
sleep 0.01 0.00s user 0.00s system 11% cpu 0.020 total
Keeping track of time in a bash script has overhead and limited precision
start=$(date +%s.%N)
sleep 0.01
end=$(date +%s.%N)
runtime=$(echo "$end - $start" | bc -l)
echo "$runtime"
.025655000
Good benchmarking tools such as hyperfine run the code multiple times and report statistics:
Benchmark 1: sleep 0.01
Time (mean ± σ): 16.0 ms ± 1.8 ms [User: 0.3 ms, System: 0.8 ms]
Range (min … max): 13.7 ms … 18.3 ms 5 runs
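The output above matches hyperfine's report format; assuming hyperfine is installed, it can be reproduced with:

```shell
# benchmark the command; hyperfine repeats it and reports mean, σ, and range
hyperfine 'sleep 0.01'
```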
import yappi
import time
import threading

_NTHREAD = 3

def _work(n):
    time.sleep(n * 0.1)

yappi.start()

threads = []
# generate _NTHREAD threads
for i in range(_NTHREAD):
    t = threading.Thread(target=_work, args=(i + 1,))
    t.start()
    threads.append(t)

# wait for all threads to finish
for t in threads:
    t.join()

yappi.stop()

# retrieve thread stats by their thread id (given by yappi)
threads = yappi.get_thread_stats()
for thread in threads:
    print(
        "Function stats for (%s) (%d)" % (thread.name, thread.id)
    )  # thread.name is the Thread.__class__.__name__
    yappi.get_func_stats(ctx_id=thread.id).print_all()
Function stats for (Thread) (3)
name ncall tsub ttot tavg
..hon3.7/threading.py:859 Thread.run 1 0.000017 0.000062 0.000062
doc3.py:8 _work 1 0.000012 0.000045 0.000045
Function stats for (Thread) (2)
name ncall tsub ttot tavg
..hon3.7/threading.py:859 Thread.run 1 0.000017 0.000065 0.000065
doc3.py:8 _work 1 0.000010 0.000048 0.000048
Function stats for (Thread) (1)
name ncall tsub ttot tavg
..hon3.7/threading.py:859 Thread.run 1 0.000010 0.000043 0.000043
doc3.py:8 _work 1 0.000006 0.000033 0.000033
python -m memory_profiler example.py
Line # Mem usage Increment Occurrences Line Contents
============================================================
3 38.816 MiB 38.816 MiB 1 @profile
4 def my_func():
5 46.492 MiB 7.676 MiB 1 a = [1] * (10 ** 6)
6 199.117 MiB 152.625 MiB 1 b = [2] * (2 * 10 ** 7)
7 46.629 MiB -152.488 MiB 1 del b
8 46.629 MiB 0.000 MiB 1 return a
Visualize in Excel, matplotlib, or kcachegrind (Linux).
🕵️‍♀️ Traces every function call so it can accurately represent the call stack, unlike sampling profilers.
ℭ Also handles native calls in C/C++ libraries so the entire call stack is present in the results.
🏎 Blazing fast! Profiling slows the application only slightly. Tracking native code is somewhat slower, but this can be enabled or disabled on demand.
📈 It can generate various reports about the collected memory usage data, like flame graphs.
🧵 Works with Python threads.
👽🧵 Works with native-threads (e.g. C++ threads in C extensions).
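This feature list matches the memray memory profiler; assuming it is installed, a typical run looks like the following sketch (the capture filename is arbitrary):

```shell
# trace the script and write all allocations to a capture file
memray run -o output.bin myscript.py
# render the capture as an interactive HTML flame graph
memray flamegraph output.bin
```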
conda create -n myenv python=3.8
~/bin/runner.pbs
Method 1)
In the Custom Code section, activate the environment with modules or the runner.pbs script. Example code is available at https://github.com/saeyslab/hydra_hpc_example; frequencies_hydra is the most high-level example and the easiest to use.