Computational Biology 2024
Ghent University
2024-02-21
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
– Donald Knuth in Structured Programming with go to Statements (1974)
Common ways to speed up code:
- dict instead of list for fast lookups
- multiprocessing for parallelism within Python
- xargs or GNU parallel can be faster for parallelism at the shell level
- compiled code: BLAS, numba, cython, pycuda
- avoiding copies: numpy views, pandas Copy-on-Write

Simple timing: call time.perf_counter_ns before and after a function (see the sketch below).
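A minimal timing sketch with time.perf_counter_ns; the work() function is a placeholder for whatever you want to time:

import time

def work():
    # placeholder workload: replace with the function you want to time
    return sum(i * i for i in range(1_000_000))

start = time.perf_counter_ns()
work()
end = time.perf_counter_ns()
print(f"work() took {(end - start) / 1e6:.3f} ms")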
Prefer timeit for small code snippets
python -m timeit '"-".join(str(n) for n in range(100))'
10000 loops, best of 5: 30.2 usec per loop
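The same measurement can also be done from within Python via the timeit API:

import timeit

# total seconds for 10000 runs of the snippet
total = timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
print(f"{total / 10000 * 1e6:.1f} usec per loop")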
Visualize cProfile results with snakeviz:
python -m cProfile -o results.prof myscript.py
snakeviz results.prof
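cProfile can also be driven from within Python; a minimal sketch that profiles a single expression and prints the top entries with pstats:

import cProfile
import pstats

cProfile.run("sum(i * i for i in range(10_000))", "results.prof")
stats = pstats.Stats("results.prof")
stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time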
Using the time shell builtin is very limited:
time sleep 0.01
sleep 0.01  0.00s user 0.00s system 11% cpu 0.020 total
Keeping track of time in a bash script has overhead and limited precision
start=$(date +%s.%N)
sleep 0.01
end=$(date +%s.%N)
runtime=$(echo "$end - $start" | bc -l)
echo "$runtime"
.025655000
Good benchmarking tools run the code multiple times and report statistics:
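The output below matches hyperfine's format; assuming hyperfine is installed, a command like this would produce it (--runs 5 caps the run count):

hyperfine --runs 5 'sleep 0.01'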
Benchmark 1: sleep 0.01
Time (mean ± σ): 16.0 ms ± 1.8 ms [User: 0.3 ms, System: 0.8 ms]
Range (min … max): 13.7 ms … 18.3 ms 5 runs
import yappi
import time
import threading

_NTHREAD = 3


def _work(n):
    time.sleep(n * 0.1)


yappi.start()

threads = []
# generate _NTHREAD threads
for i in range(_NTHREAD):
    t = threading.Thread(target=_work, args=(i + 1,))
    t.start()
    threads.append(t)

# wait for all threads to finish
for t in threads:
    t.join()

yappi.stop()

# retrieve thread stats by their thread id (given by yappi)
threads = yappi.get_thread_stats()
for thread in threads:
    print(
        "Function stats for (%s) (%d)" % (thread.name, thread.id)
    )  # thread.name is the Thread.__class__.__name__
    yappi.get_func_stats(ctx_id=thread.id).print_all()
Function stats for (Thread) (3)
name ncall tsub ttot tavg
..hon3.7/threading.py:859 Thread.run 1 0.000017 0.000062 0.000062
doc3.py:8 _work 1 0.000012 0.000045 0.000045
Function stats for (Thread) (2)
name ncall tsub ttot tavg
..hon3.7/threading.py:859 Thread.run 1 0.000017 0.000065 0.000065
doc3.py:8 _work 1 0.000010 0.000048 0.000048
Function stats for (Thread) (1)
name ncall tsub ttot tavg
..hon3.7/threading.py:859 Thread.run 1 0.000010 0.000043 0.000043
doc3.py:8 _work 1 0.000006 0.000033 0.000033
python -m memory_profiler example.py
Line # Mem usage Increment Occurrences Line Contents
============================================================
3 38.816 MiB 38.816 MiB 1 @profile
4 def my_func():
5 46.492 MiB 7.676 MiB 1 a = [1] * (10 ** 6)
6 199.117 MiB 152.625 MiB 1 b = [2] * (2 * 10 ** 7)
7 46.629 MiB -152.488 MiB 1 del b
8 46.629 MiB 0.000 MiB 1 return a
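The example.py behind this output can be read off the Line Contents column; it is the example from the memory_profiler documentation (no import needed: python -m memory_profiler injects the @profile decorator):

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == "__main__":
    my_func()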
Visualize the results in Excel, matplotlib, or KCachegrind (Linux).

memray, a memory profiler for Python:
🕵️♀️ Traces every function call so it can accurately represent the call stack, unlike sampling profilers.
ℭ Also handles native calls in C/C++ libraries so the entire call stack is present in the results.
🏎 Blazing fast! Profiling slows the application only slightly. Tracking native code is somewhat slower, but this can be enabled or disabled on demand.
📈 It can generate various reports about the collected memory usage data, like flame graphs.
🧵 Works with Python threads.
👽🧵 Works with native-threads (e.g. C++ threads in C extensions).
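A typical memray workflow, assuming memray is installed (the script name is a placeholder):

memray run -o results.bin myscript.py
memray flamegraph results.bin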
conda create -n myenv python=3.8
~/bin/runner.pbs
Method 1: in the Custom Code section, activate the environment with modules or the runner.pbs script.
Example code is available at https://github.com/saeyslab/hydra_hpc_example; frequencies_hydra is the most high-level example and the easiest to use.
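A purely hypothetical sketch of what ~/bin/runner.pbs could look like; the real script lives in the repository above, and the resource values and script name here are placeholders:

#!/bin/bash
#PBS -l nodes=1:ppn=4        # placeholder resource request
#PBS -l walltime=01:00:00    # placeholder walltime
source "$HOME/miniconda3/etc/profile.d/conda.sh"  # assumed conda install path
conda activate myenv         # the environment created with conda create above
python myscript.py           # hypothetical entry point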