Python Profiling #

在做任何優化前，應該先找出哪一行程式是瓶頸，從那裡開始優化最有效率。本文會先介紹常見的 Python Profiling 工具。

Timeit #

在 IPython 的環境下可以直接使用 %timeit。

In [1]: import numpy as np
In [2]: def pow2(a):
            return a ** 2

        a = np.arange(10000000)
        %timeit pow2(a)
The slowest run took 30.88 times longer than the fastest. This could mean that an intermediate result is being cached.
10 loops, best of 3: 21.6 ms per loop

cProfile #

cProfile 是 Python 內建的 profiler，由 C 輔助寫成。如果沒有 cProfile 的話，可以用 Profile，是純 Python 的 module，行為跟 cProile 一樣，但是 overhead 很大。意思是有 profiler 的情況下速度會慢很多，但每個指令間的相對速度還是可以參考的。跑出來的結果可以用 pstats 視覺化。

import cProfile
import numpy as np
import pstats

def pow2(a):
    return a ** 2

a = np.arange(10000000)
cProfile.run('pow2(a)', filename="result.out")  # profile pow2(a)，並且將結果寫入 result.out

p = pstats.Stats("result.out")
print(p.strip_dirs().sort_stats("tottime").print_stats())

result.out 是 binary 格式，所以必須用 pstats 讀取。上面那段程式會印出以下內容：

Wed Apr 22 22:06:23 2020    result.out

         4 function calls in 0.026 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.023    0.023    0.023    0.023 <ipython-input-3-d97d0fd3b0ce>:1(pow2)
        1    0.003    0.003    0.026    0.026 <string>:1(<module>)
        1    0.000    0.000    0.026    0.026 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

<pstats.Stats object at 0x0000000005757240>

第二行表示記錄了四個 function call。接下來的 Ordered by 是下面表格排序的欄位。

ncalls: 呼叫次數
tottime: 該 function 所花的時間(不包含裡面再呼叫的 function)
percall: tottime / ncalls
cumtime: 該 function 與裡面再呼叫的 function 所累積的時間。
percall: cumtime / primitive calls
filename:lineno(function): 這行是哪個 function 的時間。

也可以在終端機執行 cProfile (或 Profile)。寫好 test.py 如下：

import numpy as np

def pow2(a):
    return a ** 2

a = np.arange(10000000)
pow2(a)

終端機輸入：python -m cProfile -o result.out test.py

使用以下程式碼讀取 result.out 內容：

import pstats
p = pstats.Stats("result.out")
print(p.strip_dirs().sort_stats("tottime").print_stats(10))  # 10 表示印出前 10 行

Wed Apr 22 22:18:21 2020    result.out

         51062 function calls (48879 primitive calls) in 0.161 seconds

   Ordered by: internal time
   List reduced from 943 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      608    0.020    0.000    0.020    0.000 {built-in method nt.stat}
        1    0.020    0.020    0.020    0.020 test.py:3(pow2)
        1    0.016    0.016    0.016    0.016 {built-in method numpy.core.multiarray.arange}
      124    0.016    0.000    0.016    0.000 {built-in method marshal.loads}
       11    0.012    0.001    0.015    0.001 {built-in method _imp.create_dynamic}
  342/340    0.005    0.000    0.009    0.000 {built-in method builtins.__build_class__}
      124    0.005    0.000    0.009    0.000 <frozen importlib._bootstrap_external>:816(get_data)
    134/1    0.005    0.000    0.162    0.162 {built-in method builtins.exec}
      124    0.004    0.000    0.004    0.000 {method 'read' of '_io.FileIO' objects}
      280    0.003    0.000    0.003    0.000 {built-in method winreg.OpenKey}

<pstats.Stats object at 0x0000000004B93A20>

其中 ncalls 有兩個數字，如第六行的 342/340，表示這是一個遞迴函式，340 表示 primitive call的次數，342 為所有 function call 的次數。

詳細內容可以參考官方文件。

Line Profiler #

Line Profiler 提供逐行的程式執行時間，缺點是要在需要 profiling 的函式加上 decorater @profile，以及 overhead 可能有一點大。

安裝：

pip install line_profiler

注意：使用 Windows 可能會遇到一些問題，可以在這邊下載非官方預先編譯的 wheel 檔案，直接 pip install line_profiler‑*.whl 即可。注意對應的 OS 及 Python 版本。如果是 64 bit OS (現在應該都是 64 bit OS 了)且使用 Python 3.5.2，請下載 line_profiler‑2.1.2‑cp35‑cp35m‑win_amd64.whl，其中 2.1.2 是 Line Profiler 的版本。

寫好 test.py 如下：

import numpy as np

@profile
def pow2(a):
    return a ** 2

@profile
def main():
    a = np.arange(10000000)
    pow2(a)

if __name__ == '__main__':
    main()

在終端機執行

kernprof -l test.py                    # 做 profiling 會輸出結果至 test.py.lprof
python -m line_profiler test.py.lprof  # 讀取 test.py.lprof 結果

結果如下：

Timer unit: 3.20738e-07 s

Total time: 0.0177429 s
File: test.py
Function: pow2 at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def pow2(a):
     5         1      55319.0  55319.0    100.0      return a ** 2

Total time: 0.0345053 s
File: test.py
Function: main at line 7

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           @profile
     8                                           def main():
     9         1      44137.0  44137.0     41.0      a = np.arange(10000000)
    10         1      63444.0  63444.0     59.0      pow2(a)

記憶體的 Profiling #

memory_profiler 提供類似 line_profiler 的功能，逐行提供現在的記憶體用量。

安裝：

pip install memory_profiler

寫好 test.py 如下：

import numpy as np

@profile
def pow2(a):
    return a ** 2

@profile
def main():
    a = np.arange(10000000)
    pow2(a)

if __name__ == '__main__':
    main()

在終端機執行

python -m memery_profiler test.py.lprof  # 讀取 test.py.lprof 結果

結果如下：

Filename: test.py

Line #    Mem usage    Increment   Line Contents
================================================
     3   87.336 MiB   87.336 MiB   @profile
     4                             def pow2(a):
     5  125.559 MiB   38.223 MiB       return a ** 2


Filename: test.py

Line #    Mem usage    Increment   Line Contents
================================================
     7   49.102 MiB   49.102 MiB   @profile
     8                             def main():
     9   87.336 MiB   38.234 MiB       a = np.arange(10000000)
    10   87.336 MiB   87.336 MiB       pow2(a)

參考資料：

Timeit 文件

cProfile 文件

Line Profiler Github