>>> stage-05/gil.py

The Single Lock

Global Interpreter Lock: why don't CPU-bound threads benefit from multiple cores?

Run this code:

import threading
import time

COUNT = 10**7

def countdown():
    n = COUNT
    while n > 0:
        n -= 1

# one thread
start = time.time()
countdown()
print("Single thread:", time.time() - start)

# two threads, each doing half of the same countdown work
def countdown_half():
    n = COUNT // 2
    while n > 0:
        n -= 1

t1 = threading.Thread(target=countdown_half)
t2 = threading.Thread(target=countdown_half)
start = time.time()
t1.start(); t2.start()
t1.join(); t2.join()
print("Two threads:", time.time() - start)

Why are two threads no faster than one, and often slower? What is the extra cost? Then explain: if we replaced n -= 1 with time.sleep(1) (IO instead of CPU, with a much smaller COUNT), why would threading become useful?

// "Why": the design motivation

In 1992, when Guido added thread support (and with it the GIL) to Python, multi-core processors did not exist in personal computers. The GIL was a clever shortcut: simpler memory management (no separate locks around each refcount), better single-thread performance, and easier C extension development. The problem: the GIL prevents the threads of one process from running Python bytecode on several cores at once.

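How the GIL works in practice

Open cpython/Python/ceval.c:

The GIL is a single mutex (lock).
Every thread that wants to execute bytecode must acquire the lock.
After an interval (default: 5 ms in Python 3.2+, adjustable with sys.setswitchinterval), the running thread is asked to release the lock if another thread is waiting for it.
Switching happens between bytecode instructions.

The _PyEval_CheckInterval mechanism: every CHECK_INTERVAL, the running thread checks whether another thread is waiting for the lock. In Python 3.2+ this check is time-based rather than a count of instructions: a waiting thread that times out sets a drop request, and the running thread notices it in the eval loop.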
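The switch interval described above can be inspected and changed at runtime. A minimal sketch, using only sys.getswitchinterval and sys.setswitchinterval from the standard library; the value 0.001 is an arbitrary choice for illustration, not a recommendation:

```python
import sys

# Read the current switch interval in seconds.
original = sys.getswitchinterval()
print("switch interval:", original)

# Request GIL handoffs more often (arbitrary demo value).
sys.setswitchinterval(0.001)
changed = sys.getswitchinterval()
print("after change:", changed)

# Restore the previous value so the rest of the program is unaffected.
sys.setswitchinterval(original)
```

A shorter interval makes threads more responsive to each other at the price of more switching overhead; it does not make CPU-bound threads run in parallel.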

Why IO threads "work"

Blocking IO operations are implemented in C and release the lock while they wait:

Py_BEGIN_ALLOW_THREADS
// ... IO operation (lock released) ...
Py_END_ALLOW_THREADS
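The effect of releasing the lock around a blocking call can be observed from Python itself. A minimal sketch, assuming time.sleep stands in for real IO (it also releases the GIL while it waits); fake_io and DELAY are illustrative names, not part of the lesson's code:

```python
import threading
import time

DELAY = 0.2  # seconds each simulated IO call blocks

def fake_io():
    # time.sleep releases the GIL while waiting, as a blocking read would
    time.sleep(DELAY)

# Sequential: two waits back to back
start = time.perf_counter()
fake_io()
fake_io()
sequential = time.perf_counter() - start

# Threaded: the two waits overlap because neither holds the lock
threads = [threading.Thread(target=fake_io) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The threaded run should take roughly one DELAY instead of two, which is the opposite of what the CPU-bound countdown shows.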

While one thread waits on IO, other threads run. This is why IO-bound workloads benefit from threading. A CPU-bound workload (pure computation) never releases the lock voluntarily; it gives it up only when the switch interval forces it to, so there is no gain.

"Gilectomy": attempts to remove the GIL

Several attempts failed: removing the GIL required fine-grained locks for the refcount, dicts, and lists, which made single-thread performance worse. C extensions (NumPy, Pandas) also rely on the GIL for thread safety. The decision: keep the GIL and make parallelism easier in other ways (multiprocessing, subinterpreters).

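multiprocessing: the real solution

multiprocessing creates separate processes, and each process has its own GIL. The cost: IPC is slower than sharing memory between threads, each process has its own memory space, and creating a process has more overhead than creating a thread.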
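A minimal sketch of splitting the countdown across processes with multiprocessing.Pool. The helper names split_work and run_in_processes are hypothetical, introduced only for this illustration, and countdown here takes its count as an argument, unlike the version earlier in the lesson:

```python
import multiprocessing

def countdown(n):
    # CPU-bound loop; returns the final value so callers can see it finished
    while n > 0:
        n -= 1
    return n

def split_work(total, parts):
    # Divide `total` iterations into `parts` chunks that sum to `total`
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]

def run_in_processes(total, workers=2):
    # Each worker process has its own interpreter and its own GIL
    with multiprocessing.Pool(workers) as pool:
        return pool.map(countdown, split_work(total, workers))

if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, child processes
    # re-import this module and must not create pools of their own.
    print(run_in_processes(10**7))
```

Whether this beats the single-threaded run depends on the machine and on process start-up cost; for small workloads the overhead listed above can dominate.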

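asyncio: sidestepping the GIL entirely

asyncio is not parallelism; it is concurrency inside a single thread. The event loop manages tasks, and every await is a point where the coroutine may voluntarily yield. await suspends the current coroutine and lets the event loop run another one from its queue. await does not release the GIL, because it does not need to: the switch happens in userspace, within the same thread.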
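The single-thread nature of this switching can be made visible by logging the thread id at each step. A minimal sketch; worker, main, and the delays are illustrative choices, and asyncio.sleep stands in for any awaitable that actually suspends:

```python
import asyncio
import threading

async def worker(name, delay, log):
    # Record which OS thread runs each step of the coroutine
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # yields control back to the event loop
    log.append((name, "end", threading.get_ident()))

async def main():
    log = []
    # Both coroutines are scheduled on the same event loop
    await asyncio.gather(worker("a", 0.05, log), worker("b", 0.01, log))
    return log

log = asyncio.run(main())
for name, event, thread_id in log:
    print(name, event, thread_id)
```

Both coroutines report the same thread id, and "b" finishes before "a" even though it started second: the interleaving comes from the event loop, not from a second thread.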

Not allowed: using sys._current_frames() directly.

Required: write a decorator @watchdog(timeout=2) that monitors a function running in a separate thread. If the function takes longer than timeout seconds without releasing the GIL (that is, without any switch), print a traceback showing where the thread is currently stuck. In the background, the watchdog thread periodically (every 0.5 seconds) checks sys._current_frames() to see which instruction is currently executing in the target thread. If the same instruction persists for longer than timeout, print a warning.

// Summary: what did we connect?

GIL: a single lock that prevents multiple threads from executing bytecode at the same time
IO-bound threads work because C extensions release the lock
CPU-bound threads gain nothing, and can even be slower because of contention overhead
asyncio: concurrency inside a single thread (userspace scheduling)
multiprocessing: the real solution for CPU-bound tasks

Now you know that threading does not give true parallelism for CPU-bound code. But there are cases inside Python itself that do not need the GIL, for example raw memory allocation in C. So can pure Python code run without the GIL? PEP 703 (nogil) attempts exactly that. Outside of that effort, even simple operations like obj.attr need the GIL, because they read ob_type and walk the MRO, and none of this is thread-safe without a lock. This leads to the next question: how does Python make attribute access thread-safe with the GIL? The answer: the protocols. The tp_getattro slot is called while the GIL is held.