Addressing Incorrect Handling of Python GIL that Leads to a SEGFAULT #361
Addressing Incorrect Handling of Python GIL that Leads to a SEGFAULT #361Scusemua wants to merge 2 commits into
Conversation
|
Hi @Scusemua ! Thanks for your patch! You saved me several hours of life! :)) PyGILState locking is needed for sure, but Go is insidious too :) even between PyGILState_Ensure/PyGILState_Release calls! And sure, PyGILState_Release will panic that current thread has no python thread info and SEGFAULT etc. My simple two clients-loaded Web server was crashing on this even with your patch. To fix - we need to lock current goroutine to OS thread during the PyGILState_Ensure/PyGILState_Release section. Simply add runtime.LockOSThread() call before PyGILState_Ensure (1 place in bind/symbols.go) and runtime.UnlockOSThread() after every PyGILState_Release call (3 places in bind/symbols.go) And now my test server is running fine, calling Python callback in the web handler :) Feel free to add this to your branch and be brave to propose your PR for merge! Have nice days guys! |
|
Hi @nickeygit, Yeah, I also found that I needed to pin the goroutines to OS threads, or else I would get really strange errors -- just like you're describing. I just think I forgot to add an update about this to my original post. 😅 With these two fixes, everything has been working fine on my end with no more issues for months! :) |
|
I've encountered a new issue in which the application encounters a segmentation fault when calling the Sometimes, the segmentation fault occurs on this line: Other times, it doesn't occur until The issue is with the At first, I thought it was perhaps because the callback was being executed before the original thread state was restored by But I tried adding a Has anybody else encountered this? Have you ever experienced this issue, @nickeygit? |
|
@Scusemua I'm sorry I didn't see your work sooner or reach out 😞 I've been getting deeper into gopy and only recently realized how much your changes helped. I was digging deep into issue 370 and 385 on macos, and after a bunch of failed starts, I realized that your changes made investigation and progress so much more approachable. I cherry-picked your changes and included them in #393. If you have any thoughts on PR #393, or any questions about it, I'd love to hear them — and I'll do my best to answer. If you have time, and if you have any input or feedback, I'd be incredibly grateful. Sincerely, thank you for the contributions! Feel free to close this PR if you're happy with how the work landed in #393. |
I found what appears to be a bug in
bind/symbols.go.Consider the following Go function that was generated by gopy:
This is generated for a Go method
ExampleFunctionof aGoObjectstruct. The intent is for this method to be called from Python.The bug is that we access the C/Python API function
C.PyCallable_Checkbefore acquiring the GIL, leading to aSEGFAULT. The problem lies in the following excerpt of the above function:Notice that we only acquire the GIL via
PyGILState_Ensureafter the call toC.PyCallable_Check(_fun_arg), which leads to aSEGFAULT.The changes in this PR will result in the above Go function being generated differently as:
Notice that we now call
C.PyGILState_Ensure()before the call toC.PyCallable_Check(_fun_arg). Likewise, we callC.PyGILState_Release(_gstate)before returning within the body of the associated if-statement to ensure that there is a matching call (to match the_gstate := C.PyGILState_Ensure()).In terms of testing, I've just been using the modified
gopyin an application I'm developing. I'm in the process of doing more testing in the meantime; however, I am confident that this is a bug, as you must hold the Python GIL when interfacing/invoking any of the C/Python API.From the "Thread State and the Global Interpreter Lock" subsection of the "Initialization, Finalization, and Threads" Python C-API documentation:
On a separate (but related) note -- since we defer
C.PyEval_RestoreThread(_saved_thread)at the beginning of the method's execution, it's a little silly to release the GIL viaC.PyGILState_Release(_gstate)immediately before returning (even in the case where we don't call into Go code); however, we need to have matching calls betweenC.PyEval_SaveThread()andC.PyEval_RestoreThread()as well asPyGILState_Ensure()andPyGILState_Release(), so I think it ultimately makes sense to do things this way for now. In theory, we could provide more complex logic to handle the GIL more efficiently in the future.