TODO_DEFERRED: add D11 — tier 1 REPL tests for unpythonic.net

Technologicat · claude · Technologicat · commit cfa6b3223394 · 2026-04-15T15:15:55.000+03:00
Parks the tier 1 scripted_repl implementation for
unpythonic.net.client/server as a deferred task for a focused
future session.  The helper pattern and reference implementation
live at mcpyrate/test/test_126_repl.py (mcpyrate commit 0fee81b)
from the same 2026-04-15 session that established the approach.

Entry covers:
- the core in-process approach (server in daemon thread on a
  port-0 binding, client driven by scripted_repl in the same
  process, both ends speaking TCP to 127.0.0.1),
- the one-level nit on scripted_repl (state changes inside try,
  finally handles restoration + StringIO.getvalue() materialization
  so the helper is atomic from the caller's perspective),
- where to put the tests (unpythonic/net/tests/test_client.py,
  new file; runtests.py auto-discovers),
- the unpythonic-specific plumbing that deserves design attention
  (server-in-thread helper, wait-for-bind semantics, port-0 dance,
  clean shutdown, stderr leakage),
- a starting list of 5–6 test cases including a protocol-level
  roundtrip that bypasses the interactive loop entirely,
- cross-references to D9 (Windows port), D10 (tier 2 for net),
  and the sibling raven entry for minichat tier 1.

Self-contained enough for a fresh CC session to pick up cold.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/TODO_DEFERRED.md b/TODO_DEFERRED.md
@@ -1,6 +1,6 @@
 # Deferred Issues
 
-Next unused item code: D11
+Next unused item code: D12
 
 - **D5**: `dispatch.py` — moved to GitHub issue #99. Dispatch-layer improvements for parametric ABCs (warn/error on indistinguishable multimethods). Typecheck-layer part resolved.
 
@@ -62,3 +62,47 @@ Next unused item code: D11
   ```
 
   **When to actually do it**: only if tier 1 coverage turns out to miss something important (a regression hits prod that tier 1 would not have caught). The in-thread server + scripted client approach already exercises most of the protocol surface; tier 2 is primarily a safety net for terminal-semantics and signal-path bugs. Until one of those bites, tier 1 is the main win. (Added 2026-04-15, alongside the tier 1 bring-up.)
+
+
+- **D11: Implement tier 1 REPL tests for `unpythonic.net.client` / `unpythonic.net.server`**: Currently `unpythonic/net/tests/` has **no test files**. The design for how to bring `unpythonic.net` under test was worked out on 2026-04-15 in a session that also implemented the canonical tier-1 example in mcpyrate — that example lives at `mcpyrate/test/test_126_repl.py` (committed as `0fee81b`) and is the reference to crib from when picking this up.
+
+  **Core approach**: in-process, single-test-process. No subprocess boundary. The server runs in a daemon thread on a throwaway `127.0.0.1` port; the client runs in the same process with its interactive loop driven by the `scripted_repl` context manager (monkey-patches `builtins.input`, captures stdout/stderr via `io.StringIO`). Both ends speak TCP to `127.0.0.1`, which keeps everything local and debuggable. The `scripted_repl` helper pattern to copy verbatim:
+
+  - State changes (input swap, stdout swap, stderr swap) go **inside** the `try` block so a mid-setup failure still triggers the `finally` restoration — atomic from the caller's perspective.
+  - `StringIO → str` materialization happens inside `finally`, so captured values are consistent between success and failure paths.
+  - Scripted input ends by raising `EOFError` when the script is exhausted — that's how `code.InteractiveConsole.interact()` exits cleanly.
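The three points above can be sketched as follows. This is a minimal illustration under stated assumptions, not a copy of the helper in `mcpyrate/test/test_126_repl.py`: the `captured` namespace object, the signature, and the `demo` function are hypothetical names chosen for the example.

```python
import builtins
import code
import contextlib
import io
import sys
import types


@contextlib.contextmanager
def scripted_repl(lines):
    """Feed `lines` to input() and capture stdout/stderr (illustrative sketch)."""
    script = iter(lines)

    def fake_input(prompt=""):
        try:
            return next(script)
        except StopIteration:
            # Script exhausted: EOFError is what makes interact() exit cleanly.
            raise EOFError from None

    # Hypothetical result holder; its fields are filled in by the `finally` block.
    captured = types.SimpleNamespace(stdout=None, stderr=None)
    out, err = io.StringIO(), io.StringIO()
    saved = (builtins.input, sys.stdout, sys.stderr)
    try:
        # State changes go inside `try`, so a failure partway through the
        # swaps still reaches the restoration below.
        builtins.input = fake_input
        sys.stdout = out
        sys.stderr = err
        yield captured
    finally:
        builtins.input, sys.stdout, sys.stderr = saved
        # Materialize StringIO -> str here, identically on success and failure.
        captured.stdout = out.getvalue()
        captured.stderr = err.getvalue()


def demo():
    """Drive a local InteractiveConsole (a stand-in for the real client loop)."""
    with scripted_repl(["2 + 3"]) as captured:
        code.InteractiveConsole().interact(banner="")
    return captured
```

Because the captured strings are only filled in by `finally`, callers read `captured.stdout` and `captured.stderr` after the `with` block has exited, not inside it.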
+
+  **Where to put the tests**: new file `unpythonic/net/tests/test_client.py` (runtests.py auto-discovers `test_*.py` under each package). Use the unpythonic test-framework style: `runtests()` function, `with testset("..."):` blocks, `test[...]` assertion macros, `the[...]` value capture. See the CLAUDE.md "unpythonic.test.fixtures framework" subsection for the semantic Pass/Fail/Error/Warn distinction if uncertain about which is which.
+
+  **Plumbing you'll need**:
+
+  1. **Server-in-thread helper.** Something like:
+     ```python
+     import threading
+
+     def start_test_server() -> tuple[threading.Thread, int]:
+         """Start a daemon server on 127.0.0.1:<random-free-port>.  Returns (thread, port)."""
+         ...  # bind to port 0, retrieve the assigned port, hand off to run_server in a thread
+     ```
+     The tricky bit is "wait for server ready" before the client connects. Options: a `threading.Event` that the server sets after it binds; or `socket.create_connection` in a retry loop with a short backoff on the client side. Either works; the first is cleaner if `unpythonic.net.server.run_server` can accept a ready-event parameter, the second avoids touching server code.
+  2. **Decide: is there a working `run_server(port=0, ...)` entry point today?** If not, you may need a small refactor in `server.py` to expose one. Check first — the existing code may already accept a port parameter.
+  3. **Cleanup**. Daemon threads don't block interpreter shutdown, but a leaked socket on the server side can prevent a quick re-run. Implement a `stop_test_server()` helper that closes the listen socket cleanly, or use a `contextmanager`-style `with test_server() as (thread, port):` so the teardown is guaranteed.
+  4. **Client-side**: `unpythonic.net.client.run_client(host="127.0.0.1", repl_port=port, control_port=...)` (check actual signature) driven inside a `scripted_repl` block. If the client has a module-level `import readline` that needs to be moved inside the client function to avoid import-time ImportError on Windows, do that refactor as a prerequisite — but for the initial Linux-only tier 1 it's not strictly necessary (and it's covered in more depth by the D9 Windows port).
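Items 1 and 3 above (port-0 bind, ready event, guaranteed teardown) can be combined into one context manager. The sketch below uses only the standard library. The name `local_test_server` and the `_echo_upper` handler are hypothetical stand-ins; whether `unpythonic.net.server` exposes an entry point that could take the handler's place is the open question in item 2.

```python
import contextlib
import socket
import threading


def _echo_upper(conn):
    """Stand-in connection handler (hypothetical): reply with the uppercased request."""
    with conn:
        data = conn.recv(1024)
        if data:
            conn.sendall(data.upper())


@contextlib.contextmanager
def local_test_server(handler=_echo_upper, poll_interval=0.1):
    """Serve on 127.0.0.1:<kernel-assigned port>; yield (thread, port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ready = threading.Event()
    stop = threading.Event()

    def serve():
        listener.listen()
        ready.set()  # from here on, client connects are queued by the kernel
        while not stop.is_set():
            try:
                conn, _addr = listener.accept()
            except socket.timeout:
                continue  # wake up periodically to re-check the stop flag
            except OSError:
                return  # listener was closed underneath us
            handler(conn)

    thread = threading.Thread(target=serve, daemon=True)
    try:
        listener.settimeout(poll_interval)
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        port = listener.getsockname()[1]
        thread.start()
        if not ready.wait(timeout=5.0):
            raise RuntimeError("test server did not become ready")
        yield thread, port
    finally:
        stop.set()
        if thread.is_alive():
            thread.join(timeout=5.0)
        listener.close()
```

The accept loop polls with a short timeout rather than relying on `close()` to interrupt a blocked `accept()`, since closing a socket from another thread does not reliably wake the blocked call on every platform.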
+
+  **Tests to start with** (5–6 is a good starting coverage, mirroring mcpyrate's test_126_repl structure):
+
+  - `test_basic_roundtrip` — connect, submit `"2 + 3"`, expect `"5"` in client stdout.
+  - `test_multiline_input` — `def f():` / `    return 42` / blank line / `f()` → `42` appears.
+  - `test_syntax_error_recovery` — bad input produces a SyntaxError in the client output, then the next good input still evaluates. The remote eval runs on the server; the server should catch its own SyntaxError and respond, not crash.
+  - `test_clean_disconnect` — empty script → EOFError → client disconnects → server continues running (verify by doing a second connect after).
+  - `test_protocol_level_roundtrip` (bypassing the interactive loop) — connect directly to the TCP socket, send a framed message per the protocol in `unpythonic.net.msg`, verify the response. This covers the server/client boundary without going through `input()` at all and is the best place to catch regressions in the message protocol itself.
+  - *Stretch*: `test_two_clients_concurrent` — if the server supports multiple simultaneous REPL sessions, connect two clients in separate threads and verify they don't interfere.
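The shape of the multiline and syntax-error cases can be tried out against a local `code.InteractiveConsole` before any server plumbing exists. The sketch below is an assumption-laden stand-in: it uses `unittest.mock` and `contextlib` redirection in place of `scripted_repl`, plain `assert` in place of the `test[...]` macros, and the names `run_script` and `check_*` are hypothetical.

```python
import code
import contextlib
import io
from unittest import mock


def run_script(lines):
    """Feed `lines` to a local InteractiveConsole; return (stdout, stderr) as strings."""
    out, err = io.StringIO(), io.StringIO()
    # Exception classes listed in `side_effect` are raised rather than returned,
    # so the trailing EOFError ends the session once the script is exhausted.
    fake_input = mock.Mock(side_effect=[*lines, EOFError])
    with mock.patch("builtins.input", fake_input), \
            contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        code.InteractiveConsole().interact(banner="")
    return out.getvalue(), err.getvalue()


def check_multiline_input():
    # The blank line closes the function definition, as at an interactive prompt.
    stdout, _stderr = run_script(["def f():", "    return 42", "", "f()"])
    assert "42" in stdout


def check_syntax_error_recovery():
    # The bad line is reported on stderr; the next good line still evaluates.
    stdout, stderr = run_script(["x = )", "2 + 3"])
    assert "SyntaxError" in stderr
    assert "5" in stdout
```

Against the real client, the same scripts would be fed through `scripted_repl` with the server running in its thread, and the error text would come from the server's response rather than from the local console.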
+
+  **Watch out for**:
+
+  - **Port collisions**: always bind to port 0 and ask the kernel for the actual port via `sock.getsockname()[1]`. Never hardcode a port in the test.
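+  - **Shutdown latency**: if a test leaves a bound socket behind, a later bind to the same port may fail. Port-0 binding mostly sidesteps this, since each run gets a fresh port, and daemon threads help at interpreter exit; closing the listen socket explicitly in teardown (or from an atexit hook) is more robust. `SO_REUSEADDR` on the listen socket is a fallback if a fixed port is ever needed; it relaxes rebinding but does not clean anything up.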
+  - **Stderr leakage**: the server may log to real stderr during the test. Either redirect via `sys.stderr = ...` inside the test (the `scripted_repl` helper already does this for the client), or arrange for the server to use its own logger that the test configures.
+  - **macOS `parse_and_bind` branch** in `net/client.py` already landed in this same 2026-04-15 session (see CHANGELOG under 2.0.1 Fixed); tests should exercise this branch on macOS CI once they exist.
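The two options in the stderr-leakage item behave differently, which the following sketch illustrates with stand-ins. It is not based on how `unpythonic.net.server` actually reports output, which is unknown here; `noisy_worker` and the logger name `demo.server` are hypothetical.

```python
import contextlib
import io
import logging
import sys
import threading


def noisy_worker():
    """Stand-in for a server thread that writes to stderr (hypothetical)."""
    # sys.stderr is looked up at call time, so a process-wide swap is seen here.
    print("worker: plain write", file=sys.stderr)


def capture_worker_stderr():
    """Option 1: swap sys.stderr for the duration of the test."""
    buf = io.StringIO()
    with contextlib.redirect_stderr(buf):
        worker = threading.Thread(target=noisy_worker, daemon=True)
        worker.start()
        worker.join(timeout=5.0)
    return buf.getvalue()


def capture_logger_output(logger_name="demo.server"):
    """Option 2: attach a test-owned handler to the server's logger."""
    buf = io.StringIO()
    logger = logging.getLogger(logger_name)
    handler = logging.StreamHandler(buf)
    old_level, old_propagate = logger.level, logger.propagate
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the message away from root handlers
    try:
        logger.info("server: logged message")
    finally:
        logger.removeHandler(handler)
        logger.setLevel(old_level)
        logger.propagate = old_propagate
    return buf.getvalue()
```

One caveat for option 1: a `logging.StreamHandler` created without an explicit stream binds to whatever `sys.stderr` was at construction time, so a handler set up before the swap keeps writing to the real stderr. In that case option 2 is the one that works.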
+
+  **Why deferred**: the helper pattern is straightforward but the server-in-thread plumbing plus shutdown semantics deserves focused design attention, not a squeeze at the end of an already-long session. This entry is self-contained enough that a fresh CC session can pick it up cold.
+
+  (Added 2026-04-15 at the same natural stopping point where D9 and D10 were added. Related: D10 is the tier 2 counterpart — subprocess + pty, deferred until we know tier 1 isn't enough.)