🏠
In SF

aayambansal/README.md

Pinned

  1. ConsistencyBench (Public)

    We benchmark 18 frontier LLMs on cross-query logical consistency, reveal universal 36-57pp gaps between individual accuracy and set-level consistency, and propose a training-free method (CGD) that …

    Python

  2. Sketch2Feedback (Public)

    Sketch2Feedback evaluates and improves rubric-aligned formative feedback on student-drawn STEM diagrams (e.g., free-body diagrams and simple circuits) using a lightweight "grammar-in-the-loop" pipe…

    Python

  3. ExecutableClaims (Public)

    Writing-time claim verification: extracts scientific claims from manuscripts, grounds them against cited evidence, and compiles executable test capsules. (AAAI 2026)

    Python

  4. VIAR (Public)

    We discover and characterize the visual neglect zone, a systematic pattern in vision-language models (VLMs) where middle transformer layers allocate disproportionately low attention to visual token…

    Python

  5. OpenDiscoveryTrace (Public)

    Process traces for evaluating AI scientist workflows | ICML 2026 AI4Science Dataset Competition | 432 trajectories from GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro

    TeX