Repositories list
11 repositories
GAGE (Public): General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
SCMAPR (Public)
FinMTM (Public): [ACL 2026] FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
BizFinBench.v2 (Public): BizFinBench.v2: A Unified Offline–Online Bilingual Benchmark for Expert-Level Financial Capability Evaluation of LLMs
CCPO (Public): Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
BizFinBench (Public)
PuzzleClone (Public): [ACL 2026] PuzzleClone: An SMT-Powered Framework for Synthesizing Verified Mathematical Reasoning Data
MME-Finance (Public): [MM 2025] A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
NEXUS-O (Public): [MM 2025] NEXUS-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
PolyhedronEvaluator (Public)
Published_Papers (Public)