Backend agnostic API via value_and_*!! and caching.#148

Closed
AstitvaAggarwal wants to merge 4 commits into SciML:main from AstitvaAggarwal:adtypes-gradient-api

Conversation

@AstitvaAggarwal
Member

@AstitvaAggarwal AstitvaAggarwal commented Apr 26, 2026

Summary

This PR adds a minimal gradient API to ADTypes. The goal is to provide a shared interface that AD backends (Mooncake, etc.) can natively implement.

Expected features to add:

Capability trait (GradientOrder{K}): backends declare what order of derivatives they support. Inspired by LogDensityProblems.jl's LogDensityOrder{K} pattern.

ADTypes.gradient_order(::AutoMooncake) = GradientOrder{1}()
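
To illustrate how such a capability trait could be queried, here is a minimal, self-contained sketch. It does not reproduce the PR's diff: `AbstractBackend`, `ToyFirstOrder`, `ToyNoDerivative`, `supports_gradient`, and the `GradientOrder{0}` default are all assumptions made for illustration.

```julia
# Hypothetical sketch; none of these definitions are confirmed by the PR diff.
abstract type AbstractBackend end             # stand-in for ADTypes.AbstractADType

struct GradientOrder{K} end                   # K = highest derivative order supported

struct ToyFirstOrder <: AbstractBackend end   # illustrative backend types
struct ToyNoDerivative <: AbstractBackend end

# Assumed conservative default: a backend supports nothing until it opts in.
gradient_order(::AbstractBackend) = GradientOrder{0}()
gradient_order(::ToyFirstOrder) = GradientOrder{1}()

# Callers can branch on the trait through dispatch on the type parameter K.
supports_gradient(b::AbstractBackend) = _supports(gradient_order(b))
_supports(::GradientOrder{K}) where {K} = K >= 1
```

Encoding the order as a type parameter keeps the check resolvable at compile time, which is the same trade-off the `LogDensityOrder{K}` pattern makes.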

Interface functions (declared @public but not exported, which avoids a namespace conflict with DifferentiationInterface):

ADTypes.value_and_gradient!!(f, backend, x)  # -> (y, g)
ADTypes.value_and_jacobian!!(f, backend, x)  # -> (y, J)

The !! signals that backends may mutate internal state; callers own the returned values and must copy if retaining across calls.
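
One possible reading of this convention is that the returned gradient may alias storage the backend reuses on the next call. The sketch below shows that reading with a toy backend; `ToyCachedBackend` is hypothetical, and central finite differences stand in for a real AD engine.

```julia
# Hypothetical toy backend that reuses one gradient buffer across calls.
struct ToyCachedBackend
    buffer::Vector{Float64}
end

# Central finite differences stand in for a real AD engine (an assumption).
function value_and_gradient!!(f, backend::ToyCachedBackend, x::Vector{Float64})
    g = backend.buffer          # reused storage: the reason for the "!!"
    h = 1e-6
    xp = copy(x)
    for i in eachindex(x)
        xp[i] = x[i] + h
        fp = f(xp)
        xp[i] = x[i] - h
        fm = f(xp)
        xp[i] = x[i]            # restore the perturbed coordinate
        g[i] = (fp - fm) / (2h)
    end
    return f(x), g
end

backend = ToyCachedBackend(zeros(2))
f(x) = sum(abs2, x)

y1, g1 = value_and_gradient!!(f, backend, [1.0, 2.0])
g1_saved = copy(g1)             # copy before the next call overwrites the buffer
y2, g2 = value_and_gradient!!(f, backend, [3.0, 4.0])
```

After the second call, `g1` and `g2` refer to the same array, so only `g1_saved` still holds the first gradient.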

Error fallbacks: unimplemented backends throw an ArgumentError telling the backend author exactly what to implement.
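
A fallback of this kind could look roughly like the following. The message wording, `AbstractBackend`, and `ToyUnimplemented` are assumptions for illustration, not the PR's actual code.

```julia
# Hypothetical sketch of an error fallback; names and wording are illustrative.
abstract type AbstractBackend end   # stand-in for ADTypes.AbstractADType
struct ToyUnimplemented <: AbstractBackend end

# Generic method: reached only when a backend has not added its own method.
function value_and_gradient!!(f, backend::AbstractBackend, x)
    throw(ArgumentError(
        "value_and_gradient!! is not implemented for $(typeof(backend)). " *
        "Backend authors should define " *
        "value_and_gradient!!(f, ::$(typeof(backend)), x).",
    ))
end

# Calling the fallback yields the ArgumentError instead of a MethodError.
err = try
    value_and_gradient!!(sum, ToyUnimplemented(), [1.0])
catch e
    e
end
```

Compared with the default `MethodError`, the explicit `ArgumentError` can name the exact method a backend author is expected to add.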

Design

  • Backend implementations live in the backend's own repo as a package extension against ADTypes (not in ADTypes itself)
  • This PR is interface-only; no backend implementations are included.
  • Mooncake's implementation will follow in a separate PR to the Mooncake repo.

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC.
  • Any new documentation only uses public API

@gdalle
Collaborator

gdalle commented Apr 26, 2026

I do not agree with this PR. Its title alone is enough to see that this functionality is already taken care of by DI. If the only issue justifying this is the time it takes me to solve a compat issue that Mooncake itself caused, then we need a better way to work together here.

The PR in question (which I reviewed 10 days ago):

@ChrisRackauckas
Member

It could in theory be good for AD libraries to take control of the basic jvp/vjp implementation in a standard way, since that is essentially what the AD libraries provide. It would be nice if AD libraries, instead of each naming everything in their own way, provided just a standard jvp/vjp call; DI would then extend that to a whole set of other differentiation calls built on top. But the problem is that it would need buy-in from the AD libraries as well, because then the non-pirating implementation would be extensions in each AD library. I think it would be good for Enzyme, ForwardDiff, ReverseDiff, etc. to all have a standard jvp(...) call that is just provided as their primitive, but (a) details can get messy and (b) some of these libraries may not adopt it just due to conservative practices around effectively archived projects. In that case we could just designate DI as the de facto implementation of them (and document that only DI can define these functions), but then we may end up in the situation where DI is the de facto implementation of all but one backend, in which case, why not just let this live in DI?
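
As a rough illustration of that layering (a backend-supplied primitive, with a generic layer built on top), the sketch below derives a gradient from a `jvp` call alone. Everything here is hypothetical: `ToyForward`, the `jvp` signature, and `gradient_from_jvp` are not part of any existing package, and central finite differences stand in for real forward-mode AD.

```julia
# Hypothetical primitive: each backend would supply its own `jvp` method.
struct ToyForward end

# Central-difference directional derivative stands in for forward-mode AD.
function jvp(f, ::ToyForward, x::AbstractVector, v::AbstractVector)
    h = 1e-6
    return f(x), (f(x .+ h .* v) - f(x .- h .* v)) / (2h)
end

# Generic layer: gradient of a scalar-valued f, one jvp per basis vector.
function gradient_from_jvp(f, backend, x::AbstractVector)
    g = similar(x, float(eltype(x)))
    for i in eachindex(x)
        e = zero(x)
        e[i] = one(eltype(x))        # i-th basis direction
        _, dy = jvp(f, backend, x, e)
        g[i] = dy
    end
    return g
end

g = gradient_from_jvp(x -> sum(abs2, x), ToyForward(), [1.0, 2.0])
```

The generic layer never touches backend internals, which is why the approach depends on every backend agreeing on the primitive's exact semantics.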

@AstitvaAggarwal changed the title from "AD backend agnostic API via value_and_*!! and caching." to "Backend agnostic API via value_and_*!! and caching." on Apr 26, 2026
@Technici4n

I disagree with this PR as well. There is substantial overlap with DI. It would be preferable to invest energy into fixing the DI / Mooncake bindings rather than trying to sidestep DI like this.

@vchuravy

"to provide a shared interface that AD backends (Mooncake, Enzyme, etc.) can natively implement."

There is currently no interest on the Enzyme side in implementing yet another interface.

@yebai

yebai commented Apr 26, 2026

It would be useful to upstream a vjp and jvp interface specification that both DI and autograd backends can implement.

This does not necessarily overlap with DI; rather, it enables autograd libraries to adopt the interface on an optional basis, while DI can gradually transition to native vjp and jvp implementations as they become available.

Mooncake is willing to implement this interface, which should help avoid future DI–Mooncake compatibility issues.

@adrhill
Contributor

adrhill commented Apr 27, 2026

The value of a backend-agnostic interface for AD comes from a specification of standardized input-output behavior that downstream users can rely upon. There is no value in having an ADTypes.value_and_gradient!! that returns different outputs depending on the backend.

This PR would have to define such a specification and either:

  • a) convince the maintainers of a dozen AD backends to adopt it
  • b) write code that standardizes inputs and outputs such that the behavior of backends matches

The specification then needs to be enshrined in a test suite, such that downstream users don't get their code broken when a backend makes sudden breaking changes without communicating them.

Once all of these steps have been implemented, you will end up with something very similar to DI. As it stands, this PR feels to me like a way for Mooncake to circumvent DI's existing interface specification tests. No software is perfect, including DI, but @gdalle has shown a large amount of patience and willingness to address issues. Hopefully we can all work together to find a solution without having to reinvent the wheel.

@gdalle
Collaborator

gdalle commented Apr 27, 2026

"I think it would be good for Enzyme, ForwardDiff, ReverseDiff, etc. to all have a standard jvp(...) call that is just provided as their primitive, but (a) details can get messy"

Details do get messy. Specifying a standard JVP or VJP in the absence of mutation is already not easy, and it took all of ChainRulesCore to make it happen. In the presence of mutation, bringing together the conventions of every single backend is... pretty much what DI tries to do. See e.g. JuliaDiff/DifferentiationInterface.jl#681.

"It would be useful to upstream a vjp and jvp interface specification that both DI and autograd backends can implement."

If the function value_and_gradient!! lives in ADTypes and the backend object lives in ADTypes, then there is nothing DI can do with them without committing type piracy. The end effect of the current PR would be an ecosystem where the following functions all float around, with separate implementations:

ADTypes.value_and_gradient!!
DI.value_and_gradient!
Backend.value_and_gradient
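
The piracy concern can be sketched with toy modules. `ToyADTypes`, `ToyDI`, and `AutoToy` are hypothetical stand-ins, not the real packages: the point is only that a third module adds a method to a function it does not own, dispatching on a type it does not own either.

```julia
# Hypothetical stand-ins for the real packages; run at top level.
module ToyADTypes
    struct AutoToy end                    # backend type owned by ToyADTypes
    function value_and_gradient!! end     # function owned by ToyADTypes
end

module ToyDI
    import ..ToyADTypes
    # Type piracy: ToyDI owns neither the function nor the backend type,
    # yet it attaches a method to that combination.
    function ToyADTypes.value_and_gradient!!(f, ::ToyADTypes.AutoToy, x)
        error("placeholder: a real wrapper would call the backend here")
    end
end

m = first(methods(ToyADTypes.value_and_gradient!!))
owner_of_function = parentmodule(ToyADTypes.value_and_gradient!!)
owner_of_type = parentmodule(ToyADTypes.AutoToy)
owner_of_method = m.module
```

Here the function and the type both belong to `ToyADTypes` while the method belongs to `ToyDI`, which is the situation the comment above describes.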

@ChrisRackauckas
Member

I think the general consensus here is that this is unlikely to be the right place for this API, as ADTypes defines the types for describing AD systems to use but implementation of actual functions is left downstream. So I'll close, but I think this was a worthwhile discussion to have to clarify the boundary of this package. Thanks everyone!

7 participants