Backend agnostic API via value_and_*!! and caching.#148
Backend agnostic API via value_and_*!! and caching.#148AstitvaAggarwal wants to merge 4 commits intoSciML:mainfrom
value_and_*!! and caching.#148Conversation
…value_and_jacobian!!
f217f08 to
7a0ae41
Compare
|
I do not agree with this PR. Its title alone is enough to see that this functionality is already taken care of by DI. If the only issue justifying this is the time it takes me to solve a compat issue that Mooncake itself caused, then we need a better way to work together here. The PR in question (which I reviewed 10 days ago): |
|
It could in theory be good for AD libraries to take control of the basic jvp/vjp implementation in a standard way, since that is essentially what the AD libraries provide. So AD libraries, instead of naming everything in their own way, provide just a standard jvp/vjp call would be nice, then DI would extend that to a whole set of other differentiation calls built on top. But the problem is that it would need buy-in from the AD libraries as well, because then the non-pirating implementation would be extensions in each AD library. I think it would be good for Enzyme, ForwardDiff, ReverseDiff, etc. to all have a standard |
value_and_*!! and caching.value_and_*!! and caching.
|
I disagree with this PR as well. There is substantial overlap with DI. It would be preferable to invest energy into fixing the DI / Mooncake bindings rather than trying to sidestep DI like this. |
There is currently no interest from Enzyme-side to implement yet another interface |
|
It would be useful to upstream a vjp and jvp interface specification that both DI and autograd backends can implement. This does not necessarily overlap with DI; rather, it enables autograd libraries to adopt the interface on an optional basis, while DI can gradually transition to native vjp and jvp implementations as they become available. Mooncake is willing to implement this interface, which should help avoid future DI–Mooncake compatibility issues. |
|
The value of a backend-agnostic interface for AD comes from a specification of standardized input-output-behavior that downstream users can rely upon. There is no value in having a This PR would have to define such a specification and either:
The specification then needs to be enshrined in a test suite, such that downstream users don't get their code broken when a backend makes sudden breaking changes without communicating them. Once all of these steps have been implemented, you will end up with something very similar to DI. As it stands, this PR feels like a way for Mooncake to circumvent DI's existing interface specification tests to me. No software is perfect, including DI, but @gdalle has shown a large amount of patience and willingness to address issues. Hopefully we can all work together to find a solution without having to reinvent the wheel. |
Details do get messy. Specifying a standard JVP or VJP in the absence of mutation is already not easy, and it took all of ChainRulesCore to make it happen. In the presence of mutation, bringing together the conventions of every single backend is... pretty much what DI tries to do. See e.g. JuliaDiff/DifferentiationInterface.jl#681.
If the function ADTypes.value_and_gradient!!
DI.value_and_gradient!
Backend.value_and_gradient |
|
I think the general consensus here is that this is unlikely to be the right place for this API, as ADTypes defines the types for describing AD systems to use but implementation of actual functions is left downstream. So I'll close, but I think this was a worthwhile discussion to have to clarify the boundary of this package. Thanks everyone! |
Summary
This PR adds a minimal gradient API to ADTypes. The goal is to provide a shared interface that AD backends (Mooncake, etc.) can natively implement.
Expected features to add :
Capability trait (
GradientOrder{K}): backends declare what order of derivatives they support. Inspired byLogDensityProblems.jl'sLogDensityOrder{K}pattern.The
!!signals that backends may mutate internal state; callers own the returned values and must copy if retaining across calls.Error fallbacks: unimplemented backends throw an ArgumentError telling the backend author exactly what to implement.
Design
Checklist
contributor guidelines, in particular the SciML Style Guide and
COLPRAC.
Additional context
Add any other context about the problem here.