web/platform/src/content/docs/docs/config/production-config.mdx
29 additions & 11 deletions
@@ -26,7 +26,7 @@ At the top level of the CAS Config, we've stores and servers. Each server define
 Specifically, under servers, we have two separate servers defined:
-```json
+```json5
 "servers": [{
   "listener": {
     "http": {
@@ -52,7 +52,7 @@ Specifically, under servers, we've two separate servers defined:
 ```

 Let's focus on the main server that exposes the CAS and ActionCache services.
-```json
+```json5
 {
   "listener": {
     "http": {
@@ -82,7 +82,7 @@ From this definition, we see that an HTTP listener binds to port 50051 on all ne

 This server hosts four services: CAS, AC, capabilities, and bytestream. The capabilities service is needed to support the Bazel protocol. The bytestream service streams data to and from the CAS and is recommended for handling large objects.

 You might be wondering what the "main" object under the "CAS" and "AC" services means. It indicates the instance name, which means you need to pass --remote_instance_name=main. Alternatively, you can use the following configuration so your Bazel clients don't have to pass the --remote_instance_name parameter:
-```json
+```json5
 "cas": [{
   "cas_store": "cas_STORE"
 }],
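For contrast, a named-instance variant would key the service entry by the instance name, as the "main" object described above suggests. The following is a sketch based on that description, not a copy of the production config; the exact schema can vary between NativeLink versions:

```json5
// Sketch only: a CAS service keyed by the instance name "main".
// Clients would then pass --remote_instance_name=main.
"cas": {
  "main": {
    "cas_store": "cas_STORE"
  }
}
```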
@@ -128,7 +128,7 @@ Completeness checking store verifies if the output files & folders exist in the

 Effectively, this store ensures the CAS and ActionCache are in a consistent state for a given Action digest (key). If not, the requested Action digest is treated as a cache miss and must be re-computed. As mentioned above, the Remote Execution proto gives hints about the expected behavior of the ActionCache, such as this comment on the GetActionResult endpoint:

-```json
+```json5
 // Implementations SHOULD ensure that any blobs referenced from the
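To make the wiring concrete, a completeness-checking store typically wraps the ActionCache backend and references the CAS so that blobs named by an ActionResult can be checked for existence. The field names and `ref_store` references below are assumptions for illustration, drawn from the description above rather than from the production config:

```json5
// Sketch: wrap the ActionCache backend and point at the CAS store so
// GetActionResult hits can be validated for completeness before being served.
"completeness_checking": {
  "backend": { "ref_store": { "name": "AC_FAST_SLOW_STORE" } },
  "cas_store": { "ref_store": { "name": "cas_STORE" } }
}
```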
@@ -160,7 +160,7 @@ The slow side of the Action Cache `fast_slow` in our cloud platform uses the Red

 Notice that we pull the actual address of Redis from the REDIS_STORE_URL environment variable, which helps keep the config structure free of environment-specific settings.

 The fast side of the Action Cache `fast_slow` store is a `size_partitioning` store:
-```json
+```json5
 "size_partitioning": {
   "size": 1000,
   "lower_store": {
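To make the size-based routing concrete, here is a sketch of what a complete `size_partitioning` fast store could look like. The `lower_store` and `upper_store` contents (a memory store with an illustrative eviction limit, and a noop store so oversized entries simply fall through to the slow side) are assumptions for illustration, not the production values:

```json5
"size_partitioning": {
  // Small objects (at or below "size" bytes) are routed to lower_store;
  // larger objects go to upper_store.
  "size": 1000,
  "lower_store": {
    "memory": { "eviction_policy": { "max_bytes": 100000000 } }
  },
  // Assumed: skip fast-caching large entries entirely; they are still
  // served from the slow (Redis) side of the fast_slow store.
  "upper_store": { "noop": {} }
}
```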
@@ -187,7 +187,7 @@ That covers the stores for the ActionCache, now let's look at the CAS service

 CAS

 The NativeLink CAS service stores content using a cryptographic hash of the content itself as the cache key, a scheme known as Content Addressable Storage. From a distributed build system perspective, a CAS lets us avoid rebuilding outputs during the build process, because it guarantees the stored content for a given hash key hasn't changed. We're not here to cover how Bazel remote caching works with a CAS, as there are plenty of resources about that on the Web, so let's turn our attention to how the NativeLink CAS store works. In the config JSON, we define the top-level cas_STORE:
-```json
+```json5
 "cas_STORE": {
   "existence_cache": {
     "backend": {
@@ -219,7 +219,7 @@ Intuitively, this store is an optimization that helps speed up requests for the

 Here we're using a verify store, which verifies the size of the data being uploaded into the CAS and helps ensure its integrity. In this case, we chose not to define a store named cas_VERIFY_STORE that references cas_FAST_SLOW_STORE, but that would be an acceptable configuration if you wanted to avoid nesting stores within stores.

 The backend for the verify store is a `fast_slow` store. Let's look at the slow store first.
-```json
+```json5
 "slow": {
   "size_partitioning": {
     "size": 1500000,
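For reference, the standalone cas_VERIFY_STORE alternative mentioned above could be sketched as follows. The `verify_size` and `verify_hash` field names are assumptions based on the description of what the store checks, so confirm them against your NativeLink version's schema:

```json5
// Sketch: a top-level verify store that references the fast_slow backend
// instead of nesting it inline.
"cas_VERIFY_STORE": {
  "verify": {
    "backend": { "ref_store": { "name": "cas_FAST_SLOW_STORE" } },
    // Reject uploads whose byte count doesn't match the declared size.
    "verify_size": true,
    // Re-hash uploaded content and compare it against the digest key.
    "verify_hash": true
  }
}
```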
@@ -258,7 +258,7 @@ To recap, for our CAS slow store, we send smaller objects to Redis and larger to

 On the fast side, we take a similar approach to the one we used for the ActionCache: a `size_partitioning` scheme backed by a memory store.

-```json
+```json5
 "fast": {
   "size_partitioning": {
     "size": 64000,
@@ -283,7 +283,7 @@ CAS Config JSON

 Here is the final CAS Config JSON without the 99 extra shards for writing to S3.

 ## Production CAS JSON
-```json
+```json5
 {
   "stores": {
     "AC_FAST_SLOW_STORE": {
@@ -454,6 +454,24 @@ Here is the final CAS Config JSON without the 99 extra shards for writing to S3.
 }
 ```

+## Limit Worker Inflight Tasks
+
+If your workers are getting saturated, cap the number of concurrent tasks they
+will accept with `max_inflight_tasks`. This helps avoid runaway scheduling when
+actions spike or when a single worker falls behind.
+
+```json5
+workers: [{
+  local: {
+    worker_api_endpoint: {
+      uri: "grpc://127.0.0.1:50061",
+    },
+    // Set to 0 for unlimited.
+    max_inflight_tasks: 32,
+  }
+}]
+```
+

 ## Speed Up NativeLink by Turning Off a Hidden Redis Query

@@ -469,11 +487,11 @@ Every time this runs, it fires off a wildcard query to Redis. These queries aren