To create a new zarr store in an object store from a local file, use the `send_to_zarr` command:
```bash
ods send_to_zarr -f /path/to/file.nc -c credentials.json -b bucket_name -v var
```
The arguments used are:
- `-f`: Path to the netCDF file containing the variables.
- `-c`: Path to the JSON file containing the object store credentials.
- `-b`: Bucket name in the object store where the variables will be stored.
- `-v`: Variable within the netCDF file to send to the object store.
In the example above, without a `-p` (or `--prefix`), the variables will be stored in `<bucket_name>/<var>`. If a `--prefix` is provided, the variables will be stored in `<bucket_name>/<prefix>/<var>`.
### Sending Lots of Files
To create a new zarr store in an object store using a large number of files, we can use [dask](https://www.dask.org) with the `send_to_zarr` command by passing a dask configuration JSON file:
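The exact invocation is not shown here; a hypothetical call combining the flags documented below might look like this (all paths, names, and the use of a shell glob for multiple files are placeholder assumptions):

```bash
ods send_to_zarr -f /path/to/files/*.nc -c credentials.json -b bucket_name \
    -p prefix -gf /path/to/grid.nc -cs chunk_strategy -dc dask_config.json
```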
The arguments used are:

- `-f`: Paths to the multiple netCDF files containing the variables.
- `-c`: Path to the JSON file containing the object store credentials.
- `-b`: Bucket name in the object store where the variables will be stored.
- `-p`: Prefix used to define the path to the object (see above).
- `-gf`: Path to the model grid file containing domain variables.
- `-uc`: Coordinate dimension variables to update, given as a JSON string `'{current_coord: new_coord}'`.
- `-cs`: Chunk strategy used to rechunk the model data.
- `-dc`: Path to the JSON file containing the Dask configuration.
where the contents of `dask_config.json` are:

```json
{
    "config_kwargs": {
        "temporary_directory": "..../jasmin_os_tmp/",
        "local_directory": "..../jasmin_os_tmp/"
    },
    "cluster_kwargs": {
        "n_workers": 12,
        "threads_per_worker": 1,
        "memory_limit": "2GB"
    }
}
```
In the example, a LocalCluster with 12 single-threaded workers, each with 2 GB of available memory, is used to transfer a large collection of files to an object store.
Users are strongly recommended to implement `send_to_zarr` workflows using a job scheduler, such as SLURM or PBS, either to run the LocalCluster on a single compute node or to use an existing SLURMCluster or PBSCluster (via dask-jobqueue).
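As a sketch, a minimal SLURM batch script running the LocalCluster on a single compute node might look like the following; the partition name, resource requests, and all paths are placeholder assumptions, not values prescribed by this tool:

```bash
#!/bin/bash
#SBATCH --job-name=send_to_zarr
#SBATCH --partition=short-serial   # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks=12                # matches n_workers in dask_config.json
#SBATCH --mem=24G                  # 12 workers x 2GB memory_limit
#SBATCH --time=04:00:00

# Run the transfer with the LocalCluster confined to this node
ods send_to_zarr -f /path/to/files/*.nc -c credentials.json -b bucket_name \
    -dc dask_config.json
```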
**Note:** the netCDF4 library does not support multi-threaded access to datasets, so users should ensure that `"threads_per_worker": 1` is set in their Dask configuration JSON file to avoid `CancelledError` exceptions when using `send_to_zarr` or `update_zarr`.
### Updating Existing Stores
To update an existing zarr store in an object store, we can use the `update_zarr` command:
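A hypothetical invocation, assuming `update_zarr` accepts the same core flags as `send_to_zarr` (all paths and names are placeholders):

```bash
ods update_zarr -f /path/to/new_file.nc -c credentials.json -b bucket_name \
    -p prefix -v var
```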
This command will replace and/or append the values of the variable `var` stored at the local filepath to the `<bucket_name>/<prefix>/<var>` store, provided it already exists in the object store.
**Note:** compatibility checks must pass before local data will be appended to an existing store; these include chunk size and dimension compatibility.
### Updating Existing Stores With Lots of Files
To update an existing zarr store in an object store using a large number of files, we can use [dask](https://www.dask.org) via the `update_zarr` command analogously to `send_to_zarr`:
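A hypothetical invocation, again assuming the same flags as the Dask-enabled `send_to_zarr` example above (all values, including the shell glob, are placeholders):

```bash
ods update_zarr -f /path/to/files/*.nc -c credentials.json -b bucket_name \
    -p prefix -dc dask_config.json
```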