|
22 | 22 | # - Tobias Wegner, tobias.wegner@cern.ch, 2017-2018 |
23 | 23 | # - Alexey Anisenkov, anisyonk@cern.ch, 2018-2024 |
24 | 24 |
|
25 | | -"""API for data transfers.""" |
| 25 | +""" |
| 26 | +API for data transfers. |
| 27 | +
|
| 28 | +This module provides a high-level API for managing data transfers (stage-in and stage-out) |
| 29 | +within the Pilot framework. It serves as an abstraction layer over various underlying transfer |
| 30 | +protocols and tools, collectively known as "copytools." The primary goal is to provide a |
| 31 | +unified interface for staging data, regardless of the specific technology used for the transfer. |
| 32 | +
|
| 33 | +Core Classes: |
| 34 | +- `StagingClient`: This is the base class that provides the common framework for data staging. |
| 35 | + It handles the dynamic selection of copytools based on site configuration and the type of |
| 36 | + activity (e.g., 'read_lan', 'write_wan'). It includes methods for resolving file replicas |
| 37 | + from catalogs like Rucio, sorting them based on priority (e.g., LAN vs. WAN), and |
| 38 | + orchestrating the transfer process through the `transfer` method. It also manages tracing |
| 39 | + and logging for transfers. |
| 40 | +
|
| 41 | +- `StageInClient`: This class inherits from `StagingClient` and specializes in handling the |
| 42 | + stage-in of input files. It contains logic to resolve the best replica for input files, |
| 43 | + considering factors like direct access modes (LAN/WAN), allowed schemas (e.g., 'root', 'https'), |
| 44 | + and site-specific storage configurations. It is also responsible for checking available |
| 45 | + disk space and verifying that input file sizes are within configured limits. |
| 46 | +
|
| 47 | +- `StageOutClient`: This class, also inheriting from `StagingClient`, is responsible for |
| 48 | + staging out output files. Its key functionality includes preparing destinations by resolving |
| 49 | + the correct output storage element (RSE) based on the activity. It constructs the final |
| 50 | + destination SURL (Storage URL) for the output files, calculates checksums for verification, |
| 51 | + and ensures that output files exist and have a non-zero size before initiating the transfer. |
| 52 | +
|
| 53 | +Key Concepts: |
| 54 | +- Copytools: The actual file transfers are delegated to specific "copytool" modules, which |
| 55 | + are located in the `pilot/copytool/` directory (e.g., `rucio`, `xrdcp`, `gfal`). The |
| 56 | + `StagingClient` dynamically imports and uses the appropriate copytool based on the |
| 57 | + `acopytools` configuration for a given activity. This design makes the system extensible |
| 58 | + to new transfer protocols. |
| 59 | +
|
| 60 | +- Replica Resolution: For stage-in, the client queries a catalog (like Rucio) to find all |
| 61 | + available replicas (copies) of a file. These replicas are then sorted by priority |
| 62 | + (e.g., network proximity, site preference) to select the most efficient source for the |
| 63 | + transfer. |
| 64 | +
|
| 65 | +- Protocol and Destination Resolution: For stage-out, the client determines the correct |
| 66 | + destination storage and the protocol to use for the transfer based on site and experiment |
| 67 | + configurations stored in `ddmconf` and `astorages`. |
| 68 | +
|
| 69 | +- Direct Access: The `StageInClient` supports a "direct access" or "remote I/O" mode, where |
| 70 | + files are not physically copied to the worker node but are accessed remotely by the payload. |
| 71 | + The client identifies when this mode is applicable and sets the file status accordingly, |
| 72 | + providing the payload with the correct remote TURL (Transport URL). |
| 73 | +
|
| 74 | +Workflow: |
| 75 | +1. A `StageInClient` or `StageOutClient` is instantiated with site-specific information. |
| 76 | +2. The `transfer` method is called with a list of `FileSpec` objects that represent the |
| 77 | + files to be transferred. |
| 78 | +3. The client determines the appropriate copytool(s) for the given activity. |
| 79 | +4. For each copytool, the client prepares the files: |
| 80 | + - Stage-in: Resolves replicas, selects the best source URL, and checks for direct |
| 81 | + access possibilities. |
| 82 | + - Stage-out: Resolves the destination URL and prepares the source file by checking for |
| 83 | + its existence and calculating its checksum. |
| 84 | +5. The client calls the `copy_in` or `copy_out` function of the selected copytool module, |
| 85 | + passing the list of files to be transferred. |
| 86 | +6. The copytool executes the transfer. |
| 87 | +7. The client updates the status of the `FileSpec` objects and handles any errors. If one |
| 88 | + copytool fails, the client can try the next one in the configured list. |
| 89 | +""" |
26 | 90 |
|
27 | 91 | import os |
28 | 92 | import hashlib |
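
Steps 3 to 7 of the workflow in the new docstring (pick the copytools for an activity, call each in turn, fall back to the next on failure) can be illustrated with a minimal sketch. This is not the Pilot implementation: the `FileSpec` stand-in, `TransferError`, the `transfer` signature, and the dictionary of callables used in place of dynamically imported copytool modules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileSpec:
    """Hypothetical stand-in for the real FileSpec; only the fields used here."""
    lfn: str
    status: str = ""


class TransferError(Exception):
    """Hypothetical error a copytool raises when a transfer fails."""


def transfer(
    files: List[FileSpec],
    activity: str,
    acopytools: Dict[str, List[str]],
    copytools: Dict[str, Callable[[List[FileSpec]], None]],
) -> str:
    """Try each copytool configured for `activity` until one succeeds.

    Returns the name of the copytool that completed the transfer.
    """
    # Fall back to a 'default' entry if the activity has no explicit list
    # (the 'default' key is an assumption of this sketch).
    names = acopytools.get(activity) or acopytools.get("default", [])
    errors = []
    for name in names:
        copy_func = copytools.get(name)
        if copy_func is None:
            errors.append(f"{name}: copytool not available")
            continue
        try:
            copy_func(files)
        except TransferError as exc:
            # Record the failure and move on to the next configured copytool.
            errors.append(f"{name}: {exc}")
            continue
        for fspec in files:
            fspec.status = "transferred"
        return name
    raise TransferError("all copytools failed: " + "; ".join(errors))
```

In the real module the copytool is a Python module exposing `copy_in` or `copy_out`; passing plain callables here only keeps the sketch self-contained.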
|
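
The stage-out preparation that the docstring attributes to `StageOutClient` (the output file must exist, have a non-zero size, and get a checksum before the transfer starts) might look roughly like the following. The function names and the returned dictionary are hypothetical, and the choice of Adler-32 and MD5 is an assumption; the docstring does not say which algorithms are used.

```python
import hashlib
import os
import zlib


def adler32_hex(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the Adler-32 checksum of a file as 8 hex digits, reading in chunks."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xFFFFFFFF:08x}"


def md5_hex(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 digest of a file, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prepare_output(path: str) -> dict:
    """Validate an output file and collect the metadata needed for verification."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"output file does not exist: {path}")
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError(f"output file is empty: {path}")
    return {"filesize": size, "adler32": adler32_hex(path), "md5": md5_hex(path)}
```

Reading in chunks keeps memory use flat for large output files, which is why the checksum helpers do not load the whole file at once.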