Commit 49ba426

rq: add a garbage collector to the worker
Implement a maintenance hook on the standard redis queue worker to do garbage collection on expired builds. When a result expires from the queue, its data will be removed from the public/store/ directory at the regular maintenance interval (default is every 600 seconds).

Signed-off-by: Eric Fahlgren <ericfahlgren@gmail.com>
1 parent 095c41c commit 49ba426

4 files changed

Lines changed: 76 additions & 2 deletions

.github/workflows/publish.yml

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ jobs:
       - name: Set __version__ and poetry version
         run: |
           TAG="$(git describe --tags --always | awk -F"-" '{if (NF>1) {print substr($1, 2)".post"$2} else {print substr($1, 2)}}')"
-          echo "__version__ = \"$TAG\"" > asu/__init__.py
+          sed "s/__version__.*/__version__ = \"$TAG\"/" -i asu/__init__.py
           poetry version "$TAG"
 
       - name: Build and publish PyPi package
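
The commit message does not explain the switch from `echo` to `sed`. A plausible reading is that `asu/__init__.py` now also carries the `GCWorker` import, which overwriting the file with `echo ... >` would discard. The sketch below mimics the substitution in Python to show that only the `__version__` line changes. The helper name `set_version` and the tag value `1.2.3.post4` are made up for illustration and are not part of the workflow.

```python
import re


def set_version(source: str, tag: str) -> str:
    """Mimic `sed "s/__version__.*/__version__ = \"$TAG\"/"` on a file's text."""
    # Like sed without the `g` flag, replace the first match on each line.
    # A callable replacement avoids backslash interpretation of `tag`.
    return "\n".join(
        re.sub(r"__version__.*", lambda _m: f'__version__ = "{tag}"', line, count=1)
        for line in source.split("\n")
    )


# File content as it looks after this commit.
original = '__version__ = "0.0.0"\n\nfrom .rq import GCWorker as GCWorker\n'
updated = set_version(original, "1.2.3.post4")  # hypothetical tag
print(updated)
```

The import line passes through unchanged, whereas the earlier `echo` redirect would have left the file with the version line only.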

asu/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1 +1,3 @@
 __version__ = "0.0.0"
+
+from .rq import GCWorker as GCWorker

asu/rq.py

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+from re import compile
+from pathlib import Path
+from rq import Queue, Worker
+from rq.job import Job
+from podman import PodmanClient
+from shutil import rmtree
+
+from asu.config import settings
+from asu.util import log, get_podman
+
+REQUEST_HASH_LENGTH = 64
+store: Path = settings.public_path / "store"
+podman: PodmanClient = get_podman()
+
+
+class GCWorker(Worker):
+    """A Worker class that does periodic garbage collection on ASU's
+    public store directory. We tie into the standard `Worker` maintenance
+    sequence, so the period is controlled by the base class. You may change
+    the garbage collection frequency in podman-compose.yml by adding a
+    `--maintenance-interval` option to the startup command as follows (the
+    default is 600 seconds).
+
+    >>> command: rqworker ... --maintenance-interval 1800
+    """
+
+    hash_match = compile(f"^[0-9a-f]{{{REQUEST_HASH_LENGTH}}}$")
+
+    def clean_store(self) -> None:
+        """For performance testing, the store directory was mounted on a
+        slow external USB hard drive. A typical timing result showed ~1000
+        directories deleted per second on that test system. The synthetic
+        test directories were created containing 10 files in each.
+        File count dominated the timing, with file size being relatively
+        insignificant, likely due to `stat` calls being the bottleneck.
+        (Just for comparison, tests against store mounted on a fast SSD
+        were about twice as fast.)
+
+        >>> Cleaning /mnt/slow/public/store: deleted 5000/5000 builds
+        >>> Timing analysis for clean_store: 5.081s
+        """
+
+        deleted: int = 0
+        total: int = 0
+        dir: Path
+        queue: Queue
+        for dir in store.glob("*"):
+            if not dir.is_dir() or not self.hash_match.match(dir.name):
+                continue
+            total += 1
+            for queue in self.queues:
+                job: Job = queue.fetch_job(dir.name)
+                log.info(f" Found {dir.name = } {job = }")
+                if job is None:
+                    rmtree(dir)
+                    deleted += 1
+
+        log.info(f"Cleaning {store}: deleted {deleted}/{total} builds")
+
+    def clean_podman(self) -> None:
+        """Reclaim space from the various podman disk entities as they are orphaned."""
+        removed = podman.containers.prune()
+        log.info(f"Reclaimed {removed.get('SpaceReclaimed', 0):,d}B from containers")
+        removed = podman.images.prune()
+        log.info(f"Reclaimed {removed.get('SpaceReclaimed', 0):,d}B from images")
+        removed = podman.volumes.prune()
+        log.info(f"Reclaimed {removed.get('SpaceReclaimed', 0):,d}B from volumes")
+
+    def run_maintenance_tasks(self):
+        super().run_maintenance_tasks()
+        self.clean_store()
+        self.clean_podman()
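
One point a reviewer might raise about `clean_store`: if the per-queue check is nested inside the `for queue in self.queues` loop, as the diff suggests, a directory is removed as soon as any single queue returns `None` for its hash. That matches "no queue holds this job" only when the worker listens on one queue. The following is a minimal sketch of a variant that deletes a directory only when every queue reports the job missing. It is not the committed code: `FakeQueue` is a hypothetical stand-in for `rq.Queue`, and `store` and `queues` are passed as parameters instead of coming from module or worker state.

```python
import re
import tempfile
from pathlib import Path
from shutil import rmtree

REQUEST_HASH_LENGTH = 64
HASH_MATCH = re.compile(f"^[0-9a-f]{{{REQUEST_HASH_LENGTH}}}$")


class FakeQueue:
    """Hypothetical stand-in for rq.Queue; only fetch_job() is modeled."""

    def __init__(self, job_ids):
        self.job_ids = set(job_ids)

    def fetch_job(self, job_id):
        # Return None for an unknown job; a plain string stands in for a Job.
        return job_id if job_id in self.job_ids else None


def clean_store(store: Path, queues) -> tuple[int, int]:
    """Delete hash-named build directories that no queue knows about."""
    deleted = total = 0
    # sorted() materializes the listing before anything is removed.
    for path in sorted(store.iterdir()):
        if not path.is_dir() or not HASH_MATCH.match(path.name):
            continue
        total += 1
        # Remove only when *every* queue reports the job as missing.
        if all(queue.fetch_job(path.name) is None for queue in queues):
            rmtree(path)
            deleted += 1
    return deleted, total


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    live, expired = "a" * 64, "b" * 64
    for name in (live, expired, "not-a-hash"):
        (store / name).mkdir()
    # The live job is known only to the second queue.
    queues = [FakeQueue([]), FakeQueue([live])]
    result = clean_store(store, queues)
    remaining = sorted(p.name for p in store.iterdir())

print(result, remaining)
```

In this demo the directory for the live job survives even though the first queue does not know it, and the non-hash directory is never considered for deletion.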

podman-compose.yml

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ services:
       context: .
       dockerfile: Containerfile
     restart: unless-stopped
-    command: rqworker --logging_level INFO
+    command: rqworker --logging_level INFO --with-scheduler --worker-class asu.GCWorker
     env_file: .env
     environment:
       REDIS_URL: "redis://redis:6379/0"
