Commit c47c635

Update README.md to include proxy rotator middleware
1 parent 27108e0 commit c47c635

1 file changed

Lines changed: 87 additions & 0 deletions

src/ps_helper/middlewares/README.md

@@ -16,3 +16,90 @@ DOWNLOADER_CLIENTCONTEXTFACTORY = "ps_helper.middlewares.LegacyConnectContextFac
Scrapy will then use the LegacyConnectContextFactory for all HTTPS connections.

--------------------------------------

# Proxy Rotator Middlewares

This module provides two Scrapy downloader middlewares for rotating HTTP proxies with optional smart banning logic and statistics tracking.

---

## 🧩 Middlewares

### **1. SequentialProxyRotatorMiddleware**
A simple **round-robin** proxy rotation strategy that cycles through proxies sequentially.

#### Enable in `settings.py`
```python
DOWNLOADER_MIDDLEWARES = {
    "ps_helper.middlewares.proxy_rotator.SequentialProxyRotatorMiddleware": 620,
}
```

#### Required Setting
```python
PROXY_PROVIDERS = {
    "provider1": {"url": "proxy1.com", "port": 8080},
    "provider2": {"url": "proxy2.com", "port": 8080, "user": "user", "password": "pass"},
}
```
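
To make the shape of a provider entry concrete, here is a minimal sketch of how one entry could be turned into a proxy URL of the form shown in the summary logs. `build_proxy_url` is a hypothetical helper, not part of `ps_helper`; the `http://` scheme and the way credentials are embedded are assumptions, and the real middleware may handle authentication differently.

```python
from urllib.parse import quote


def build_proxy_url(provider: dict) -> str:
    """Hypothetical helper: turn one PROXY_PROVIDERS entry into a proxy URL."""
    host = f"{provider['url']}:{provider['port']}"
    user = provider.get("user")
    password = provider.get("password")
    if user and password:
        # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
        return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    return f"http://{host}"


print(build_proxy_url({"url": "proxy1.com", "port": 8080}))
print(build_proxy_url({"url": "proxy2.com", "port": 8080, "user": "user", "password": "pass"}))
```

The first call prints `http://proxy1.com:8080` and the second `http://user:pass@proxy2.com:8080`.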

#### Behavior
- Rotates proxies in order, returning to the first proxy after the last.
- Logs total requests, successes, failures, and success rate for each proxy when the spider closes.
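
The behavior above can be sketched roughly as follows. This is an illustration of round-robin rotation with per-proxy counters, not the actual `SequentialProxyRotatorMiddleware` code: the class name, the `stats` layout, and the rule that a status below 400 counts as a success are all assumptions. The only Scrapy conventions relied on are the `process_request` / `process_response` hook names and the `request.meta["proxy"]` key, which Scrapy's built-in `HttpProxyMiddleware` reads. A `SimpleNamespace` stands in for Scrapy's request and response objects so the sketch runs without Scrapy installed.

```python
from itertools import cycle
from types import SimpleNamespace


class RoundRobinProxySketch:
    """Illustrative only; not the ps_helper implementation."""

    def __init__(self, proxy_urls):
        proxy_urls = list(proxy_urls)
        self._cycle = cycle(proxy_urls)  # endless iterator: p1, p2, ..., p1, p2, ...
        self.stats = {url: {"total": 0, "success": 0, "failure": 0} for url in proxy_urls}

    def process_request(self, request, spider=None):
        proxy = next(self._cycle)
        request.meta["proxy"] = proxy  # key read by Scrapy's HttpProxyMiddleware
        self.stats[proxy]["total"] += 1
        return None  # let Scrapy continue handling the request

    def process_response(self, request, response, spider=None):
        # Assumption: any status below 400 counts as a success.
        outcome = "success" if response.status < 400 else "failure"
        self.stats[request.meta["proxy"]][outcome] += 1
        return response

    def success_rate(self, proxy):
        s = self.stats[proxy]
        return round(100.0 * s["success"] / s["total"], 1) if s["total"] else 0.0


mw = RoundRobinProxySketch(["http://proxy1.com:8080", "http://proxy2.com:8080"])
requests = [SimpleNamespace(meta={}) for _ in range(4)]
for req in requests:
    mw.process_request(req)
print([req.meta["proxy"] for req in requests])
```

Four requests alternate between the two proxies, starting with `proxy1.com`.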

---

### **2. SmartProxyRotatorMiddleware**
A more advanced rotator that temporarily bans failing proxies and supports two rotation modes (`random` or `round_robin`).

#### Enable in `settings.py`
```python
DOWNLOADER_MIDDLEWARES = {
    "ps_helper.middlewares.proxy_rotator.SmartProxyRotatorMiddleware": 620,
}
```

#### Available Settings
```python
PROXY_PROVIDERS = {
    "proxy1": {"url": "proxy1.com", "port": 8080},
    "proxy2": {"url": "proxy2.com", "port": 8080, "user": "user", "password": "pass"},
}

PROXY_BAN_THRESHOLD = 3  # Number of failures before banning a proxy
PROXY_COOLDOWN = 300  # Cooldown duration in seconds for banned proxies
PROXY_ROTATION_MODE = "random"  # 'random' or 'round_robin'
```

#### Features
- Automatically bans a proxy once it reaches `PROXY_BAN_THRESHOLD` failures.
- Bans are temporary: a banned proxy is skipped for `PROXY_COOLDOWN` seconds (the **cooldown**).
- Chooses proxies randomly or sequentially, depending on `PROXY_ROTATION_MODE`, while skipping banned ones.
- Displays a detailed summary when the spider closes.
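
The ban-and-cooldown idea in the list above could look something like the sketch below. It is a simplified model under stated assumptions, not the `SmartProxyRotatorMiddleware` source: `BanTracker` and its methods are hypothetical names, failures are counted since the last ban (the README does not say whether the real count is cumulative or consecutive), and `pick` returns `None` when every proxy is banned. The `clock` parameter exists only so the example can simulate time passing.

```python
import random
import time


class BanTracker:
    """Illustrative threshold-plus-cooldown banning (hypothetical names)."""

    def __init__(self, proxies, ban_threshold=3, cooldown=300, mode="random",
                 clock=time.monotonic):
        self.proxies = list(proxies)
        self.ban_threshold = ban_threshold
        self.cooldown = cooldown
        self.mode = mode
        self.clock = clock
        self.failures = {p: 0 for p in self.proxies}
        self.banned_until = {}  # proxy -> time at which its ban expires
        self._index = 0

    def available(self):
        now = self.clock()
        return [p for p in self.proxies
                if self.banned_until.get(p) is None or self.banned_until[p] <= now]

    def record_failure(self, proxy):
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.ban_threshold:
            self.banned_until[proxy] = self.clock() + self.cooldown
            self.failures[proxy] = 0  # assumption: counting restarts after a ban

    def pick(self):
        candidates = self.available()
        if not candidates:
            return None  # assumption: the caller decides what to do here
        if self.mode == "random":
            return random.choice(candidates)
        proxy = candidates[self._index % len(candidates)]  # sequential over non-banned
        self._index += 1
        return proxy


now = [0.0]  # fake clock so the cooldown can be simulated instantly
tracker = BanTracker(["a", "b"], ban_threshold=3, cooldown=300,
                     mode="round_robin", clock=lambda: now[0])
for _ in range(3):
    tracker.record_failure("a")
print(tracker.available())  # "a" is banned at this point
now[0] = 301.0
print(tracker.available())  # cooldown has elapsed
```

With the fake clock, the first `print` shows only `b`, and after advancing past the 300-second cooldown both proxies are available again.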

---

## 🧠 Summary Logs Example
When a spider finishes, a summary like this appears in the logs:

```
============================================================
PROXY USAGE SUMMARY
============================================================
Proxy: http://proxy1.com:8080
Total requests: 120
Successes: 110
Failures: 10
Success rate: 91.7%
Banned: NO
--------------------------------------------------
Proxy: http://proxy2.com:8080
Total requests: 50
Successes: 25
Failures: 25
Success rate: 50.0%
Banned: YES
============================================================
```

---
