| 1 | +# Instaparser Python Library |
| 2 | + |
| 3 | +A Python client library for the [Instaparser API](https://www.instaparser.com), providing a simple and intuitive interface for parsing articles, generating summaries, and processing PDFs. |
| 4 | + |
| 5 | +## Installation |
| 6 | + |
| 7 | +```bash |
| 8 | +pip install instaparser |
| 9 | +``` |
| 10 | + |
| 11 | +## Quick Start |
| 12 | + |
| 13 | +```python |
| 14 | +from instaparser import InstaparserClient |
| 15 | + |
| 16 | +# Initialize the client with your API key |
| 17 | +client = InstaparserClient(api_key="your-api-key") |
| 18 | + |
| 19 | +# Parse an article from a URL |
| 20 | +article = client.Article(url="https://example.com/article") |
| 21 | + |
| 22 | +# Access article properties |
| 23 | +print(article.title) |
| 24 | +print(article.body) # HTML or text content |
| 25 | +print(article.author) |
| 26 | +print(article.words) |
| 27 | +``` |
| 28 | + |
| 29 | +## Features |
| 30 | + |
| 31 | +- **Article Parsing**: Extract clean HTML or text from web articles |
| 32 | +- **Summary Generation**: Generate AI-powered summaries with key sentences |
| 33 | +- **PDF Processing**: Parse PDFs from URLs or file uploads |
| 34 | +- **Error Handling**: Comprehensive exception handling for API errors |
| 35 | +- **Type Hints**: Full type annotations for better IDE support |
| 36 | + |
| 37 | +## Usage |
| 38 | + |
| 39 | +### Article Parsing |
| 40 | + |
| 41 | +Parse articles from URLs or HTML content: |
| 42 | + |
| 43 | +```python |
| 44 | +from instaparser import InstaparserClient |
| 45 | + |
| 46 | +client = InstaparserClient(api_key="your-api-key") |
| 47 | + |
| 48 | +# Parse from URL (HTML output) |
| 49 | +article = client.Article(url="https://example.com/article") |
| 50 | +print(article.html) # HTML content |
| 51 | +print(article.body) # Same as html when output='html' |
| 52 | + |
| 53 | +# Parse from URL (text output) |
| 54 | +article = client.Article(url="https://example.com/article", output="text") |
| 55 | +print(article.text) # Plain text content |
| 56 | +print(article.body) # Same as text when output='text' |
| 57 | + |
| 58 | +# Parse from HTML content |
| 59 | +html_content = "<html><body><h1>Title</h1><p>Content</p></body></html>" |
| 60 | +article = client.Article(url="https://example.com/article", content=html_content) |
| 61 | + |
| 62 | +# Disable cache |
| 63 | +article = client.Article(url="https://example.com/article", use_cache=False) |
| 64 | +``` |
| 65 | + |
| 66 | +### Article Properties |
| 67 | + |
| 68 | +The `Article` object provides access to all parsed metadata: |
| 69 | + |
| 70 | +```python |
| 71 | +article = client.Article(url="https://example.com/article") |
| 72 | + |
| 73 | +# Basic properties |
| 74 | +article.url # Canonical URL |
| 75 | +article.title # Article title |
| 76 | +article.site_name # Website name |
| 77 | +article.author # Author name |
| 78 | +article.date # Published date (UNIX timestamp) |
| 79 | +article.description # Article description |
| 80 | +article.thumbnail # Thumbnail image URL |
| 81 | +article.words # Word count |
| 82 | +article.is_rtl # Right-to-left language flag |
| 83 | + |
| 84 | +# Content |
| 85 | +article.body # HTML or text (depending on output format) |
| 86 | +article.html # HTML content (if output='html') |
| 87 | +article.text # Plain text (if output='text') |
| 88 | + |
| 89 | +# Media |
| 90 | +article.images # List of images |
| 91 | +article.videos # List of embedded videos |
| 92 | +``` |
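Because `date` is documented as a UNIX timestamp, it usually needs converting before display. The following is a minimal sketch using only the standard library. It assumes the timestamp is in seconds and may be missing (`None`); neither detail is confirmed above, and `published_at` is a hypothetical helper, not part of the library.

```python
from datetime import datetime, timezone


def published_at(timestamp):
    """Convert a UNIX timestamp in seconds to a timezone-aware UTC datetime.

    Returns None when no timestamp is available (an assumed possibility).
    """
    if timestamp is None:
        return None
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# With the library this would be called as published_at(article.date).
print(published_at(0).isoformat())  # 1970-01-01T00:00:00+00:00
```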
| 93 | + |
| 94 | +### Summary Generation |
| 95 | + |
| 96 | +Generate AI-powered summaries: |
| 97 | + |
| 98 | +```python |
| 99 | +# Generate summary |
| 100 | +summary = client.Summary(url="https://example.com/article") |
| 101 | + |
| 102 | +print(summary.overview) # Concise summary |
| 103 | +print(summary.key_sentences) # List of key sentences |
| 104 | + |
| 105 | +# Stream summary with callback (for real-time updates) |
| 106 | +def on_stream_line(line): |
| 107 | + print(f"Streaming: {line}") |
| 108 | + |
| 109 | +summary = client.Summary( |
| 110 | + url="https://example.com/article", |
| 111 | + stream_callback=on_stream_line |
| 112 | +) |
| 113 | +``` |
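A streaming callback does not have to print; it can also accumulate lines for later use. The sketch below shows one way to do that with a small callable object. `LineCollector` is a hypothetical helper, and the streamed lines are simulated here so the example runs without an API key.

```python
class LineCollector:
    """Callable that stores every streamed line it receives."""

    def __init__(self):
        self.lines = []

    def __call__(self, line):
        self.lines.append(line)


collector = LineCollector()

# Simulated stream; with the library the collector would instead be
# passed as stream_callback=collector to client.Summary(...).
for line in ["First streamed line", "Second streamed line"]:
    collector(line)

print(len(collector.lines))  # 2
```

Any callable that accepts a single string fits the `stream_callback` signature given in the API reference, so a plain function or a bound method such as `my_list.append` would work just as well.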
| 114 | + |
| 115 | +### PDF Processing |
| 116 | + |
| 117 | +Parse PDFs from URLs or files. The `PDF` class inherits from `Article`, so it exposes all the same properties: |
| 118 | + |
| 119 | +```python |
| 120 | +# Parse PDF from URL |
| 121 | +pdf = client.PDF(url="https://example.com/document.pdf") |
| 122 | + |
| 123 | +# Parse PDF from file |
| 124 | +with open('document.pdf', 'rb') as f: |
| 125 | + pdf = client.PDF(file=f) |
| 126 | + |
| 127 | +# Parse PDF with text output |
| 128 | +pdf = client.PDF(url="https://example.com/document.pdf", output="text") |
| 129 | +print(pdf.text) |
| 130 | +print(pdf.body) # Same as text when output='text' |
| 131 | + |
| 132 | +# Access all Article properties |
| 133 | +print(pdf.title) |
| 134 | +print(pdf.words) |
| 135 | +print(pdf.images) |
| 136 | +``` |
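Since `file` accepts raw bytes as well as a file object, a cheap local check before uploading can avoid a wasted request. The sketch below tests for the `%PDF-` marker that PDF files begin with. `looks_like_pdf` is a hypothetical helper, and this is a sanity check only, not full validation.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the standard PDF header marker."""
    return data.startswith(b"%PDF-")


# Sample byte strings stand in for real file contents.
print(looks_like_pdf(b"%PDF-1.7\n..."))        # True
print(looks_like_pdf(b"<html><body></body>"))  # False
```

With the library, the bytes could be read with `pathlib.Path("document.pdf").read_bytes()` and passed as `client.PDF(file=data)` once the check passes.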
| 137 | + |
| 138 | +## Error Handling |
| 139 | + |
| 140 | +The library provides specific exception types for different error scenarios: |
| 141 | + |
| 142 | +```python |
| 143 | +from instaparser import ( |
| 144 | + InstaparserClient, |
| 145 | + InstaparserAuthenticationError, |
| 146 | + InstaparserRateLimitError, |
| 147 | + InstaparserValidationError, |
| 148 | + InstaparserAPIError, |
| 149 | +) |
| 150 | + |
| 151 | +client = InstaparserClient(api_key="your-api-key") |
| 152 | + |
| 153 | +try: |
| 154 | + article = client.Article(url="https://example.com/article") |
| 155 | +except InstaparserAuthenticationError: |
| 156 | + print("Invalid API key") |
| 157 | +except InstaparserRateLimitError: |
| 158 | + print("Rate limit exceeded") |
| 159 | +except InstaparserValidationError: |
| 160 | + print("Invalid request parameters") |
| 161 | +except InstaparserAPIError as e: |
| 162 | + print(f"API error: {e} (status: {e.status_code})") |
| 163 | +``` |
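Rate-limit errors are often transient, so a caller may prefer to retry rather than give up. The following is a minimal sketch of retrying with exponential backoff. `call_with_retry` is a hypothetical helper, not something the library provides, and `FakeRateLimitError` stands in for `InstaparserRateLimitError` so the example runs on its own.

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for InstaparserRateLimitError so the sketch runs without the library."""


def call_with_retry(func, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when retry_on is raised."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # delay doubles after each failure


# Demo: a hypothetical call that is rate limited twice, then succeeds.
calls = {"count": 0}


def flaky_parse():
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError("rate limit exceeded")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky_parse, FakeRateLimitError, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

With the library, the call would look like `call_with_retry(lambda: client.Article(url="https://example.com/article"), InstaparserRateLimitError)`. Authentication and validation errors are deliberately not retried, since repeating the same request would not change the outcome.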
| 164 | + |
| 165 | +## API Reference |
| 166 | + |
| 167 | +### InstaparserClient |
| 168 | + |
| 169 | +Main client class for interacting with the Instaparser API. |
| 170 | + |
| 171 | +#### `__init__(api_key: str)` |
| 172 | + |
| 173 | +Initialize the client. |
| 174 | + |
| 175 | +- `api_key`: Your Instaparser API key |
| 176 | + |
| 177 | +#### `Article(url: str, content: Optional[str] = None, output: str = 'html', use_cache: bool = True) -> Article` |
| 178 | + |
| 179 | +Parse an article from a URL or HTML content. |
| 180 | + |
| 181 | +- `url`: URL of the article (required) |
| 182 | +- `content`: Optional HTML content to parse instead of fetching from URL |
| 183 | +- `output`: Output format - `'html'` (default) or `'text'` |
| 184 | +- `use_cache`: Whether to use cache (default: `True`) |
| 185 | + |
| 186 | +Returns: `Article` object |
| 187 | + |
| 188 | +#### `Summary(url: str, content: Optional[str] = None, use_cache: bool = True, stream_callback: Optional[Callable[[str], None]] = None) -> Summary` |
| 189 | + |
| 190 | +Generate a summary of an article. |
| 191 | + |
| 192 | +- `url`: URL of the article (required) |
| 193 | +- `content`: Optional HTML content to parse instead of fetching from URL |
| 194 | +- `use_cache`: Whether to use cache (default: `True`) |
| 195 | +- `stream_callback`: Optional callback function called for each line of streaming response. If provided, enables streaming mode. |
| 196 | + |
| 197 | +Returns: `Summary` object with `key_sentences` and `overview` attributes |
| 198 | + |
| 199 | +#### `PDF(url: Optional[str] = None, file: Optional[Union[BinaryIO, bytes]] = None, output: str = 'html', use_cache: bool = True) -> PDF` |
| 200 | + |
| 201 | +Parse a PDF from a URL or file. |
| 202 | + |
| 203 | +- `url`: URL of the PDF (required when parsing a remote PDF; sent as a GET request) |
| 204 | +- `file`: PDF file object or raw bytes to upload (required when parsing a local PDF; sent as a POST request) |
| 205 | +- `output`: Output format - `'html'` (default) or `'text'` |
| 206 | +- `use_cache`: Whether to use cache (default: `True`) |
| 207 | + |
| 208 | +Returns: `PDF` object (inherits from `Article`) |
| 209 | + |
| 210 | +### Article |
| 211 | + |
| 212 | +Represents a parsed article from Instaparser. |
| 213 | + |
| 214 | +#### Properties |
| 215 | + |
| 216 | +- `url`: Canonical URL |
| 217 | +- `title`: Article title |
| 218 | +- `site_name`: Website name |
| 219 | +- `author`: Author name |
| 220 | +- `date`: Published date (UNIX timestamp) |
| 221 | +- `description`: Article description |
| 222 | +- `thumbnail`: Thumbnail image URL |
| 223 | +- `words`: Word count |
| 224 | +- `is_rtl`: Right-to-left language flag |
| 225 | +- `images`: List of images |
| 226 | +- `videos`: List of embedded videos |
| 227 | +- `body`: Article body (HTML or text) |
| 228 | +- `html`: HTML content (if output was 'html') |
| 229 | +- `text`: Plain text content (if output was 'text') |
| 230 | + |
| 231 | +### PDF |
| 232 | + |
| 233 | +Represents a parsed PDF from Instaparser. Inherits from `Article` and has all the same properties. PDFs always have `is_rtl=False` and `videos=[]`. |
| 234 | + |
| 235 | +### Summary |
| 236 | + |
| 237 | +Represents a summary result from Instaparser. |
| 238 | + |
| 239 | +#### Properties |
| 240 | + |
| 241 | +- `key_sentences`: List of key sentences extracted from the article |
| 242 | +- `overview`: Concise summary of the article |
| 243 | + |
| 244 | +## License |
| 245 | + |
| 246 | +MIT |
| 247 | + |
| 248 | +## Support |
| 249 | + |
| 250 | +For support, email support@instaparser.com or visit [https://www.instaparser.com](https://www.instaparser.com). |