Commit a340264

Initial commit
Signed-off-by: Brian Donohue <brian@team.instapaper.com>
0 parents  commit a340264

23 files changed

Lines changed: 3472 additions & 0 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
__pycache__
dist/

BUILD.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# Build Instructions

## Prerequisites

- Python 3.10 or later
- pip

## Setup

Install the runtime and build dependencies:

```sh
pip install -e .
```

For development (linting, type-checking, testing), also install the dev extras:

```sh
pip install -e ".[dev]"
```

For builds and deploys, also install the build extras:

```sh
pip install -e ".[build]"
```

## Running Tests

```sh
pytest
```

With coverage:

```sh
pytest --cov=instaparser
```

## Linting & Type Checking

```sh
ruff check instaparser/
black --check instaparser/
mypy instaparser/
```

## Building the Package

Build both the sdist and wheel:

```sh
python -m build
```

The outputs will be in the `dist/` directory.

## Publishing to PyPI

Upload the built artifacts with twine:

```sh
twine upload dist/*
```

To test against Test PyPI first:

```sh
twine upload --repository testpypi dist/*
```

## Versioning

The package version is defined in `instaparser/__init__.py` and read automatically by Hatch at build time via the `[tool.hatch.version]` configuration in `pyproject.toml`.
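
As a rough illustration of what reading the version from source involves, the sketch below extracts a `__version__` assignment with a regular expression. The pattern and the `read_version` helper are assumptions made for illustration, not Hatch's actual implementation, and the version value is a placeholder.

```python
import re

# Simplified pattern for a line such as: __version__ = "1.2.3"
# (an approximation for illustration; Hatch's own pattern may differ)
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*["'](?P<version>[^"']+)["']""",
    re.MULTILINE,
)


def read_version(source: str) -> str:
    """Return the version string assigned to __version__ in the given source."""
    match = VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group("version")


# Placeholder module contents; the real file is instaparser/__init__.py
sample = '"""Instaparser client."""\n__version__ = "0.0.0"\n'
print(read_version(sample))
```

Keeping the version in a single place means a release only needs one edit before `python -m build` is run.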

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Instant Paper, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
# Instaparser Python Library

A Python client library for the [Instaparser API](https://www.instaparser.com), providing a simple and intuitive interface for parsing articles, generating summaries, and processing PDFs.

## Installation

```bash
pip install instaparser
```

## Quick Start

```python
from instaparser import InstaparserClient

# Initialize the client with your API key
client = InstaparserClient(api_key="your-api-key")

# Parse an article from a URL
article = client.Article(url="https://example.com/article")

# Access article properties
print(article.title)
print(article.body)  # HTML or text content
print(article.author)
print(article.words)
```
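
Hard-coding the key is fine for a quick test, but a script would more commonly read it from the environment. A minimal sketch, assuming a variable named `INSTAPARSER_API_KEY` (a name chosen here for illustration; the library is not known to read it on its own) and a hypothetical `load_api_key` helper:

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or raise if it is missing."""
    # INSTAPARSER_API_KEY is an assumed variable name, not one defined by the library.
    key = env.get("INSTAPARSER_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set INSTAPARSER_API_KEY before creating the client")
    return key


# Usage (requires the instaparser package and a real key):
# from instaparser import InstaparserClient
# client = InstaparserClient(api_key=load_api_key())
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.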

## Features

- **Article Parsing**: Extract clean HTML or text from web articles
- **Summary Generation**: Generate AI-powered summaries with key sentences
- **PDF Processing**: Parse PDFs from URLs or file uploads
- **Error Handling**: Comprehensive exception handling for API errors
- **Type Hints**: Full type annotations for better IDE support

## Usage

### Article Parsing

Parse articles from URLs or HTML content:

```python
from instaparser import InstaparserClient

client = InstaparserClient(api_key="your-api-key")

# Parse from URL (HTML output)
article = client.Article(url="https://example.com/article")
print(article.html)  # HTML content
print(article.body)  # Same as html when output='html'

# Parse from URL (text output)
article = client.Article(url="https://example.com/article", output="text")
print(article.text)  # Plain text content
print(article.body)  # Same as text when output='text'

# Parse from HTML content
html_content = "<html><body><h1>Title</h1><p>Content</p></body></html>"
article = client.Article(url="https://example.com/article", content=html_content)

# Disable cache
article = client.Article(url="https://example.com/article", use_cache=False)
```

### Article Properties

The `Article` object provides access to all parsed metadata:

```python
article = client.Article(url="https://example.com/article")

# Basic properties
article.url          # Canonical URL
article.title        # Article title
article.site_name    # Website name
article.author       # Author name
article.date         # Published date (UNIX timestamp)
article.description  # Article description
article.thumbnail    # Thumbnail image URL
article.words        # Word count
article.is_rtl       # Right-to-left language flag

# Content
article.body         # HTML or text (depending on output format)
article.html         # HTML content (if output='html')
article.text         # Plain text (if output='text')

# Media
article.images       # List of images
article.videos       # List of embedded videos
```
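
Because `date` is a UNIX timestamp, it usually needs converting before display. A minimal sketch using only the standard library; the `format_published` helper and the sample value are illustrative, and both the seconds-based unit and the handling of a missing date are assumptions:

```python
from datetime import datetime, timezone
from typing import Optional


def format_published(timestamp: Optional[float]) -> str:
    """Format a UNIX timestamp (assumed to be in seconds) as a UTC ISO date."""
    if timestamp is None:
        # Assumption: an article without a detected date reports None.
        return "unknown"
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


print(format_published(1700000000))  # sample value, not from a real article
```

With a parsed article this would be called as `format_published(article.date)`.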

### Summary Generation

Generate AI-powered summaries:

```python
# Generate summary
summary = client.Summary(url="https://example.com/article")

print(summary.overview)       # Concise summary
print(summary.key_sentences)  # List of key sentences

# Stream summary with callback (for real-time updates)
def on_stream_line(line):
    print(f"Streaming: {line}")

summary = client.Summary(
    url="https://example.com/article",
    stream_callback=on_stream_line
)
```
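
A callback does not have to print. The sketch below shows one possible pattern, a callable object that records each streamed line so the partial output can be inspected later. `StreamCollector` is a hypothetical helper, not part of the library, and the callback is driven by hand here rather than by a real API call.

```python
from typing import List


class StreamCollector:
    """Callable that records each streamed line for later inspection."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def __call__(self, line: str) -> None:
        self.lines.append(line)

    def joined(self) -> str:
        """Return all recorded lines as one newline-separated string."""
        return "\n".join(self.lines)


collector = StreamCollector()
# With the real client this would be passed as:
#   client.Summary(url="https://example.com/article", stream_callback=collector)
# Here the callback is invoked manually to show the behaviour.
for chunk in ["first line", "second line"]:
    collector(chunk)
print(collector.joined())
```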

### PDF Processing

Parse PDFs from URLs or files. The PDF class inherits from Article, so it has all the same properties:

```python
# Parse PDF from URL
pdf = client.PDF(url="https://example.com/document.pdf")

# Parse PDF from file
with open('document.pdf', 'rb') as f:
    pdf = client.PDF(file=f)

# Parse PDF with text output
pdf = client.PDF(url="https://example.com/document.pdf", output="text")
print(pdf.text)
print(pdf.body)  # Same as text when output='text'

# Access all Article properties
print(pdf.title)
print(pdf.words)
print(pdf.images)
```
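
Before uploading a local file, it can be worth checking that it is plausibly a PDF. A minimal sketch, assuming a hypothetical `looks_like_pdf` helper; the check only inspects the leading `%PDF-` marker and is not a full validation:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Loose check: PDF files normally begin with the bytes %PDF-."""
    return data.startswith(b"%PDF-")


# Usage (requires the instaparser package and a real file):
# with open("document.pdf", "rb") as f:
#     data = f.read()
# if looks_like_pdf(data):
#     pdf = client.PDF(file=data)  # file accepts bytes per the API reference signature

print(looks_like_pdf(b"%PDF-1.7\n"))
print(looks_like_pdf(b"<html></html>"))
```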

## Error Handling

The SDK provides specific exception types for different error scenarios:

```python
from instaparser import (
    InstaparserClient,
    InstaparserAuthenticationError,
    InstaparserRateLimitError,
    InstaparserValidationError,
    InstaparserAPIError,
)

client = InstaparserClient(api_key="your-api-key")

try:
    article = client.Article(url="https://example.com/article")
except InstaparserAuthenticationError:
    print("Invalid API key")
except InstaparserRateLimitError:
    print("Rate limit exceeded")
except InstaparserValidationError:
    print("Invalid request parameters")
except InstaparserAPIError as e:
    print(f"API error: {e} (status: {e.status_code})")
```
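
Rate-limit errors are often transient, so callers may want to retry. The sketch below is one possible approach, not something the library provides: a generic `call_with_retry` helper (hypothetical) that retries with exponential backoff on whichever exception types are passed in. The delay values are arbitrary, and whether the API reports a suggested wait time is not covered here.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[Exception], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff when retry_on is raised."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


# Usage with the client (requires the instaparser package):
# article = call_with_retry(
#     lambda: client.Article(url="https://example.com/article"),
#     retry_on=(InstaparserRateLimitError,),
# )
```

Authentication and validation errors are left out of `retry_on` in the usage above because retrying them without changing the request would be unlikely to help.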

## API Reference

### InstaparserClient

Main client class for interacting with the Instaparser API.

#### `__init__(api_key: str)`

Initialize the client.

- `api_key`: Your Instaparser API key

#### `Article(url: str, content: Optional[str] = None, output: str = 'html', use_cache: bool = True) -> Article`

Parse an article from a URL or HTML content.

- `url`: URL of the article (required)
- `content`: Optional HTML content to parse instead of fetching from URL
- `output`: Output format - `'html'` (default) or `'text'`
- `use_cache`: Whether to use cache (default: `True`)

Returns: `Article` object

#### `Summary(url: str, content: Optional[str] = None, use_cache: bool = True, stream_callback: Optional[Callable[[str], None]] = None) -> Summary`

Generate a summary of an article.

- `url`: URL of the article (required)
- `content`: Optional HTML content to parse instead of fetching from URL
- `use_cache`: Whether to use cache (default: `True`)
- `stream_callback`: Optional callback function called for each line of streaming response. If provided, enables streaming mode.

Returns: `Summary` object with `key_sentences` and `overview` attributes

#### `PDF(url: Optional[str] = None, file: Optional[Union[BinaryIO, bytes]] = None, output: str = 'html', use_cache: bool = True) -> PDF`

Parse a PDF from a URL or file.

- `url`: URL of the PDF (required for GET request)
- `file`: PDF file to upload (required for POST request)
- `output`: Output format - `'html'` (default) or `'text'`
- `use_cache`: Whether to use cache (default: `True`)

Returns: `PDF` object (inherits from `Article`)

### Article

Represents a parsed article from Instaparser.

#### Properties

- `url`: Canonical URL
- `title`: Article title
- `site_name`: Website name
- `author`: Author name
- `date`: Published date (UNIX timestamp)
- `description`: Article description
- `thumbnail`: Thumbnail image URL
- `words`: Word count
- `is_rtl`: Right-to-left language flag
- `images`: List of images
- `videos`: List of embedded videos
- `body`: Article body (HTML or text)
- `html`: HTML content (if output was 'html')
- `text`: Plain text content (if output was 'text')

### PDF

Represents a parsed PDF from Instaparser. Inherits from `Article` and has all the same properties. PDFs always have `is_rtl=False` and `videos=[]`.

### Summary

Represents a summary result from Instaparser.

#### Properties

- `key_sentences`: List of key sentences extracted from the article
- `overview`: Concise summary of the article

## License

MIT

## Support

For support, email support@instaparser.com or visit [https://www.instaparser.com](https://www.instaparser.com).
