These instructions are for AI assistants working in this project.
Always open @/openspec/AGENTS.md when the request:
- Mentions planning or proposals (words like proposal, spec, change, plan)
- Introduces new capabilities, breaking changes, architecture shifts, or big performance/security work
- Sounds ambiguous and you need the authoritative spec before coding
Use @/openspec/AGENTS.md to learn:
- How to create and apply change proposals
- Spec format and conventions
- Project structure and guidelines
Keep this managed block so 'openspec update' can refresh the instructions.
Instructions for AI coding agents working on the IterableData project.
- Install dependencies: `pip install -e ".[dev]"`
- Run tests: `pytest --verbose` (includes coverage automatically)
- Run tests with parallel execution: `pytest -n auto`
- Run linter: `ruff check iterable tests`
- Format code: `ruff format iterable tests`
- Type check: `mypy iterable`
- Run all checks: `ruff check iterable tests && ruff format --check iterable tests && pytest`
- Security scan: `bandit -r iterable -ll`
- Dependency vulnerabilities: `pip-audit --requirement <(pip freeze)`
- Dead code detection: `vulture iterable --min-confidence 80`
- Code complexity: `radon cc iterable --min B`
- Documentation style: `pydocstyle iterable`
- Coverage report: `pytest --cov=iterable --cov-report=html` (open `htmlcov/index.html`)
- Python 3.10+ with type hints where appropriate
- Maximum line length: 120 characters (configured in `pyproject.toml`)
- Use `ruff` for linting and formatting (E, F, I, B, UP rules enabled)
- Use double quotes for strings consistently
- Follow PEP 8 with project-specific exceptions
- Always use context managers (`with` statements) for file operations
- Import organization: standard library, then third-party, then local imports
- `iterable/` - Main package directory
  - `helpers/` - Utility functions (detect, schema, utils)
  - `datatypes/` - Format-specific implementations (CSV, JSON, Parquet, etc.)
  - `codecs/` - Compression codec implementations
  - `engines/` - Processing engines (DuckDB, internal)
  - `convert/` - Format conversion utilities
  - `pipeline/` - Data pipeline processing
- `tests/` - Test suite (one test file per format/feature)
- `examples/` - Usage examples
- `testdata/` - Test data files
- `fixtures/` - Test fixtures
- All tests are in the `tests/` directory
- Test files follow the pattern `test_*.py`
- Test classes: `Test*`
- Test functions: `test_*`
- Run a specific test file: `pytest tests/test_csv.py -v`
- Run a specific test function: `pytest tests/test_csv.py::TestCSV::test_read -v`
- Tests should pass on Python 3.10, 3.11, and 3.12
- Always run tests before committing: `pytest --verbose`
- Some format tests may be skipped if optional dependencies are missing (this is expected)
- Main entry point: `from iterable.helpers.detect import open_iterable`
- Format-specific: `from iterable.datatypes.csv import CSVIterable`
- Codecs: `from iterable.codecs.gzipcodec import GZIPCodec`
- Always use `open_iterable()` for user-facing code; direct class usage is for advanced cases
- Always use context managers: `with open_iterable('file.csv') as source:`
- Never call `.close()` when using `with` statements
- Reset iterators with the `.reset()` method when needed
- Handle compression automatically via filename detection or an explicit codec
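The conventions above can be sketched with a toy stand-in. `open_iterable` and `.reset()` are the project's real names, but the class below is a minimal illustration of the calling pattern, not the library's implementation:

```python
# Toy stand-in illustrating the context-manager and reset conventions.
# The real entry point is iterable.helpers.detect.open_iterable(); this
# sketch only mirrors the calling pattern.

class ToyIterableSource:
    def __init__(self, rows):
        self._rows = rows
        self._pos = 0
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # The 'with' block closes the source; callers never call .close()
        self.close()

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def reset(self):
        # Rewind the iterator so the source can be read again.
        self._pos = 0

    def close(self):
        self.closed = True


with ToyIterableSource([{"id": 1}, {"id": 2}]) as source:
    first_pass = list(source)
    source.reset()            # rewind instead of reopening
    second_pass = list(source)
# the context manager has closed the source here
```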
- Format detection failures should provide helpful error messages
- Missing optional dependencies should raise clear ImportError with installation instructions
- Invalid file formats should raise appropriate exceptions (ValueError, TypeError)
- Always handle file I/O errors gracefully
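The optional-dependency rule above can be sketched with a small helper. The function name and the `pip install` extras string here are hypothetical, not the project's actual API:

```python
import importlib

def require_optional(module_name: str, extra: str):
    """Import module_name or raise a clear ImportError with install
    instructions. Helper name and extras name are illustrative only."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this format. "
            f'Install it with: pip install "iterabledata[{extra}]"'
        ) from exc

json_mod = require_optional("json", "json")   # stdlib module: succeeds
try:
    require_optional("not_a_real_module", "demo")
except ImportError as err:
    message = str(err)   # contains the installation hint
```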
- Create a new file in `iterable/datatypes/` (e.g., `newformat.py`)
- Implement a class inheriting from `BaseIterable` in `iterable/base.py`
- Implement the required methods: `read()`, `write()`, `read_bulk()`, `write_bulk()`, etc.
- Add format detection logic in `iterable/helpers/detect.py`
- Create comprehensive tests in `tests/test_newformat.py`
- Update `detect_file_type()` to recognize the format
- Add an optional dependency to `pyproject.toml` if needed
- Update documentation
- Create a new file in `iterable/codecs/` (e.g., `newcodec.py`)
- Implement a class with `read()`, `write()`, and `close()` methods
- Update `iterable/helpers/detect.py` to detect the codec
- Add compression format detection logic
- Create tests in `tests/test_newcodec.py` or a relevant test file
- Add an optional dependency to `pyproject.toml`
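A minimal codec sketch following the checklist. It wraps stdlib `gzip` only as an example, and the constructor signature is an assumption; mirror the existing classes in `iterable/codecs/` for the real interface:

```python
import gzip

# Minimal codec sketch with the read()/write()/close() methods from the
# checklist above. Wrapping stdlib gzip is illustrative; the constructor
# signature is assumed, not the project's actual codec API.

class ToyGzipCodec:
    def __init__(self, path, mode="rb"):
        self._fileobj = gzip.open(path, mode)

    def read(self, size=-1):
        # Return decompressed bytes from the underlying file.
        return self._fileobj.read(size)

    def write(self, data):
        # Compress and write bytes to the underlying file.
        return self._fileobj.write(data)

    def close(self):
        self._fileobj.close()
```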
- Use `open_iterable()` for automatic format detection
- Use format-specific classes only when needed
- Always close files or use context managers
- Prefer bulk operations (`read_bulk`, `write_bulk`) for performance
- Use the DuckDB engine when appropriate (CSV, JSONL files)
- Handle encoding automatically via `chardet` or user specification
- Run `ruff check iterable tests` before committing
- Run `ruff format iterable tests` to auto-format
- Fix all linting errors; warnings are treated as errors
- Type hints are encouraged but not strictly required (mypy runs, but failures are allowed)
- Pre-commit hooks automatically run security scans, code-quality checks, and formatting on commit
- Install the pre-commit hooks: `pre-commit install`
- Write clear, descriptive commit messages
- Test your changes: `pytest --verbose`
- Run the linter: `ruff check iterable tests`
- Ensure code follows the project style
- Update tests when adding new functionality
- Update documentation when changing public APIs
- All tests must pass
- The linter must pass: `ruff check iterable tests`
- Code should be formatted: `ruff format --check iterable tests`
- Include tests for new features
- Update relevant documentation
- Describe changes clearly in the PR description
- Use `iterable.helpers.detect.open_iterable()` for most use cases
- Check existing format implementations for patterns
- Look at `tests/test_*.py` files for usage examples
- Test with compressed files (`.gz`, `.bz2`, `.xz`, `.zst`, etc.)
- Test with various encodings for text formats
- Handle edge cases: empty files, malformed data, missing dependencies
- Some formats require optional dependencies (see `pyproject.toml` for `optional-dependencies`)
- The DuckDB engine only supports certain formats (CSV, JSONL, JSON) and codecs (GZIP, ZStandard)
- Large files should use streaming (the iterator interface) to avoid memory issues
- XML parsing requires specifying tag names via `iterableargs={'tagname': 'item'}`
This project includes Cursor Skills (.cursor/skills/) that provide specialized guidance for common tasks:
- iterabledata-development - Core development workflows and conventions
- openspec-workflows - OpenSpec proposal and implementation workflows
- format-implementation - Guide for implementing new data formats
- testing-patterns - Testing conventions and best practices
- database-engine-implementation - Guide for implementing database engines
Skills are automatically applied by Cursor AI when relevant. See .cursor/skills/README.md for details.
- Main documentation: see `README.md` and the `docs/` directory
- API reference: `docs/docs/api/`
- Format documentation: `docs/docs/formats/`
- Examples: the `examples/` directory
- AI integration guides: the `docs/integrations/` directory