v0.6.0

@SeanLee97 SeanLee97 released this 19 Oct 04:11
· 2 commits to main since this release

What's Changed

Detailed changes:

  • Use uv to manage dependencies
  • Simplify the implementation
  • Remove all imports of AngleDataTokenizer
  • Remove all imports of DatasetFormats
  • Remove all .map(AngleDataTokenizer(...)) calls
  • Update dataset field names (text → query for Format B/C), or use --column_rename_mapping instead
  • Add is_llm=True to LLM model initialization
  • Replace --prompt_template with --text_prompt, --query_prompt, or --doc_prompt
  • Update training scripts to use accelerate launch
  • Update evaluation code if it relies on the return value
  • Support input data as a list of strings. New data formats:
    • A: {"text1": str | List[str], "text2": str | List[str], "label": float}
    • B: {"query": str | List[str], "positive": str | List[str]}
    • C: {"query": str | List[str], "positive": str | List[str], "negative": str | List[str]}
  • Support FSDP training
  • Update docs
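
The three data formats listed above can be illustrated with plain Python dictionaries. The following is a minimal sketch, not part of the AnglE API: `FORMAT_KEYS`, `detect_format`, and `rename_text_to_query` are hypothetical names introduced here only to show what each record shape looks like and how a legacy `text` field might be renamed to `query` by hand. It assumes that a record carries exactly the keys of one format.

```python
from typing import Any, Dict

# Keys of each format, copied from the release notes.
FORMAT_KEYS = {
    "A": {"text1", "text2", "label"},
    "B": {"query", "positive"},
    "C": {"query", "positive", "negative"},
}


def _is_text(value: Any) -> bool:
    """True for a string or a list of strings (str | List[str])."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def detect_format(record: Dict[str, Any]) -> str:
    """Hypothetical helper: return "A", "B", or "C" for a record.

    Assumes the record has exactly the keys of one format.
    """
    keys = set(record)
    for name, required in FORMAT_KEYS.items():
        if keys != required:
            continue
        text_keys = required - {"label"}
        if not all(_is_text(record[k]) for k in text_keys):
            raise ValueError(f"text fields must be str or List[str]: {record}")
        if "label" in required and not isinstance(record["label"], (int, float)):
            raise ValueError(f"label must be a number: {record}")
        return name
    raise ValueError(f"record matches no known format: {sorted(keys)}")


def rename_text_to_query(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename a legacy "text" field to "query"."""
    if "text" in record and "query" not in record:
        record = dict(record)  # copy so the caller's record is untouched
        record["query"] = record.pop("text")
    return record


# One example record per format; list-valued fields are allowed.
example_a = {"text1": "a cat", "text2": ["a kitten", "a feline"], "label": 0.8}
example_b = {"query": "what is uv?", "positive": "uv is a Python package manager"}
example_c = {"query": "what is uv?", "positive": ["a package manager"], "negative": "ultraviolet light"}

# A pre-0.6.0 style Format B record that still uses "text".
legacy_b = {"text": "what is uv?", "positive": "a package manager"}
migrated_b = rename_text_to_query(legacy_b)
```

Renaming the field by hand, as `rename_text_to_query` does, is one option; the release notes also mention `--column_rename_mapping` as an alternative, whose exact syntax is described in the migration guide rather than here.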

Migration guide: https://github.com/SeanLee97/AnglE/blob/main/MIGRATION_GUIDE.md

Full Changelog: v0.5.6...v0.6.0