Tahsen77/Digital-Store

Arabic–German Legal Dataset v1

Arabic–German Legal Dataset v2

High‑quality bilingual corpus for legal, governmental, and administrative NLP applications.

📘 Overview

This dataset provides a professionally curated collection of Arabic–German parallel sentences focused on legal, governmental, administrative, and regulatory language. It is designed for researchers, developers, and organizations building multilingual AI systems that require precise terminology and consistent structure.

The dataset is optimized for:

  • Machine Translation (MT)
  • Large Language Models (LLMs)
  • Legal text classification
  • Named Entity Recognition (NER)
  • Cross‑lingual information retrieval
  • Governmental and administrative NLP tools

📂 Dataset Contents

The dataset is delivered as a ZIP archive containing:

  • Arabic–German Legal Dataset v1
      • 1,016 Arabic–German parallel sentence pairs
      • Legal, governmental, and administrative terminology
      • Cleaned, normalized, and standardized formatting
      • AI‑ready structure suitable for training MT and LLM models

  • Arabic–German Legal Dataset v2
      • 1,040 Arabic–German parallel sentence pairs
      • Legal, governmental, and administrative terminology
      • Cleaned, normalized, and standardized formatting
      • AI‑ready structure suitable for training MT and LLM models

All content is structured to ensure maximum consistency and usability in machine learning workflows.


🧪 Sample Sentences (Preview)

Below are illustrative examples, written in the style of the dataset, that show the type of content it includes.
They are formatted in a parallel structure suitable for AI training:

Example 1

DE: Die zuständige Behörde hat den Antrag geprüft und genehmigt.
AR: قامت الجهة المختصة بمراجعة الطلب والموافقة عليه.

Example 2

DE: Die Auslegung der einschlägigen Rechtsvorschriften erfolgt unter Berücksichtigung der höchstrichterlichen Rechtsprechung sowie der verbindlichen Verwaltungsvorschriften.
AR: تتم تفسير الأحكام القانونية ذات الصلة مع مراعاة الاجتهادات القضائية العليا والتعليمات الإدارية الملزمة.

Example 3

DE: Jede Partei haftet für Schäden, die aus der Verletzung ihrer gesetzlichen Pflichten entstehen.
AR: تتحمل كل جهة المسؤولية عن الأضرار الناشئة عن إخلالها بواجباتها القانونية.

These examples reflect the dataset’s focus on legal precision, administrative clarity, and bilingual consistency.
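
For readers who want to load text in the preview layout shown above, the following is a minimal Python sketch. It assumes each pair appears as a `DE:` line followed by an `AR:` line, as in the preview; the actual file format inside the ZIP archive is not documented here and may differ. The function name `parse_pairs` is hypothetical and is not part of the dataset.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect (German, Arabic) pairs from lines prefixed with 'DE:' and 'AR:'.

    Assumption: a 'DE:' line is followed (possibly after blank or heading
    lines) by its matching 'AR:' line, as in the README preview.
    """
    pairs: List[Tuple[str, str]] = []
    pending_de = None  # German sentence waiting for its Arabic counterpart
    for raw in lines:
        line = raw.strip()
        if line.startswith("DE:"):
            pending_de = line[3:].strip()
        elif line.startswith("AR:") and pending_de is not None:
            pairs.append((pending_de, line[3:].strip()))
            pending_de = None
    return pairs


# Usage with Example 1 from the preview above.
sample = [
    "Example 1",
    "",
    "DE: Die zuständige Behörde hat den Antrag geprüft und genehmigt.",
    "AR: قامت الجهة المختصة بمراجعة الطلب والموافقة عليه.",
]
pairs = parse_pairs(sample)
for de, ar in pairs:
    print(de, "|", ar)
```

Lines that carry neither prefix, such as the "Example N" headings, are ignored, and an `AR:` line without a preceding `DE:` line is skipped rather than paired incorrectly.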


🎯 Use Cases

  • Training Arabic↔German MT systems
  • Fine‑tuning LLMs for legal and governmental tasks
  • Building multilingual legal assistants
  • Automating document classification and analysis
  • Enhancing cross‑lingual search and retrieval
  • Supporting academic and commercial NLP research

💳 Purchase Information

To purchase the dataset, please contact:

📧 Email: albsyrthsyn@gmail.com

You will receive:

  1. The USDT payment address
  2. The exact amount to send ($500 per dataset, or $800 for both)
  3. A request for payment confirmation (screenshot)
  4. A secure download link immediately after verification

This manual process ensures maximum security and prevents unauthorized access to the dataset.


📜 License

The dataset is licensed for research and commercial use.
Redistribution, resale, or public sharing of the dataset or any part of it is strictly prohibited.


📞 Contact

For inquiries, custom datasets, or collaboration opportunities: