Authors: Ibad-ur-Rehman Rashid, Junaid Hussain, and Sadam Al-Azani
Venue: Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script (AbjadNLP 2026), Rabat, Morocco.
Publisher: Association for Computational Linguistics (ACL)
Abstract
This paper introduces resources for the computational study of scientific exegesis (Tafsir Ilmi): a structured ontology, a curated dataset of 194 scientifically relevant Quranic verses linked to 260 exegetical records from two authoritative Tafsir books, and an annotation framework that organizes scientific references by topic and sequential context.
Existing Quranic resources treat verses as unstructured text, losing the logical order and causal relationships of scientific concepts documented in exegesis. To address this, we present QurSci-Onto, a three-layer ontology that categorizes verses by scientific domain, links them to authoritative Tafsir, and provides a framework for representing sequential processes through stage-based annotations.
Methodology & Dataset
Our dataset includes page-level citations and covers 8 major scientific topics across 73 nodes. While the full corpus is tagged with broad categories and scientific topics, a specialized subset features granular node-level mappings to capture complex scientific narratives.
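To make the annotation layers concrete, the following is a minimal Python sketch of how one record might be represented and how stage-based annotations could be ordered. It is an illustration under stated assumptions, not the released schema: the class names, field names (`verse_ref`, `topic`, `node`, `stage`, `book`, `page`), the helper `ordered_stages`, and all placeholder values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TafsirRecord:
    """Hypothetical exegetical record carrying a page-level citation."""
    book: str  # placeholder identifier for one of the Tafsir books
    page: int  # page-level citation


@dataclass
class VerseAnnotation:
    """Hypothetical annotation for one scientifically relevant verse."""
    verse_ref: str                # placeholder verse identifier
    topic: str                    # scientific topic (full corpus)
    node: Optional[str] = None    # granular node (specialized subset only)
    stage: Optional[int] = None   # position within a sequential process
    tafsir: List[TafsirRecord] = field(default_factory=list)


def ordered_stages(annotations: List[VerseAnnotation], topic: str) -> List[VerseAnnotation]:
    """Return the staged annotations for a topic, sorted by stage number.

    Annotations without a stage are skipped, since only part of the
    corpus is assumed to carry sequential annotations.
    """
    staged = [a for a in annotations if a.topic == topic and a.stage is not None]
    return sorted(staged, key=lambda a: a.stage)


# Placeholder data only; none of these values come from the dataset.
annotations = [
    VerseAnnotation("verse-A", "topic-1", node="node-x", stage=2,
                    tafsir=[TafsirRecord("book-1", 10)]),
    VerseAnnotation("verse-B", "topic-1"),
    VerseAnnotation("verse-C", "topic-1", node="node-y", stage=1),
    VerseAnnotation("verse-D", "topic-2", stage=1),
]
sequence = [a.verse_ref for a in ordered_stages(annotations, "topic-1")]
```

Keeping `node` and `stage` optional mirrors the description above: every record carries a topic, while only the specialized subset would fill in the finer-grained fields.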
Key Findings
We release QurSci-Onto as a foundational resource for Arabic semantic NLP. Our evaluations show that, compared with standard unstructured baselines, this structured approach significantly improves semantic retrieval and enables multi-hop sequential reasoning.
