Ivan Lorencin1*, Sandi Baressi Šegota1, Damir Bartolin2, Zvonimir Babić2 and Domagoj Frank2
1Juraj Dobrila University of Pula, Faculty of Informatics
2University North
ilorencin [at] unipu.hr
Abstract
Biomedical data are increasingly heterogeneous, multimodal, and inconsistently structured, limiting their effective use in scalable and reproducible artificial intelligence systems. In practice, such data originate from diverse experimental, clinical, and observational settings, often lacking standardized formats and consistent integration. Existing approaches frequently rely on rigid preprocessing assumptions, constraining their ability to generalize across real-world biomedical data sources.
This work introduces an AI-centric framework for adaptive data processing and interpretation in biomedical contexts. The framework operates on arbitrarily structured inputs, enabling flexible ingestion and integration without requiring predefined schemas. Instead, representations are learned directly from the data, allowing the system to adapt to varying formats, modalities, and acquisition conditions commonly encountered in biomedical applications.
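As a minimal illustration of schema-free ingestion, the sketch below flattens an arbitrarily nested record into path/value pairs without any predefined schema. The function name `flatten` and the sample record are hypothetical and not part of the framework; the sketch covers only a simple, non-learned normalization step that could precede representation learning.

```python
def flatten(value, prefix=""):
    """Yield (path, leaf) pairs for any nesting of dicts and lists.

    Hypothetical helper: no schema is assumed, the structure is
    discovered by walking the input.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            # Extend the path with the key, separated by a dot.
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # List positions become bracketed indices in the path.
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        # Leaf value: emit it with the path accumulated so far.
        yield prefix, value


# Illustrative record only; field names and values are made up.
record = {"patient": {"age": 54}, "labs": [{"crp": 3.1}]}
features = dict(flatten(record))
```

Records with different shapes then map to the same flat key/value form, which a downstream model could consume regardless of the original layout.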
At its core, the approach transforms heterogeneous biomedical inputs into structured representations that capture underlying relationships and enable consistent downstream analysis. These representations are processed through analytical models, supporting robust and scalable inference across diverse datasets while preserving domain-relevant context.
To enable controlled and interpretable reasoning, generative components are employed as structured transformation modules, producing schema-constrained outputs (e.g., JSON-based representations) rather than free-form text. This allows deterministic integration of generative reasoning into the analytical pipeline while maintaining interpretability and consistency.
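One way such schema-constrained output could be enforced is sketched below, assuming the generative component returns a JSON string. The schema `FINDING_SCHEMA`, its field names, and the function `parse_structured_output` are hypothetical illustrations, not the framework's actual interface.

```python
import json

# Hypothetical schema: field name -> required Python type.
FINDING_SCHEMA = {"entity": str, "relation": str, "confidence": float}


def parse_structured_output(raw_text, schema=FINDING_SCHEMA):
    """Parse generator output and reject anything that does not match the schema."""
    try:
        record = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("output must be a JSON object")
    if set(record) != set(schema):
        # Symmetric difference lists both missing and unexpected fields.
        raise ValueError(f"field mismatch: {sorted(set(record) ^ set(schema))}")
    for field, expected_type in schema.items():
        # Strict type check: an integer confidence such as 1 is rejected here.
        if not isinstance(record[field], expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    return record
```

Only records that pass this check would enter the analytical pipeline; free-form text or partially filled objects raise a `ValueError` and can be retried or discarded.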
For human-facing interpretation, a retrieval-augmented generation (RAG) layer is applied at the reporting stage. Retrieved intermediate results, domain-relevant context, and metadata are used to guide generative models, ensuring that outputs remain grounded in the underlying data. This improves factual consistency, reduces hallucinations, and enables traceable reasoning through the inclusion of provenance and confidence indicators.
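The reporting stage could be sketched as follows, assuming intermediate results are stored as records with an identifier, a text summary, and a confidence value. The word-overlap retriever stands in for whatever retrieval method the framework uses, and all names (`retrieve`, `build_grounded_prompt`, the record fields) are hypothetical; no generative model is called here.

```python
def retrieve(query, records, top_k=2):
    """Rank stored records by word overlap with the query.

    A deliberately simple stand-in for a real retriever.
    """
    query_terms = set(query.lower().split())
    scored = []
    for record in records:
        overlap = len(query_terms & set(record["text"].lower().split()))
        if overlap > 0:
            scored.append((overlap, record))
    # Stable sort: ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]


def build_grounded_prompt(query, retrieved):
    """Assemble a prompt that carries provenance ids and confidence values."""
    lines = ["Answer using only the evidence below and cite the source ids.", "Evidence:"]
    for record in retrieved:
        lines.append(f"[{record['id']}] (confidence {record['confidence']:.2f}) {record['text']}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)


# Illustrative records only; contents are made up.
records = [
    {"id": "r1", "text": "marker A elevated in cohort one", "confidence": 0.8},
    {"id": "r2", "text": "imaging shows no lesion", "confidence": 0.6},
]
hits = retrieve("marker A cohort", records)
prompt = build_grounded_prompt("marker A cohort", hits)
```

Because each evidence line keeps its identifier and confidence, a generated report can cite the records it relied on, which is what makes the reasoning traceable.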
The framework supports iterative refinement of both representations and models, enabling continuous adaptation to evolving biomedical datasets. By combining adaptive data processing, structured generative reasoning, and retrieval-grounded reporting, the approach enables scalable, reproducible, and interpretable AI systems for complex biomedical data analysis.
Keywords: Adaptive Learning, Explainability, Multimodal Integration, Retrieval Generation
Acknowledgement: This work was (partially) supported by the EC Digital Europe Programme EDIH Adria 2.0 (101256325); SPIN projects IP.1.1.03.0120, IP.1.1.03.0028 and IP.1.1.03.0039; and NextGenerationEU University grants: uniri-iz-25-6, uniri-iz-25-220, IIP_010144, IIP_010136, and UNIN-TEH-25-1-8.

