Towards Intelligent Virtual Clerks: AI-Driven Automation for Clinical Data Entry in Dialysis Care PDF Free Download

1 / 24
0 views24 pages

Towards Intelligent Virtual Clerks: AI-Driven Automation for Clinical Data Entry in Dialysis Care PDF Free Download

Towards Intelligent Virtual Clerks: AI-Driven Automation for Clinical Data Entry in Dialysis Care PDF free Download. Think more deeply and widely.

Academic Editors: Chihyu Hsu and
Shuo-Tsung Chen
Received: 26 September 2025
Revised: 14 November 2025
Accepted: 15 November 2025
Published: 17 November 2025
Citation: Worragin, P.;
Chernbumroong, S.; Puritat, K.;
Julrode, P.; Intawong, K. Towards
Intelligent Virtual Clerks: AI-Driven
Automation for Clinical Data Entry in
Dialysis Care. Technologies 2025,13,
530. https://doi.org/10.3390/
technologies13110530
Copyright: © 2025 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (
https://creativecommons.org/licens
es/by/4.0/).
Article
Towards Intelligent Virtual Clerks: AI-Driven Automation for
Clinical Data Entry in Dialysis Care
Perasuk Worragin 1, Suepphong Chernbumroong 1, Kitti Puritat 2, Phichete Julrode 2,*
and Kannikar Intawong 3,*
1College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
perasuk_w@cmu.ac.th (P.W.); suepphong.c@cmu.ac.th (S.C.)
2Department of Library and Information Science, Faculty of Humanities, Chiang Mai University,
Chiang Mai 50200, Thailand; kitti.p@cmu.ac.th
3Faculty of Public Health, Chiang Mai University, Chiang Mai 50200, Thailand
*Correspondence: phichete.j@cmu.ac.th (P.J.); kannikar.i@cmu.ac.th (K.I.)
Abstract
Manual data entry in dialysis centers is time-consuming, error-prone, and increases the
administrative burden on healthcare professionals. Traditional optical character recognition
(OCR) systems partially automate this process but lack the ability to handle complex data
anomalies and ensure reliable clinical documentation. This study presents the design and
evaluation of an AI-enhanced OCR system that integrates advanced image processing,
rule-based validation, and large language model-driven anomaly detection to improve
data accuracy, workflow efficiency, and user experience. A total of 65 laboratory reports,
each containing approximately 35 fields, were processed and compared under two configu-
rations: a basic OCR system and the AI-enhanced OCR system. System performance was
evaluated using three key metrics: error detection accuracy across three error categories
(Missing Values, Out-of-Range, and Typo/Free-text), workflow efficiency measured by
average processing time per record and total completion time, and user acceptance mea-
sured using the System Usability Scale (SUS). The AI-enhanced OCR system outperformed
the basic OCR system in all metrics, particularly in detecting and correcting Out-of-Range
errors, such as decimal placement issues, achieving near-perfect precision and recall. It
reduced the average processing time per record by almost 50% (85.2 to 42.1 s) and improved
usability, scoring 81.0 (Excellent) compared to 75.0 (Good). These results demonstrate the
potential of AI-driven OCR to reduce clerical workload, improve healthcare data quality,
and streamline clinical workflows, while maintaining a human-in-the-loop verification
process to ensure patient safety and data integrity.
Keywords: AI-enhanced OCR; clinical data entry automation; generative AI; human-in-
the-loop; electronic health records
1. Introduction
In Thailand, nephrology centers play a critical role in providing life-sustaining
hemodialysis care to an estimated 80,000 patients with end-stage kidney disease across
more than 2500 units nationwide. Despite the essential role of these facilities, the exchange
of data between dialysis centers and government agencies remains limited. National au-
thorities have avoided implementing open application programming interfaces (APIs) for
transmitting health records because of concerns about cybersecurity and data privacy. This
has forced healthcare providers to rely on repetitive, manual data entry, often rekeying the
Technologies 2025,13, 530 https://doi.org/10.3390/technologies13110530
Technologies 2025,13, 530 2 of 24
same information into multiple platforms maintained by different government agencies.
Nurses and administrative staff must frequently log into web-based portals and desktop ap-
plications to submit treatment data, claims, and patient outcomes. The duplication of effort
is not only inefficient but also increases the likelihood of errors, delays, and inconsistencies
in reporting [1,2].
The absence of secure and interoperable mechanisms for data exchange has produced
a series of operational challenges in dialysis centers. Surveys and observational studies
suggest that nurses may spend 20–50 percent of their time on clerical tasks such as retyping
laboratory results and treatment notes into government systems. This workload diverts
attention from patient care and increases the risk of fatigue-related mistakes. Clinical
consequences also arise. For instance, when laboratory results are delayed in being entered,
physicians may lack timely information needed for decision-making, forcing patients to
wait longer for follow-up or medication adjustments. International studies have confirmed
that manual transcription is one of the least accurate and most time-consuming methods
of clinical documentation. Systematic reviews show that automation, including optical
character recognition, can substantially reduce error rates compared to manual entry [
3
].
Likewise, recent research demonstrates that novel optical character recognition systems
can outperform human operators in terms of speed and reliability in real-world clinical
environments [
4
]. These findings highlight the potential for advanced image-processing
techniques to serve as reliable tools in streamlining nephrology information management.
Beyond accuracy and speed, the impact of manual entry on workforce morale has
also drawn increasing attention. Repeated clerical duties contribute to staff burnout and
dissatisfaction, which in turn affect retention in already understaffed healthcare systems [
5
].
In nephrology care, where continuity and specialized expertise are vital, losing trained
nurses and technicians because of workload-related fatigue can negatively influence patient
outcomes. Accordingly, solutions that not only reduce data errors but also alleviate staff
burden can bring systemic benefits, including more sustainable workforce management.
Agent-based artificial intelligence systems have recently emerged as powerful tools capable
of planning, coordinating, and executing complex workflows. These intelligent agents can
simulate the tasks of human clerks by navigating between different applications, validating
extracted data, and interacting with legacy systems. The integration of agent-based AI with
modern image processing therefore offers a promising approach to building “virtual clerks”
that can safely and efficiently carry out administrative tasks in healthcare [6].
The present project introduces a fully implemented prototype of an intelligent virtual
clerk developed as an add-on module to the NephroM system, an enterprise resource
platform (ERP) widely used for dialysis data management. Supported by research funding
from the National Research Council of Thailand (NRCT), this project aims to operationalize
intelligent automation within existing clinical workflows. The system integrates image-
processing pipelines, rule-based validation, and large-language-model (LLM) reasoning
to automate the capture, verification, and secure submission of clinical data to external
government platforms.
The specific objectives of this study are threefold: (1) to improve the accuracy and effi-
ciency of clinical data entry through AI-enhanced automation; (2) to evaluate the system’s
ability to reduce administrative workload and enhance healthcare workers’ job satisfaction
by minimizing repetitive clerical tasks; and (3) to assess its potential to shorten patient
waiting times by accelerating documentation and submission workflows. Accordingly, this
paper focuses on the design, implementation, and evaluation of the OCR and validation
modules as core components of the working prototype. By aligning with national digital-
health strategies while maintaining compliance with cybersecurity policies, the proposed
framework demonstrates how an AI-driven add-on module can enhance interoperability,
Technologies 2025,13, 530 3 of 24
improve service quality, and allow healthcare professionals to dedicate more time to patient
care rather than clerical work. The primary contribution of this work lies in its practical
applicability within real clinical workflows rather than in introducing algorithmic novelty.
2. Related Work
2.1. AI in Healthcare Information Management
Artificial intelligence has become a core enabler of healthcare information management
by improving how data are captured, curated, and used for clinical and administrative
decision-making. At the infrastructure level, AI methods help transform heterogeneous elec-
tronic health record data into machine-actionable formats, support patient representation
learning, and enable predictive analytics that inform quality improvement and popula-
tion management [
7
9
]. Beyond prediction, AI is increasingly deployed to streamline
routine information workflows such as data abstraction, coding, and document classifi-
cation, with the aim of reducing latency and improving completeness and consistency
in health datasets [
10
]. These capabilities are critical in domains like nephrology, where
high-frequency encounters and laboratory monitoring generate substantial documentation
and reporting requirements. Coupled with modern image processing and optical character
recognition, AI systems can accurately extract key fields from semi-structured forms and
scanned documents, supporting safer and faster ingestion into registries and reporting
systems [4].
The operational rationale for automation is grounded in well-documented burdens
associated with EHR work and clerical tasks. Time–motion and workflow studies show
that a large share of clinician effort is consumed by documentation and desk work, while
EHR-related clerical load contributes to burnout and reduced job satisfaction [
5
,
11
]. Re-
cent approaches therefore combine intelligent document understanding with orchestration
layers that can navigate legacy web or desktop interfaces. In practice, this is achieved
through agent-based systems and intelligent automation frameworks, which plan multi-
step tasks, validate extracted content, and interact with multiple applications under policy
constraints [
6
]. As organizations scale such solutions, attention to secure health information
exchange and interoperability standards remains essential so that automation improves
throughput without compromising privacy or cybersecurity [
2
]. Literature on the inte-
gration of AI with robotic process automation also indicates growing maturity in using
these tools to reduce administrative friction while keeping governance and sustainability
considerations in view [12].
2.2. Image Processing and Optical Character Recognition in Clinical Contexts
Image processing and optical character recognition (OCR) are foundational technolo-
gies for converting semi-structured and unstructured clinical documents into machine-
readable data, enabling downstream analytics and workflow automation. Over recent years,
deep learning has substantially improved resilience to noise, variability in layouts, and
the diverse fonts and formats often found in healthcare documentation such as laboratory
reports, dialysis treatment logs, and admission forms. Advances in convolutional neural
networks, recurrent architectures, and more recently transformer-based models have en-
hanced text detection and recognition accuracy in challenging environments. Layout-aware
frameworks that encode both spatial and textual information further improve field-level
data extraction, which is particularly useful in nephrology, where recurring treatment forms
and frequent laboratory reports require consistent and accurate transcription [1316].
Clinical implementations increasingly integrate preprocessing, OCR, and post-
processing validation with domain-specific rules or natural language processing to ensure
accuracy and shorten turnaround times for data entry. Tailored OCR systems have been
Technologies 2025,13, 530 4 of 24
shown to outperform manual transcription in terms of speed and reliability, especially for
vital-signs documentation and prescription forms. Moreover, pipelines that combine OCR
with natural language processing have demonstrated high levels of precision in extracting
usable information from scanned health records, enabling accurate population of registries
and quality measurement databases. Multi-center evaluations in intensive care settings
have further shown that OCR-based data entry can accelerate information flows and re-
duce staff burden, provided that layout variations and device heterogeneity are managed
through preprocessing and human-in-the-loop validation. Collectively, these findings
underscore the potential of healthcare-specific OCR pipelines to improve the efficiency,
safety, and reliability of information management in nephrology and other chronic disease
domains [4,7,17].
2.3. Agent-Based Systems for Workflow Automation
Agent-based systems provide a principled foundation for automating complex, multi-
step workflows by encapsulating autonomy, social ability, reactivity, and proactivity in
software entities that can perceive their environment, plan actions, and collaborate to
achieve organizational goals. In administrative healthcare contexts, agents can coordinate
extraction, validation, and submission tasks across heterogeneous applications while re-
specting local policies, role-based permissions, and exception handling. This aligns with
the evolution of robotic process automation from scripted UI macros toward intelligent
orchestration layers capable of decision-making and resilience to variability in interfaces
and data [
18
,
19
]. Classic agent research established the architectural and coordination
principles that enable such capabilities, including task decomposition, negotiation, and
cooperative problem solving, which remain directly relevant when simulating human
clerks that must navigate legacy web portals and desktop systems [
20
,
21
]. Recent literature
further argues for integrating agent reasoning with analytics and document understand-
ing so that automation not only executes keystrokes but also validates content, detects
anomalies, and triggers human review when confidence is low [12].
Building on these foundations, contemporary “agentic AI” systems extend workflow
automation with tool use, planning, and self-monitoring, enabling agents to call OCR
and NLP services, enforce domain rules, and maintain auditable trails under governance
constraints. In healthcare, this makes it possible to operationalize human-in-the-loop
patterns where agents handle routine steps and escalate ambiguous cases to clinicians or
administrators, thereby reducing turnaround time without sacrificing safety. Evidence
from biomedicine demonstrates that agentic approaches can structure complex, multi-
application tasks and coordinate specialized tools, suggesting strong applicability to clerical
data flows in nephrology [
6
]. At scale, however, secure health-information exchange and
interoperability remain prerequisites; automated agents must comply with cybersecurity
controls and data-sharing policies so that throughput gains do not introduce privacy
or integrity risks [
2
]. Taken together, the literature supports a layered design in which
agent-based orchestration governs document AI pipelines, integrates with existing portals,
and embeds oversight and auditing an approach well-suited to automating repetitive,
rule-bound reporting workflows in dialysis centers.
In the present study, these agent-based principles are operationalized in a fully im-
plemented prototype integrated with the NephroM platform. The proposed virtual-clerk
model adopts a three-layer architecture document ingestion and recognition, data valida-
tion and adaptive reasoning, and task-execution automation all of which were developed
and deployed within the working system. The agent-based automation layer, built on Play-
wright and PyWinAuto, enables the virtual clerk to automatically submit verified records to
external government portals while maintaining compliance and auditability. Accordingly,
Technologies 2025,13, 530 5 of 24
the agent-based framework presented here represents not only a guiding architecture but
also an implemented orchestration system that coordinates OCR, validation, and automa-
tion modules in real-world clinical workflows. This implementation demonstrates the
feasibility of applying agent-based AI to healthcare administration, bridging conceptual
design with practical deployment in nephrology documentation.
2.4. Digital Health Transformation and Cybersecurity Constraints
Digital health transformation is frequently framed as an API-first modernization of
clinical systems in which interoperability standards such as HL7 FHIR and application
frameworks like SMART on FHIR enable secure, modular exchange of health information.
In principle, this architecture allows external applications to retrieve and submit data in a
governable manner while preserving auditability, consent management, and least-privilege
access. In practice, however, many health authorities and public agencies remain reluctant
to expose data-ingest interfaces because operational and legal risks around cybersecurity,
privacy, and data misuse are perceived to outweigh the efficiency gains. The result is a
persistent gap between the promise of interoperable, standards-based exchange and the
reality of policy-constrained ecosystems that continue to rely on manual re-entry into legacy
web portals and desktop applications. Prior work highlights both the technical feasibility of
safe, standards-conformant exchange and the governance challenges that limit routine cross-
organizational sharing, underscoring the need for solutions that respect existing controls
while reducing clerical burden. Representative analyses of secure health-information
exchange and interoperable app ecosystems emphasize that successful adoption hinges on
end-to-end security controls, identity management, and rigorous auditing capabilities that
must be demonstrated to regulators before broader API access is permitted [2,22].
Concurrently, the healthcare threat landscape has intensified, with systematic reviews
documenting escalating risks from ransomware, phishing, credential compromise, and
exploitation of third-party components. These incidents have real operational consequences,
including care delays and data integrity concerns, which further discourage authorities
from opening inbound programmatic channels without robust mitigations [
23
]. To balance
transformation with risk, emerging approaches combine privacy-preserving computation
and verifiable infrastructure such as blockchain-backed audit trails for provenance and
federated learning to keep raw patient data local while enabling shared model improvement.
Although these technologies do not eliminate risk, they provide concrete mechanisms to
strengthen auditability, reduce data movement, and demonstrate compliance, thereby
making controlled automation more acceptable to oversight bodies [
24
,
25
]. Within such
policy and security constraints, agent-based automation that operates through sanctioned
user interfaces paired with document-AI pipelines and human-in-the-loop review offers a
pragmatic path to efficiency. It can preserve existing governance boundaries while reducing
redundant data entry and improving timeliness in high-frequency domains like nephrology
information management.
Despite the rapid progress of AI in healthcare data management, several research
gaps remain unresolved. Existing studies on electronic health record automation and
OCR pipelines largely focus on general clinical documentation or radiology reports, with
limited exploration of high-frequency, high-volume specialties such as nephrology, where
repeated dialysis sessions generate a significant clerical burden. While robotic process
automation and agent-based approaches have been applied in finance and business process
management, their integration with healthcare-specific image processing pipelines is still
underdeveloped. Moreover, most implementations emphasize technical accuracy without
sufficiently addressing the policy and cybersecurity constraints that prevent the use of open
APIs in government health systems. This leaves a critical gap for research that demonstrates
Technologies 2025,13, 530 6 of 24
how intelligent, agent-based automation can operate effectively within restrictive security
environments, reduce redundant manual data entry, and improve timeliness in nephrology
information flows while ensuring compliance with privacy and regulatory requirements.
3. Methodology
3.1. System Architecture
The intelligent virtual clerk was developed as an add-on module to the NephroM
system, an enterprise resource platform (ERP) used for managing dialysis operations and
patient data. Rather than replacing existing infrastructure, the module extends NephroM’s
capabilities by introducing an agent-based automation layer that interfaces with external
government portals such as the National Health Security Office (NHSO) and the Health
Service Information Office. This integration enables automated data exchange while main-
taining compliance with security and interoperability requirements. The intelligent virtual
clerk is organized into a three-layer system architecture that was fully implemented in
the working prototype, as illustrated in Figure 1. The first layer, Document Ingestion and
Recognition, processes inputs from scanned forms, dialysis logs, and laboratory reports
through preprocessing methods such as noise reduction, normalization, and segmentation,
followed by optical character recognition using layout-aware models to generate structured
data. The second layer, Validation and Domain Rules, ensures that extracted values are
clinically plausible and consistent with administrative standards by applying predefined
rules and allowing human-in-the-loop verification for ambiguous cases. The third layer,
Agent-Based Automation, was also developed and deployed within the prototype. It
functions as the operational core of the virtual clerk, where intelligent agents automatically
interact with government platforms, handle exceptions, and ensure compliance through
automated logging and monitoring. Functional testing confirmed that the automation
agents can execute submission and verification tasks reliably across both web-based and
desktop systems.
Figure 1. Three-layer architecture of the intelligent virtual clerk showing implemented OCR, valida-
tion, and automation modules.
In order to provide project-specific details, the technical implementation of this archi-
tecture is further illustrated in Figure 2, which maps the tools and APIs applied to each
system layer. In the input stage, scanned forms are processed using OpenCV and Tesseract
OCR, while structured digital entries are collected directly from user inputs. In Layer 2, Py-
dantic is employed to enforce administrative and clinical validation rules, while ChatGPT
(GPT-4 API, 1 March 2025) supports anomaly detection and explanatory reasoning, com-
plemented by human verification where necessary. In Layer 3, the automation framework
integrates Playwright for web-based workflows and PyWinAuto for PC-based government
Technologies 2025,13, 530 7 of 24
systems. Both were configured and tested to perform automated data entry, submission,
and report generation, ensuring adaptability and compliance across heterogeneous in-
tegration environments. This implementation demonstrates the practical application of
the virtual clerk in a real-world project, emphasizing its modularity and showing how
open-source libraries, LLM reasoning, and agent-based automation can be integrated into
a workflow that directly addresses the lack of interoperable APIs in government health
information systems.
Figure 2. Technical Implementation Architecture of the Virtual Clerk.
3.2. Agent-Based AI Design
The proposed virtual clerk is designed and implemented as an agent-based AI system
that follows a continuous cycle of Perception, Decision, Action, and Monitoring, as illus-
trated in Figure 3. All four stages were developed within the working prototype, enabling
end-to-end automation of data recognition, validation, and submission. In the perception
stage, the agent acquires data from two main sources: scanned clinical forms processed by
OpenCV and Tesseract OCR, and structured data directly entered by users. These inputs
are transformed into structured representations such as JSON, often accompanied by confi-
dence scores that indicate recognition accuracy. The decision stage combines deterministic
validation with adaptive intelligence. Rule-based checks implemented through Pydantic
enforce administrative and clinical constraints, while LLM reasoning (ChatGPT) supports
anomaly detection, normalization of ambiguous inputs, and generation of explanatory
feedback. Specifically, the language-model component was implemented using the OpenAI
GPT-4 API (version released on 1 March 2025) under the gpt-4-turbo configuration, se-
lected for its contextual reasoning capability and robust handling of clinical text validation
tasks. Each request had an average latency of approximately 1.6 s per query, which was
considered acceptable for near-real-time operations in clinical data entry.
The agent’s inferential capability resides within this decision stage, which constitutes
the reasoning layer of the architecture. In this layer, symbolic (rule-based) inference and
statistical (LLM-based) inference operate in concert. The rule engine applies 84 determinis-
tic constraints covering field types, numeric ranges, logical dependencies, and temporal
consistency. When these rules are violated or OCR confidence falls below 0.85, an empiri-
cally selected operating point identified during pilot evaluations where confidence values
below 0.85 frequently correlated with character-level ambiguities and schema violations, a
constrained GPT-4 reasoning routine is invoked to propose context-consistent corrections
while strictly prohibiting data fabrication. Each candidate output is then re-validated
against the deterministic schema before being accepted. This rule
bounded reasoning
human escalation policy operationalizes bounded rationality, allowing the agent to
optimize accuracy and compliance under uncertainty while maintaining reactive efficiency
for deterministic cases.
Technologies 2025,13, 530 8 of 24
Figure 3. Agent cycle design of the virtual clerk.
In terms of technical rule design, the 84 deterministic rules are organized into five
categories: (1) type and format rules (e.g., enforcing fixed-length alphanumeric HN identi-
fiers); (2) numeric range rules (e.g., validating that Creatinine lies within 0.3–20.0 mg/dL);
(3) unit and normalization rules (e.g., converting Urea from mmol/L to mg/dL when nec-
essary); (4) cross-field dependency rules (e.g., ensuring PreWeight > DryWeight and Post-
Weight < PreWeight); and (5) temporal consistency rules (e.g., requiring SpecimenDate
Re-
portDate). Representative examples include: R12—HGB must be within
6.0–20.0 g/dL
;
R23—flag BUN values inconsistent with Creatinine-based physiological ratios; R41—reject
records where laboratory timestamps exceed session timestamps; and R73—escalate potas-
sium levels above 7.0 mmol/L for human review. These rule categories constitute the
system’s explicit knowledge base, providing the deterministic inference layer that inter-
faces with the constrained LLM reasoning routine.
To make explicit how these rules formalize domain knowledge, the virtual clerk
follows the three canonical components of an expert system. First, the 84 deterministic rules
form the knowledge base, encoding clinical, administrative, physiological, and temporal
constraints required in dialysis documentation. Second, the inference engine consists of
(a) a deterministic rule evaluator that performs deductive checks, (b) a bounded LLM
reasoning module invoked only under uncertainty, and (c) a deterministic re-validation
layer that enforces schema compliance before accepting any output. This hybrid mechanism
ensures that deductive logic remains primary while uncertainty is tightly controlled. Third,
the user interface layer is implemented within the NephroM platform, providing OCR
upload interfaces, real-time validation feedback, and human-in-the-loop review pathways.
The action stage enables the agent to simulate the role of a clerk by automatically
filling government forms and submitting records through existing platforms. Web-based
portals are handled with Playwright, while legacy PC applications are managed with Py-
WinAuto, ensuring flexibility across heterogeneous infrastructures. Finally, the monitoring
stage incorporates human-in-the-loop validation, systematic audit logging, and reporting,
which provide transparency, error recovery, and compliance with security requirements.
Collectively, these four stages form a closed-loop cycle that allows the agent to perceive its
environment, reason over data, act autonomously, and adapt based on feedback.
When compared with conventional robotic process automation (RPA), the agent-based
AI design offers significant advantages. RPA workflows are typically brittle, failing when
user interfaces change or when unexpected data is encountered. In contrast, intelligent
agents embody autonomy, adaptability, and reasoning [
20
]. By combining rule-based
validation with LLM reasoning, the virtual clerk does more than execute static scripts:
Technologies 2025,13, 530 9 of 24
it proactively identifies anomalies, explains discrepancies, and collaborates with human
reviewers when required. The monitoring layer reinforces accountability through audit
trails and continuous feedback, moving beyond linear automation pipelines. This im-
plementation demonstrates the four canonical properties of intelligent agents autonomy,
reactivity, proactivity, and social ability within an operational prototype that reduces clerical
burden, improves data accuracy, and enhances trustworthiness compared with traditional
automation approaches.
3.3. Image Processing Pipeline
This section describes the image-processing pipeline that converts printed paper forms
and scanned dialysis logs into machine-readable data ready for validation and automation.
An overview of the pipeline is shown in Figure 4. Preprocessing begins with denoising, de-
skewing, and contrast normalization, followed by binarization to improve text–background
separation. Global thresholding [
26
] and adaptive binarization methods [
27
] are applied
depending on illumination and paper artifacts, with morphological operations used to
repair broken strokes and suppress speckle noise. Text regions are localized and recog-
nized using the Tesseract OCR engine, which employs adaptive classifiers and language
models to support robust character recognition in printed clinical forms [
28
]. For semi-
structured layouts such as tables, labels, and key–value zones, the pipeline aligns OCR
results with layout-aware models to preserve spatial relationships, thereby enabling reliable
field mapping across variable templates [13,15].
Figure 4. Overview of the image processing pipeline for input and output.
Post-OCR processing further structures and quality-controls the extracted text. Rule-
based parsing standardizes identifiers, dates, and units, while confidence scores and
heuristics trigger reprocessing or human review in ambiguous cases. Normalized fields
are serialized into JSON and passed to the validation module, where schema checks and
range constraints are applied. Such OCR-to-validation pipelines have been shown to
reduce turnaround times and improve registry data usability [
29
]. As manual data entry is
known to introduce errors, the pipeline is specifically designed to minimize keystrokes and
surface only exceptions, aligning with evidence that optimized data processing methods
can substantially lower error rates in clinical research [
3
]. The final structured output with
confidence scores, provenance, and audit artifacts feeds the agent’s decision stage and
ultimately the automation layer for secure submission to government systems.
3.4. OCR Configurations and Technical Integration
The intelligent virtual clerk employs a dual-configuration OCR pipeline engineered
for high-fidelity extraction and validation of semi-structured clinical records. The baseline
deterministic configuration utilizes Tesseract v5.3.2 in legacy (non-LSTM) mode, executing
rule-driven segmentation and glyph-pattern correlation for character decoding. Input
frames are pre-conditioned through OpenCV-based Gaussian denoising, Hough-transform
de-skewing, and adaptive binarization, followed by morphological opening/closing to
Technologies 2025,13, 530 10 of 24
reconstruct stroke continuity and eliminate impulse noise. Post-processing modules imple-
ment deterministic normalization routines that apply regular-expression filters to enforce
field syntax (patient identifiers, timestamps, measurement units) and to correct recur-
rent optical ambiguities (e.g., “O
0”, “I
1”). The structured output is mapped to the
NephroM schema, which defines 84 deterministic constraints covering data types, numeric
bounds, and inter-field dependencies, providing a transparent baseline for audit-compliant
data ingestion.
The enhanced configuration activates the Long Short-Term Memory (LSTM) recog-
nition engine of Tesseract v5.3.2, enabling contextual sequence modeling across character
windows. Pre-trained English–Thai models are extended with a domain-specific lexical
set incorporating nephrology terminology such as dialysate, hemodiafiltration, and Kt/V.
Tokens with confidence scores below 0.85 invoke a generative reasoning sub-module im-
plemented via the OpenAI GPT-4 Turbo API. The model operates under a constrained,
instruction-based validation prompt that enforces bounded semantic behavior and prohibits
uncontrolled text generation, for example:
System Role: You are an AI-based clinical data validator operating within a rule-
constrained data entry system.
Your task is to analyze structured OCR outputs from nephrology forms, identify anomalies,
and propose corrections
only when they are derivable from contextual or domain-consistent evidence.
Instructions:
1. Input will be provided as a JSON object containing {field_name, value, confidence,
data_type, rule_reference}.
2. For each record:
- Verify that the value conforms to expected type, unit, and range constraints (as indicated
by rule_reference).
- If confidence < 0.85 or rule violation is detected:
a. Analyze related fields for contextual inference (e.g., Pre_Weight vs. Post_Weight, Urea
vs. Creatinine).
b. If a correction is logically deducible, output the revised value and reasoning note.
c. If ambiguity remains, flag for human review.
3. NEVER fabricate or infer data outside the observed record set.
4. Return all outputs in strict JSON format:
{
“field_name”: “,
“original_value”: “,
“suggested_value”: “,
“confidence”: “,
“status”: “validated|corrected|flagged”,
“reason”:
}
All inference transactions are encapsulated with execution metadata—token counts, la-
tency, and probabilistic confidence metrics—to support deterministic replay. The post-LLM
output undergoes Pydantic schema validation, applying rule-based range checks, relational
Technologies 2025,13, 530 11 of 24
logic (e.g., dry-weight < post-dialysis weight), and temporal-ordering verification before
serialization into JSON with provenance hashes for downstream decision-layer ingestion.
The overall implementation follows a hybrid microservice architecture integrating
a .NET (C#) front-end for visualization and supervisory control with Python-based back-
end services executing OpenCV, Tesseract, and GPT-4 operations through RESTful APIs.
Process automation leverages Playwright for web-form orchestration and PyWinAuto for
legacy desktop interfacing. This composite architecture constitutes a hybrid deterministic–
AI pipeline, where rule-based modules guarantee compliance and reproducibility, and
learning-based components contribute adaptive reasoning and contextual correction. The
resulting system achieves an explainable, auditable, and regulation-conformant automation
framework for nephrology information management. The GPT-4 Turbo model was not
retrained or fine-tuned; rather, it was configured through a constrained instruction schema
and an internal validation wrapper that enforces structured input/output formats, response
length limits, and deterministic key–value alignment.
A key technical characteristic of the proposed architecture lies in its pipeline-level
determinism rather than reliance on the raw OCR engine alone. While the Tesseract legacy
model is algorithmically deterministic, its output can vary when input images differ in
illumination, rotation, or contrast. To address this, the system applies a fixed and re-
peatable normalization sequence grayscale conversion, resolution standardization, global
thresholding, binarization, geometric de-skewing, and noise suppression before every OCR
operation. These steps ensure that input frames are rendered into a stable canonical form,
enabling reproducible OCR behavior across heterogeneous capture conditions. Further-
more, downstream components including the 84-rule deterministic validator, bounded
LLM inference, and deterministic post-validation act as stabilizing layers that correct resid-
ual variabilities and constrain uncertainty. This pipeline-level determinism, combined
with the rule
reasoning
re-validation loop, represents the most technically distinctive
aspect of the architecture and differentiates it from conventional OCR workflows that lack
inferential stabilization or safety-layered correction.
3.5. Evaluation Metrics
3.5.1. Error Detection Rate of Automation Accuracy
We evaluated the system’s capability to detect and correct erroneous data fields
by comparing two configurations: the basic OCR system, which relies solely on deter-
ministic pattern matching of OCR outputs, and the AI-enhanced OCR system, which
integrates advanced AI models for anomaly detection and normalization alongside ba-
sic rule-based checks, and also suggests the correct value whenever possible. The pri-
mary endpoint was the Error Detection Rate (EDR), equivalent to recall on the error class,
calculated as
Recall =TP
TP+FN
, where
TP
represents the number of correctly detected
erroneous fields and
FN
represents the number of undetected errors. To account for over-
flagging,
Precision =TP
TP+FP
and
F
1
score =
2
PrecisionRecall
Precision+Recall
, the harmonic mean of
precision and recall [
30
], were also reported. The evaluation was performed across three
common error categories: Missing Values, Out-of-Range, and Typo/Free-text, which reflect
typical clinical data entry problems [
31
]. Performance metrics were visualized through
grouped bar charts comparing precision, recall, and F1-score for both systems, as well as
receiver operating characteristic (ROC) curves [
32
] to illustrate overall detection perfor-
mance, with the area under the curve (AUC) calculated for each system. The evaluation
aimed not only to compare the raw recognition performance between the basic OCR and
the AI-enhanced OCR systems but also to validate how improved text accuracy supports
the agent-based automation layer in achieving reliable data submission. This connection
Technologies 2025,13, 530 12 of 24
between recognition accuracy and automation reliability forms a key validation step for
the virtual clerk architecture.
3.5.2. Time Efficiency
To evaluate operational performance, we measured the processing speed and time
efficiency of both systems using two key metrics. The first was Average Time per Record,
defined as the mean time required to complete the processing of a single record from initial
capture to final confirmation. The second was Total Completion Time, which measured the
total elapsed time required to process all records in a batch. Identical tasks were executed
under controlled conditions using both the basic OCR system and the AI-enhanced OCR
system, and the resulting mean values were recorded and compared. Time reduction was
expressed as both absolute time saved and percentage improvement. Each participant
processed dialysis reports under both configurations (basic OCR and AI-enhanced OCR)
in a counterbalanced order, and the system automatically recorded timestamps for start
and completion events to ensure objective measurement. LLM latency was computed
from server-side timestamps (request dispatch to response receive), and per-record cost
was derived from API token-usage logs averaged across the test set. This evaluation
was conducted by clinical staff in a real-world setting to ensure that the recorded times
accurately reflected the system’s practical performance.
3.5.3. System Usability Evaluation
User acceptance and usability were assessed using the System Usability Scale
(SUS) [
33
], a standardized instrument consisting of ten items with alternating positive
and negative statements rated on a five-point Likert scale. Each participant used both the
basic OCR system and the AI-enhanced OCR system and completed the SUS questionnaire
for each configuration after task completion. For each configuration, the mean SUS score
and standard deviation were calculated on a 0–100 scale, where higher scores indicate
better usability. These scores were interpreted against conventional benchmarks [
34
], with
scores below 50 categorized as Not Acceptable, scores between 50 and 70 as Marginal, and
scores above 70 as Acceptable, including the sub-ranges of Good (70–80) and Excellent
(>80). The results were visualized using an acceptability scale diagram to highlight the
usability difference between the two systems.
3.6. Experimental Design and Workflow
Figure 5presents the overall experimental design and workflow of the proposed
AI-enhanced OCR system. The process begins with data collection and preprocessing of
clinical laboratory documents obtained from dialysis centers. The baseline OCR system
is first applied to extract textual content, followed by the AI-enhanced OCR process that
integrates rule-based validation and a LLM reasoning component to detect anomalies and
correct recognition errors. The processed data are then evaluated using precision, recall, and
F1-score metrics, while user feedback is collected to assess workflow efficiency and usability.
A human-in-the-loop validation step is included at the final stage to ensure the integrity
of the corrected data before reporting. This experimental design provides a structured
framework for comparing both the baseline OCR and AI-enhanced OCR configurations
and for validating their effectiveness in real-world clinical documentation tasks.
Technologies 2025,13, 530 13 of 24
Figure 5. Experimental design and evaluation workflow of the AI-enhanced OCR system.
3.7. Data Sources
The data for testing the virtual clerk were obtained from real-world clinical environ-
ments, specifically a private dialysis center, where routine patient care and administrative
documentation are performed. In this study, most inputs consisted of printed laboratory
reports containing key biochemical parameters transferred from other clinical laboratories,
with the main challenge at the dialysis clinic being the need to manually enter these data
into the system, as shown in Figure 6. The evaluation of the virtual clerk system was
conducted using a dataset consisting of 65 documents of a single type, specifically printed
laboratory reports containing computer-generated printed text only, with no handwritten
entries or signatures. Each document included approximately 35 data fields representing
key clinical and administrative information. The data entry tasks were performed by ten
specialized dialysis nurses (five using the basic OCR system and five using the AI-enhanced
OCR system), all of whom routinely handle patient care documentation in a real-world
clinical setting.
To realistically simulate operational conditions, scanned images were supplemented
by webcam captures commonly used in clinical settings. This approach introduced natu-
ral variability in resolution, lighting, and perspective, reflecting how forms are actually
digitized in practice. As a result, the input data exhibited heterogeneous quality: some
documents were clear and properly aligned, whereas others showed skew, shadowing, or
angled views due to handheld captures. Such diversity in data quality was essential for
testing the robustness of the image-processing pipeline, which must reliably normalize
noisy or distorted inputs before feeding them into the validation and automation layers.
Technologies 2025,13, 530 14 of 24
Figure 6. Example of a printed laboratory report used as a data source.
4. Results
4.1. Results of Error Detection Rate of Automation Accuracy
The evaluation was performed on 65 documents, each containing approximately
35 data fields, totaling 2275 fields for error detection analysis. The comparison between the
basic OCR system and the AI-enhanced OCR system demonstrated clear improvements in
error detection performance across all three error categories, as presented in Table 1and
Figure 7. For Missing Values, the AI-enhanced OCR achieved a precision of 0.990, recall of
0.950, and F1-score of 0.969, all higher than those of the basic OCR system (precision 0.968,
recall 0.900, F1-score 0.933). In the Out-of-Range category, the AI-enhanced OCR showed
the greatest improvement with near-perfect recall (0.999) and precision (0.995), yielding an
F1-score of 0.997, compared to the basic OCR system’s precision of 0.951, recall of 0.967,
and F1-score of 0.959. Similarly, for Typo/Free-text errors, the AI-enhanced OCR reached
a precision of 0.990, recall of 0.977, and F1-score of 0.983, outperforming the basic OCR’s
precision of 0.922, recall of 0.950, and F1-score of 0.936. As shown in the grouped bar chart,
the AI-enhanced OCR consistently achieved higher precision, recall, and F1-scores across
all error categories, with the most notable improvement observed in Out-of-Range errors.
These results indicate that the integration of AI with traditional OCR significantly enhances
automation accuracy and reduces manual verification needs, particularly when handling
complex or ambiguous data fields.
Table 1. Precision, recall, and F1-score of the basic OCR and AI-enhanced OCR systems across three
error categories.
Error Category
Detect Method
Precision Recall (EDR) F1-Score
Missing Values OCR only 0.968 0.900 0.933
OCR + AI 0.990 0.950 0.969
Out-of-Range OCR only 0.951 0.967 0.959
OCR + AI 0.995 0.999 0.997
Typo/Free-text OCR only 0.922 0.950 0.936
OCR + AI 0.990 0.977 0.983
Figure 8shows the ROC curves comparing the precision, recall, and F1-score of the
basic OCR system and the AI-enhanced OCR system across three error categories: Missing
Technologies 2025,13, 530 15 of 24
Values, Out-of-Range, and Typo/Free-text. The results show that the AI-enhanced OCR
consistently outperformed the basic OCR system in all three metrics, with the most notable
improvement observed in the Out-of-Range category, where its performance approached
near-perfect accuracy. This demonstrates the effectiveness of integrating AI capabilities
with traditional OCR in improving error detection and correction, leading to more reliable
and accurate data processing.
Figure 7. Grouped bar chart comparing precision, recall, and F1-score for the two systems.
Figure 8. ROC curve comparing error detection performance of basic OCR and AI-enhanced OCR systems.
4.2. Results of Efficiency of Time
The evaluation of time efficiency was conducted using a total of 65 documents, each
containing approximately 35 data fields, processed under both configurations: the basic
OCR system and the AI-enhanced OCR system. As shown in Table 2, the average time per
record for the basic OCR system was 85.2 s, whereas the AI-enhanced OCR system reduced
this to 42.1 s, resulting in a time saving of 43.1 s per record. In terms of total completion
time for all 65 documents, the basic OCR system required 92.3 min, while the AI-enhanced
OCR system completed the task in only 45.6 min, representing a reduction of 46.7 min.
These results demonstrate that the AI-enhanced OCR system nearly doubled the processing
speed, significantly reducing manual effort and improving overall workflow efficiency in
the clinical data entry process.
Technologies 2025,13, 530 16 of 24
Table 2. Comparison of processing times between the basic OCR and AI-enhanced OCR systems.
Metric OCR Only OCR + AI
Average Time Difference
Average Time per Record (sec)
85.2 42.1 43.1
Total Completion Time (min) 92.3 45.6 46.7
4.3. Results of System Usability and Adoption
The usability of the two systems was evaluated using the SUS, which consists of ten
standardized questions rated on a five-point Likert scale. As shown in Table 3and Figure 9,
the AI-enhanced OCR system achieved a higher overall SUS score of 81.0, placing it in
the “Excellent” category, whereas the basic OCR system received a score of 75.0, which
falls within the “Good” range. Across individual questions, the AI-enhanced OCR system
consistently scored slightly higher than the basic OCR system, particularly in areas related
to ease of use (Q3, Q9) and user confidence (Q1). However, both systems showed lower
scores for Q6 and Q7, indicating that users perceived some inconsistency in the system
and recognized the need for improvement in training and learning speed. These results
suggest that while both systems are generally acceptable for clinical use, the integration
of AI significantly enhances user satisfaction and system adoption by reducing perceived
complexity and improving workflow integration.
Table 3. Mean SUS scores for individual questionnaire items comparing the basic OCR and AI-
enhanced OCR systems.
No. Type Questions N OCR Only
Mean (SD)
OCR + AI
Mean (SD)
1 Positive
I think I would like to use the OCR
system/AI-enhanced OCR system regularly for
completing healthcare data entry tasks.
5 4.68 (0.47) 4.74 (0.42)
2 Negative I found the OCR system/AI-enhanced OCR
system unnecessarily complex. 5 1.92 (0.58) 1.64 (0.51)
3 Positive I thought the OCR system/AI-enhanced OCR
system was easy to use. 5 4.28 (0.52) 4.52 (0.46)
4 Negative
I think I would need support from a technical
expert to effectively use the OCR
system/AI-enhanced OCR system.
5 2.08 (0.63) 1.78 (0.55)
5 Positive
I found the functions of the OCR
system/AI-enhanced OCR system to be well
integrated into the existing workflow.
5 4.40 (0.49) 4.46 (0.48)
6 Negative
I thought there was too much inconsistency in the
OCR system/AI-enhanced OCR system. 5 3.18 (0.81) 2.86 (0.77)
7 Positive
I imagine most healthcare staff would learn to use
the OCR system/AI-enhanced OCR system
very quickly.
5 3.58 (0.69) 3.92 (0.66)
8 Negative I found the OCR system/AI-enhanced OCR
system cumbersome to use. 5 1.84 (0.54) 1.66 (0.53)
9 Positive I felt confident using the OCR
system/AI-enhanced OCR system. 5 4.32 (0.51) 4.62 (0.47)
10 Negative I needed to learn many things before I could start
using the OCR system/AI-enhanced OCR system.
5 2.24 (0.60) 1.92 (0.56)
Technologies 2025,13, 530 17 of 24
Figure 9. SUS acceptability scale indicating overall usability levels of the two systems.
5. Discussion
5.1. Summary of Key Findings
The evaluation revealed that the AI-enhanced OCR system significantly improved
its ability to detect and correct errors compared to the basic OCR system, as evidenced
by higher precision, recall, and F1-scores across all three error categories, as shown in
Figures 10 and 11. In the Out-of-Range category, the AI-enhanced OCR effectively ad-
dressed common decimal placement errors, such as when a laboratory value like “20.2” was
misread as “2.02.” This improvement not only enhanced data accuracy but also reduced
the risk of misinterpretation in clinical decision-making. For Typo/Free-text errors, the
AI-enhanced OCR accurately matched hospital numbers (HN) with patient names, even
when names were misspelled or inconsistently recorded, a task at which the basic OCR
system often failed. In the case of Missing Values, the system actively flagged data fields
that were expected to contain information but were left empty, providing notifications to
users to manually verify and input the correct data. These capabilities illustrate how AI
integration enhances both detection accuracy and error resolution, enabling the system
to handle complex, real-world clinical data challenges that basic OCR approaches alone
cannot effectively manage.
Figure 10. Example of the Virtual Clerk detecting and highlighting Typo/Free-text errors and
Missing Values.
Technologies 2025,13, 530 18 of 24
Figure 11. Example of the Virtual Clerk detecting and highlighting Out-of-Range.
Beyond error detection, the AI-enhanced OCR system demonstrated substantial im-
provements in workflow efficiency and user satisfaction. The average time per record was
reduced by nearly half, and total completion time decreased significantly, showing the
potential to accelerate routine clinical documentation tasks. Usability testing using the SUS
indicated that users rated the AI-enhanced system as “Excellent”, compared to the “Good”
rating for the basic OCR system [
34
]. Open-ended feedback highlighted that users valued
the system’s ability to reduce manual verification and improve accuracy, though some areas,
such as consistency and ease of learning, still require enhancement. Together, these findings
demonstrate that integrating AI into OCR systems not only increases technical performance
but also supports more efficient, user-friendly workflows, aligning with previous research
on AI-driven health informatics solutions [31,35].
5.2. Comparison with Previous Studies
The findings of this study are consistent with previous research that highlights the
limitations of traditional OCR systems and the benefits of integrating AI for improving
accuracy in clinical documentation. Prior studies have shown that conventional rule-based
OCR systems are prone to common errors such as decimal misplacements and misinterpre-
tation of numeric values, which can lead to clinically significant inaccuracies
[36,37]
. Our
results demonstrate that by incorporating AI-driven anomaly detection, the system was
able to correct errors automatically, particularly in the Out-of-Range category, reducing
risks to patient safety and improving data quality. Similar improvements were reported
by [
38
], who emphasized the role of machine learning models in identifying and correcting
inconsistent or erroneous health data in electronic health records (EHRs). These outcomes
also align with the broader literature on data quality management in healthcare, which
identifies completeness, accuracy, and consistency as key dimensions for reliable EHR
data [31].
In terms of usability, our findings align with previous studies that have validated the
System Usability Scale (SUS) as a reliable measure of user acceptance in clinical systems.
The AI-enhanced OCR system substantially reduced the manual workload required for data
entry, allowing healthcare staff to spend more time focusing on direct patient care rather
than administrative tasks. Ref. [
34
] established interpretation thresholds for SUS scores,
categorizing them into levels such as Good and Excellent, which guided the interpretation
of our results. Although the AI-enhanced OCR system in our study achieved an Excellent
Technologies 2025,13, 530 19 of 24
usability rating, the handling of healthcare data requires a higher level of reliability due
to its direct impact on patient safety and clinical decision-making. Therefore, even highly
accurate and user-friendly systems must maintain a human-in-the-loop (HITL) process at
the final stage to verify critical information before it is entered into electronic health records.
This approach has been recommended by several researchers as a safeguard to mitigate
residual risks and ensure accountability, particularly when AI systems are deployed in
high-stakes healthcare environments [
39
,
40
]. Similar to prior studies, our results indicate
that while AI automation can significantly reduce manual workload, human oversight
remains essential for final verification to maintain data integrity and protect patient safety.
5.3. Practical Implications for Clinical Workflow
The implementation of the AI-enhanced OCR system has substantial implications for
optimizing clinical workflows, particularly in high-volume settings such as dialysis centers.
By automating routine data entry tasks and intelligently detecting common errors, this sys-
tem significantly reduces the administrative workload placed on nurses and administrative
staff, enabling them to devote more time to direct patient care and clinical decision-making
rather than repetitive clerical work. This mirrors recent findings in generative AI research
showing that large language models (LLMs) embedded within EHRs can improve docu-
mentation quality and reduce editing burden, leading to more efficient note-taking and
summarization workflows [
41
,
42
]. Moreover, by correcting decimal placement errors and
ensuring accurate linkage of hospital numbers with patient names even when names are
misspelled, the system enhances the completeness and accuracy of patient records. Such im-
provements directly address long-standing issues with electronic health record (EHR) data
quality, which is critical for safe and reliable clinical decision-making [
31
]. Early studies on
GenAI-driven clinical documentation also emphasize its potential to make records more
comprehensive and organized, though privacy, bias, and accuracy remain active concerns
that must be continuously managed [
32
]. From a computational perspective, the integra-
tion of the LLM (GPT-4 API, version released on 1 March 2025) introduced an average
response latency of approximately 1.6 s per query, primarily during the anomaly-detection
and normalization steps. This delay was considered acceptable for near-real-time clinical
workflows, as most data-entry tasks occur asynchronously with patient encounters. The
average GPU-equivalent cost of each API call was estimated at $0.002 per record, resulting
in minimal operational expense for batch processing. System-level optimization through
prompt-truncation, caching of frequent templates, and asynchronous request handling
further mitigated latency and ensured that end-to-end throughput remained compatible
with daily dialysis-unit workloads.
Beyond immediate workflow efficiency, the system provides a model for integrating
generative AI into healthcare in a manner that balances automation with safety. Recent
global guidance from the World Health Organization (WHO) highlights that large multi-
modal models must include governance mechanisms and a human-in-the-loop process to
ensure transparency, accountability, and patient safety, particularly in high-stakes environ-
ments [
43
]. Even though the AI-enhanced OCR system in this study demonstrated excellent
usability and substantial reductions in manual workload, final verification by human ex-
perts remains essential to mitigate residual risks and safeguard against potential errors
that may arise from automated processing [
39
,
40
,
44
]. This approach is consistent with
current evidence that AI systems should be viewed as augmenting rather than replacing
human expertise, supporting clinicians by streamlining documentation while maintaining
professional oversight. In the long term, widespread adoption of such systems could
transform healthcare operations by reducing administrative costs, improving regulatory
Technologies 2025,13, 530 20 of 24
compliance, and ultimately allowing clinicians to spend more time on patient-centered
care, while adhering to global standards for ethical AI deployment [45,46].
Beyond usability and governance considerations, the quantitative findings further
highlight the operational impact of the system. The improvements in precision, recall,
and processing time observed in Tables 1and 2directly support the functionality of the
agent-based automation layer (Layer 3). Higher OCR accuracy ensures that the virtual clerk
can perform automated submission with minimal human correction, reducing propagation
of data errors into national reporting platforms. The latency measurements confirm that
the LLM-based reasoning process introduces negligible delay, maintaining near real-time
responsiveness required in clinical documentation workflows. Together, these outcomes
demonstrate that enhanced recognition reliability and operational efficiency are critical
enablers of safe and effective automation, validating the design of the virtual clerk as a
practical agent system rather than a standalone OCR tool.
Comparatively, previous studies on clinical documentation automation have explored
a range of approaches, including conventional rule-based systems, convolutional neural
networks (CNNs) for image recognition, and transformer-based models for text normal-
ization [
15
,
41
,
42
]. Rule-based frameworks, while transparent and interpretable, often lack
scalability across different form formats and require frequent manual updates when insti-
tutional templates change. Deep-learning methods achieve higher recognition accuracy
but typically demand large annotated datasets and extensive computational resources,
which may not be feasible in smaller healthcare facilities. The agent-based OCR framework
proposed in this study seeks to balance these trade-offs by combining deterministic rule
validation with adaptive LLM reasoning, enabling flexible data interpretation without
retraining for each new layout. Nonetheless, this approach also inherits limitations related
to LLM latency, dependency on API availability, and the absence of domain-specific fine-
tuning [
41
,
42
]. Understanding these comparative strengths and weaknesses helps clarify
the methodological landscape and provides a foundation for selecting suitable approaches
in future research and system deployment.
5.4. Limitations and Future Work
While the results of this study demonstrate the potential of the AI-enhanced OCR
system, several limitations must be acknowledged. The dataset was limited to 65 docu-
ments from a single dialysis center and consisted solely of computer-printed text, excluding
handwritten notes and mixed-content scanned forms. This narrow dataset restricts the
generalizability of the findings, as real-world clinical workflows often involve diverse
document formats and varying image quality. Moreover, the study primarily focused on
error detection and workflow efficiency without assessing the downstream clinical impact
of these improvements, such as whether enhanced data accuracy contributes to better
patient outcomes or operational decision-making. Although the AI-enhanced OCR system
substantially reduced manual workload, a HITL verification step remained necessary to
ensure data integrity and patient safety. The usability evaluation also involved a relatively
small group of participants, limiting the representativeness of user experience findings
across different clinical roles and institutional contexts. In addition, the experiment com-
pared only two configurations, basic OCR and AI-enhanced OCR, without including other
control conditions such as commercial OCR systems, human-only data entry, or hybrid
approaches, which could provide a more comprehensive evaluation. Another limitation
concerns algorithmic optimization; the current system employs general-purpose pretrained
models for image processing and language reasoning without domain-specific fine-tuning,
which may constrain its ability to capture the linguistic and contextual nuances of dialysis
data. Furthermore, the study did not conduct formal sensitivity analyses of key parameters
Technologies 2025,13, 530 21 of 24
such as OCR confidence thresholds or preprocessing settings, and uncertainty handling
was evaluated only qualitatively. Common failure cases (e.g., ambiguous numeric charac-
ters or low-contrast regions) were observed during pilot testing but were not quantified
systematically, representing another limitation of the present evaluation.
Future work should address these limitations through several strategic directions. First,
expanding the dataset to include a larger and more diverse sample from dialysis centers
of different sizes, ownership types, and geographic regions will improve external validity
and ensure robust performance across varied operational contexts. Second, comparative
studies involving multiple control groups such as commercial OCR software or human-only
workflows will help benchmark the relative advantages of AI-assisted approaches. Third,
future versions of the system should focus on customizing and fine-tuning models using
dialysis-specific terminology, parameter ranges, and common error patterns to improve
anomaly detection and contextual correction. Fourth, enhancements to the user interface
should emphasize quick-operation features such as one-click filling, batch confirmation,
and smart auto-completion to further streamline workflow efficiency. Fifth, the design of
charts and illustrations can be optimized for greater intuitiveness and clarity, supporting
real-time decision-making by clinical staff. Finally, large-scale deployment studies should
examine governance mechanisms, explainability features, and privacy safeguards to ensure
the ethical and safe integration of AI-enhanced OCR systems into national healthcare
infrastructures. By addressing these directions, future iterations of the system could evolve
into highly reliable and adaptive tools that improve clinical efficiency, data quality, and
patient-centered care.
6. Conclusions
This study introduced and evaluated an AI-enhanced OCR system designed to im-
prove the accuracy and efficiency of clinical data entry in dialysis care settings. By integrat-
ing advanced anomaly detection and normalization capabilities, the system successfully
addressed common and critical errors, such as decimal placement issues and mismatches
between hospital numbers and patient names, while also flagging missing values for man-
ual review. The evaluation demonstrated that the AI-enhanced OCR system significantly
reduced error rates and processing times compared to a basic OCR system, while achieving
excellent usability ratings among clinical users. These improvements indicate that the sys-
tem can serve as a valuable tool for enhancing data quality and streamlining administrative
workflows, ultimately allowing healthcare professionals to dedicate more time to direct
patient care.
Although the system delivered substantial benefits, human oversight remained a cru-
cial component to ensure patient safety and data integrity. The findings support the concept
of a human-in-the-loop approach, where automation assists with high-volume, repetitive
tasks while final verification remains under professional supervision. Looking ahead, the
expansion of this system to include diverse document types, integration with electronic
health record platforms, and the incorporation of more advanced AI technologies could
further transform clinical documentation practices. By continuing to refine both technical
capabilities and governance frameworks, AI-driven OCR systems have the potential to
become trusted, scalable solutions that not only reduce administrative burden but also
contribute to safer, more efficient, and patient-centered healthcare delivery.
Author Contributions: Conceptualization, S.C. and K.I.; methodology, P.J. and K.I.; software, P.W.;
validation, P.W. and K.I.; formal analysis, K.P. and K.I.; investigation, P.W.; resources, K.I.; data
curation, K.I.; writing—original draft preparation, P.W. and K.I.; writing—review and editing, K.P.;
visualization, P.W.; supervision, K.I.; project administration, K.P.; funding acquisition, K.I. All authors
have read and agreed to the published version of the manuscript.
Technologies 2025,13, 530 22 of 24
Funding: This research was partially supported by Chiang Mai University and National council of
Thailand (NRCT).
Institutional Review Board Statement: The study was conducted in accordance with the Declaration
of Helsinki and approved by the Institutional Review Board of Committee of Research Ethics, Faculty
of Public Health, Chiang Mai University (ET031/2024).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data presented in this study are available upon request from the
corresponding author due to restrictions. The data are not publicly available.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
AI Artificial Intelligence
APIs Application Programming Interfaces
EHR Electronic Health Records
OCR Optical Character Recognition
RPA Robotic Process Automation
SUS System Usability Scale
References
1.
Satirapoj, B.; Tantiyavarong, P.; Thimachai, P.; Chuasuwan, A.; Lumpaopong, A.; Kanjanabuch, T.; Ophascharoensuk, V. Thailand
Renal Replacement Therapy Registry 2023: Epidemiological Insights into Dialysis Trends and Challenges. Ther. Apher. Dial. 2025,
29, 721–729. [CrossRef]
2.
Spanakis, E.G.; Sfakianakis, S.; Bonomi, S.; Ciccotelli, C.; Magalini, S.; Sakkalis, V. Emerging and Established Trends to Support
Secure Health Information Exchange. Front. Digit. Health 2021,3, 636082. [CrossRef]
3.
Garza, M.Y.; Williams, T.; Ounpraseuth, S.; Hu, Z.; Lee, J.; Snowden, J.; Walden, A.C.; Simon, A.E.; Devlin, L.A.; Young, L.W.; et al.
Error Rates of Data Processing Methods in Clinical Research: A Systematic Review and Meta-Analysis of Manuscripts Identified
through PubMed. Int. J. Med. Inform. 2025,195, 105749. [CrossRef]
4.
Zhou, X.; Zeng, T.; Zhang, Y.; Liao, Y.; Smith, J.; Zhang, L.; Wang, C.; Li, Q.; Wu, D.; Chong, Y.; et al. Automated Data Collection
Tool for Real-World Cohort Studies of Chronic Hepatitis B: Leveraging OCR and NLP Technologies for Improved Efficiency. New
Microbes New Infect. 2024,62, 101469. [CrossRef]
5.
Budd, J. Burnout Related to Electronic Health Record Use in Primary Care. J. Prim. Care Community Health 2023,14,
21501319231166921. [CrossRef]
6.
Gao, S.; Fang, A.; Huang, Y.; Giunchiglia, V.; Noori, A.; Schwarz, J.R.; Ektefaie, Y.; Kondic, J.; Zitnik, M. Empowering Biomedical
Discovery with AI Agents. Cell 2024,187, 6125–6151. [CrossRef]
7. Rajkomar, A.; Dean, J.; Kohane, I. Machine Learning in Medicine. N. Engl. J. Med. 2019,380, 1347–1358. [CrossRef] [PubMed]
8. Beam, A.L.; Kohane, I.S. Big Data and Machine Learning in Health Care. JAMA 2018,319, 1317–1318. [CrossRef] [PubMed]
9.
Shickel, B.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic
Health Record (EHR) Analysis. IEEE J. Biomed. Health Inform. 2018,22, 1589–1604. [CrossRef] [PubMed]
10.
Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019,25, 44–56.
[CrossRef]
11.
Sinsky, C.; Colligan, L.; Li, L.; Prgomet, M.; Reynolds, S.; Goeders, L.; Westbrook, J.; Tutty, M.; Blike, G. Allocation of Physician
Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties. Ann. Intern. Med. 2016,165, 753–760. [CrossRef]
12.
Patrício, L.; Varela, L.; Silveira, Z. Integration of Artificial Intelligence and Robotic Process Automation: Literature Review and
Proposal for a Sustainable Model. Appl. Sci. 2024,14, 9648. [CrossRef]
13.
Wang, X.-F.; He, Z.-H.; Wang, K.; Wang, Y.-F.; Zou, L.; Wu, Z.-Z. A Survey of Text Detection and Recognition Algorithms Based on
Deep Learning Technology. Neurocomputing 2023,556, 126702. [CrossRef]
14.
Liu, Z.; Song, R.; Li, K.; Li, Y. From Detection to Understanding: A Systematic Survey of Deep Learning for Scene Text Processing.
Appl. Sci. 2025,15, 9247. [CrossRef]
Technologies 2025,13, 530 23 of 24
15.
Xu, Y.; Li, M.; Cui, L.; Huang, S.; Wei, F.; Zhou, M. LayoutLM: Pre-Training of Text and Layout for Document Image Understanding.
In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July
2020; ACM: New York, NY, USA, 2020.
16.
Chen, X.; Jin, L.; Zhu, Y.; Luo, C.; Wang, T. Text Recognition in the Wild: A Survey. ACM Comput. Surv. 2022,54, 1–35. [CrossRef]
17.
Nitayavardhana, P.; Liu, K.; Fukaguchi, K.; Fujisawa, M.; Koike, I.; Tominaga, A.; Iwamoto, Y.; Goto, T.; Suen, J.Y.; Fraser, J.F.; et al.
Streamlining Data Recording through Optical Character Recognition: A Prospective Multi-Center Study in Intensive Care Units.
Crit. Care 2025,29, 117. [CrossRef]
18. van der Aalst, W.M.P.; Bichler, M.; Heinzl, A. Robotic Process Automation. Bus. Inf. Syst. Eng. 2018,60, 269–272. [CrossRef]
19.
Syed, R.; Suriadi, S.; Adams, M.; Bandara, W.; Leemans, S.J.J.; Ouyang, C.; ter Hofstede, A.H.M.; van de Weerd, I.; Wynn, M.T.;
Reijers, H.A. Robotic Process Automation: Contemporary Themes and Challenges. Comput. Ind. 2020,115, 103162. [CrossRef]
20.
Jennings, N.R.; Sycara, K.; Wooldridge, M. A Roadmap of Agent Research and Development. Auton. Agent. Multi. Agent. Syst.
1998,1, 7–38. [CrossRef]
21. Maes, P. Agents That Reduce Work and Information Overload. Commun. ACM 1994,37, 30–40. [CrossRef]
22.
Mandel, J.C.; Kreda, D.A.; Mandl, K.D.; Kohane, I.S.; Ramoni, R.B. SMART on FHIR: A Standards-Based, Interoperable Apps
Platform for Electronic Health Records. J. Am. Med. Inform. Assoc. 2016,23, 899–908. [CrossRef]
23.
Kruse, C.S.; Frederick, B.; Jacobson, T.; Monticone, D.K. Cybersecurity in Healthcare: A Systematic Review of Modern Threats
and Trends. Technol. Health Care 2017,25, 1–10. [CrossRef]
24.
Kuo, T.-T.; Kim, H.-E.; Ohno-Machado, L. Blockchain Distributed Ledger Technologies for Biomedical and Health Care Applica-
tions. J. Am. Med. Inform. Assoc. 2017,24, 1211–1220. [CrossRef]
25.
Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al.
Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations without Sharing Patient Data. Sci. Rep. 2020,10,
12598. [CrossRef] [PubMed]
26. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979,9, 62–66. [CrossRef]
27. Sauvola, J.; Pietikäinen, M. Adaptive Document Image Binarization. Pattern Recognit. 2000,33, 225–236. [CrossRef]
28.
Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the Ninth International Conference on Document Analysis
and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; IEEE: Piscataway, NJ, USA, 2007; Volume 2.
29.
Hsu, E.; Malagaris, I.; Kuo, Y.-F.; Sultana, R.; Roberts, K. Deep Learning-Based NLP Data Pipeline for EHR-Scanned Document
Information Extraction. JAMIA Open 2022,5, ooac045. [CrossRef]
30.
Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv
2020. [CrossRef]
31.
Weiskopf, N.G.; Weng, C. Methods and Dimensions of Electronic Health Record Data Quality Assessment: Enabling Reuse for
Clinical Research. J. Am. Med. Inform. Assoc. 2013,20, 144–151. [CrossRef] [PubMed]
32. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006,27, 861–874. [CrossRef]
33. Brooke, J. SUS—A Quick and Dirty Usability Scale. Ahrq.gov. Available online:
https://digital.ahrq.gov/sites/default/files/docs/survey/systemusabilityscale%2528sus%2529
_
comp%255B1%255D.pdf
(accessed on 26 September 2025).
34. Bangor, A.; Kortum, P.; Miller, J. Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale. J. Usability
stud. 2009,4, 114–123.
35. Lewis, J.R. The System Usability Scale: Past, Present, and Future. Int. J. Hum. Comput. Interact. 2018,34, 577–590. [CrossRef]
36.
Wu, Y.; Dalianis, H.; Velupillai, S. Errors in Clinical Text Processing and Their Impact on Decision-Making: A Review. Artif. Intell.
Med. 2020,104, 101833.
37.
Nguyen, P.A.; Shim, J.S.; Ho, T.B.; Li, W. Machine Learning-Based Approaches for Clinical Text Error Detection: A Systematic
Review. J. Biomed. Inform. 2022,127, 104018.
38.
Luo, Y.; Thompson, W.K.; Herr, T.M.; Zeng, Z.; Berendsen, M.A.; Jonnalagadda, S.R.; Carson, M.B.; Starren, J. Natural Language
Processing for EHR-Based Pharmacovigilance: A Structured Review. Drug Saf. 2017,40, 1075–1089. [CrossRef]
39.
Amann, J.; Blasimme, A.; Vayena, E.; Frey, D.; Madai, V.I.; Precise4Q consortium. Explainability for Artificial Intelligence in
Healthcare: A Multidisciplinary Perspective. BMC Med. Inform. Decis. Mak. 2020,20, 310. [CrossRef] [PubMed]
40.
Kelly, C.J.; Karthikesalingam, A.; Suleyman, M.; Corrado, G.; King, D. Key Challenges for Delivering Clinical Impact with
Artificial Intelligence. BMC Med. 2019,17, 195. [CrossRef]
41.
Small, W.R.; Wang, L.; Horng, S. EHR-Embedded Large Language Models for Hospital-Course Summarization. JAMA Netw.
Open 2025,8, e250112. [CrossRef]
42.
Kernberg, A.; Gold, J.A.; Mohan, V. Using ChatGPT-4 to Create Structured Medical Notes from Audio Recordings of Physician-
Patient Encounters: Comparative Study. J. Med. Internet Res. 2024,26, e54419. [CrossRef]
43.
World Health Organization. Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models.
Who.int. Available online: https://www.who.int/publications/i/item/9789240084759 (accessed on 26 September 2025).
Technologies 2025,13, 530 24 of 24
44.
Howell, M.D. Generative Artificial Intelligence, Patient Safety and Healthcare Quality: A Review. BMJ Qual. Saf. 2024,33, 748–754.
[CrossRef] [PubMed]
45.
Reddy, S. Generative AI in Healthcare: An Implementation Science Informed Translational Path on Application, Integration and
Governance. Implement. Sci. 2024,19, 27. [CrossRef] [PubMed]
46. Bakken, S. AI in Health: Keeping the Human in the Loop. J. Am. Med. Inform. Assoc. 2023,30, 1225–1226. [CrossRef] [PubMed]
Disclaimer/Publishers Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.