Academic Editors: Chihyu Hsu and Shuo-Tsung Chen

Received: 26 September 2025

Revised: 14 November 2025

Accepted: 15 November 2025

Published: 17 November 2025

Citation: Worragin, P.; Chernbumroong, S.; Puritat, K.; Julrode, P.; Intawong, K. Towards Intelligent Virtual Clerks: AI-Driven Automation for Clinical Data Entry in Dialysis Care. Technologies 2025, 13, 530. https://doi.org/10.3390/technologies13110530

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article

Towards Intelligent Virtual Clerks: AI-Driven Automation for Clinical Data Entry in Dialysis Care

Perasuk Worragin 1, Suepphong Chernbumroong 1, Kitti Puritat 2, Phichete Julrode 2,* and Kannikar Intawong 3,*

1 College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand; perasuk_w@cmu.ac.th (P.W.); suepphong.c@cmu.ac.th (S.C.)

2 Department of Library and Information Science, Faculty of Humanities, Chiang Mai University, Chiang Mai 50200, Thailand; kitti.p@cmu.ac.th

3 Faculty of Public Health, Chiang Mai University, Chiang Mai 50200, Thailand

* Correspondence: phichete.j@cmu.ac.th (P.J.); kannikar.i@cmu.ac.th (K.I.)

Abstract

Manual data entry in dialysis centers is time-consuming and error-prone, and it increases the administrative burden on healthcare professionals. Traditional optical character recognition (OCR) systems partially automate this process but lack the ability to handle complex data anomalies and ensure reliable clinical documentation. This study presents the design and evaluation of an AI-enhanced OCR system that integrates advanced image processing, rule-based validation, and large language model-driven anomaly detection to improve data accuracy, workflow efficiency, and user experience. A total of 65 laboratory reports, each containing approximately 35 fields, were processed and compared under two configurations: a basic OCR system and the AI-enhanced OCR system. System performance was evaluated using three key metrics: error detection accuracy across three error categories (Missing Values, Out-of-Range, and Typo/Free-text), workflow efficiency measured by average processing time per record and total completion time, and user acceptance measured using the System Usability Scale (SUS). The AI-enhanced OCR system outperformed the basic OCR system in all metrics, particularly in detecting and correcting Out-of-Range errors, such as decimal placement issues, achieving near-perfect precision and recall. It reduced the average processing time per record by roughly half (from 85.2 to 42.1 s) and improved usability, scoring 81.0 (Excellent) compared to 75.0 (Good). These results demonstrate the potential of AI-driven OCR to reduce clerical workload, improve healthcare data quality, and streamline clinical workflows, while maintaining a human-in-the-loop verification process to ensure patient safety and data integrity.

Keywords: AI-enhanced OCR; clinical data entry automation; generative AI; human-in-the-loop; electronic health records

1. Introduction

In Thailand, nephrology centers play a critical role in providing life-sustaining

hemodialysis care to an estimated 80,000 patients with end-stage kidney disease across

more than 2500 units nationwide. Despite the essential role of these facilities, the exchange

of data between dialysis centers and government agencies remains limited. National au-

thorities have avoided implementing open application programming interfaces (APIs) for

transmitting health records because of concerns about cybersecurity and data privacy. This

has forced healthcare providers to rely on repetitive, manual data entry, often rekeying the

same information into multiple platforms maintained by different government agencies.

Nurses and administrative staff must frequently log into web-based portals and desktop ap-

plications to submit treatment data, claims, and patient outcomes. The duplication of effort

is not only inefﬁcient but also increases the likelihood of errors, delays, and inconsistencies

in reporting [1,2].

The absence of secure and interoperable mechanisms for data exchange has produced

a series of operational challenges in dialysis centers. Surveys and observational studies

suggest that nurses may spend 20–50 percent of their time on clerical tasks such as retyping

laboratory results and treatment notes into government systems. This workload diverts

attention from patient care and increases the risk of fatigue-related mistakes. Clinical

consequences also arise. For instance, when laboratory results are delayed in being entered,

physicians may lack timely information needed for decision-making, forcing patients to

wait longer for follow-up or medication adjustments. International studies have conﬁrmed

that manual transcription is one of the least accurate and most time-consuming methods

of clinical documentation. Systematic reviews show that automation, including optical

character recognition, can substantially reduce error rates compared to manual entry [

Likewise, recent research demonstrates that novel optical character recognition systems

can outperform human operators in terms of speed and reliability in real-world clinical

environments [

]. These ﬁndings highlight the potential for advanced image-processing

techniques to serve as reliable tools in streamlining nephrology information management.

Beyond accuracy and speed, the impact of manual entry on workforce morale has

also drawn increasing attention. Repeated clerical duties contribute to staff burnout and

dissatisfaction, which in turn affect retention in already understaffed healthcare systems [

In nephrology care, where continuity and specialized expertise are vital, losing trained

nurses and technicians because of workload-related fatigue can negatively inﬂuence patient

outcomes. Accordingly, solutions that not only reduce data errors but also alleviate staff

burden can bring systemic beneﬁts, including more sustainable workforce management.

Agent-based artiﬁcial intelligence systems have recently emerged as powerful tools capable

of planning, coordinating, and executing complex workﬂows. These intelligent agents can

simulate the tasks of human clerks by navigating between different applications, validating

extracted data, and interacting with legacy systems. The integration of agent-based AI with

modern image processing therefore offers a promising approach to building “virtual clerks”

that can safely and efﬁciently carry out administrative tasks in healthcare [6].

The present project introduces a fully implemented prototype of an intelligent virtual

clerk developed as an add-on module to the NephroM system, an enterprise resource

platform (ERP) widely used for dialysis data management. Supported by research funding

from the National Research Council of Thailand (NRCT), this project aims to operationalize

intelligent automation within existing clinical workﬂows. The system integrates image-

processing pipelines, rule-based validation, and large-language-model (LLM) reasoning

to automate the capture, veriﬁcation, and secure submission of clinical data to external

government platforms.

The speciﬁc objectives of this study are threefold: (1) to improve the accuracy and efﬁ-

ciency of clinical data entry through AI-enhanced automation; (2) to evaluate the system’s

ability to reduce administrative workload and enhance healthcare workers’ job satisfaction

by minimizing repetitive clerical tasks; and (3) to assess its potential to shorten patient

waiting times by accelerating documentation and submission workﬂows. Accordingly, this

paper focuses on the design, implementation, and evaluation of the OCR and validation

modules as core components of the working prototype. By aligning with national digital-

health strategies while maintaining compliance with cybersecurity policies, the proposed

framework demonstrates how an AI-driven add-on module can enhance interoperability,

improve service quality, and allow healthcare professionals to dedicate more time to patient

care rather than clerical work. The primary contribution of this work lies in its practical

applicability within real clinical workﬂows rather than in introducing algorithmic novelty.

2. Related Work

2.1. AI in Healthcare Information Management

Artiﬁcial intelligence has become a core enabler of healthcare information management

by improving how data are captured, curated, and used for clinical and administrative

decision-making. At the infrastructure level, AI methods help transform heterogeneous elec-

tronic health record data into machine-actionable formats, support patient representation

learning, and enable predictive analytics that inform quality improvement and popula-

tion management [

–

]. Beyond prediction, AI is increasingly deployed to streamline

routine information workﬂows such as data abstraction, coding, and document classiﬁ-

cation, with the aim of reducing latency and improving completeness and consistency

in health datasets [

]. These capabilities are critical in domains like nephrology, where

high-frequency encounters and laboratory monitoring generate substantial documentation

and reporting requirements. Coupled with modern image processing and optical character

recognition, AI systems can accurately extract key ﬁelds from semi-structured forms and

scanned documents, supporting safer and faster ingestion into registries and reporting

systems [4].

The operational rationale for automation is grounded in well-documented burdens

associated with EHR work and clerical tasks. Time–motion and workﬂow studies show

that a large share of clinician effort is consumed by documentation and desk work, while

EHR-related clerical load contributes to burnout and reduced job satisfaction [

]. Re-

cent approaches therefore combine intelligent document understanding with orchestration

layers that can navigate legacy web or desktop interfaces. In practice, this is achieved

through agent-based systems and intelligent automation frameworks, which plan multi-

step tasks, validate extracted content, and interact with multiple applications under policy

constraints [

]. As organizations scale such solutions, attention to secure health information

exchange and interoperability standards remains essential so that automation improves

throughput without compromising privacy or cybersecurity [

]. Literature on the inte-

gration of AI with robotic process automation also indicates growing maturity in using

these tools to reduce administrative friction while keeping governance and sustainability

considerations in view [12].

2.2. Image Processing and Optical Character Recognition in Clinical Contexts

Image processing and optical character recognition (OCR) are foundational technolo-

gies for converting semi-structured and unstructured clinical documents into machine-

readable data, enabling downstream analytics and workﬂow automation. Over recent years,

deep learning has substantially improved resilience to noise, variability in layouts, and

the diverse fonts and formats often found in healthcare documentation such as laboratory

reports, dialysis treatment logs, and admission forms. Advances in convolutional neural

networks, recurrent architectures, and more recently transformer-based models have en-

hanced text detection and recognition accuracy in challenging environments. Layout-aware

frameworks that encode both spatial and textual information further improve ﬁeld-level

data extraction, which is particularly useful in nephrology, where recurring treatment forms

and frequent laboratory reports require consistent and accurate transcription [13–16].

Clinical implementations increasingly integrate preprocessing, OCR, and post-

processing validation with domain-speciﬁc rules or natural language processing to ensure

accuracy and shorten turnaround times for data entry. Tailored OCR systems have been

shown to outperform manual transcription in terms of speed and reliability, especially for

vital-signs documentation and prescription forms. Moreover, pipelines that combine OCR

with natural language processing have demonstrated high levels of precision in extracting

usable information from scanned health records, enabling accurate population of registries

and quality measurement databases. Multi-center evaluations in intensive care settings

have further shown that OCR-based data entry can accelerate information ﬂows and re-

duce staff burden, provided that layout variations and device heterogeneity are managed

through preprocessing and human-in-the-loop validation. Collectively, these ﬁndings

underscore the potential of healthcare-speciﬁc OCR pipelines to improve the efﬁciency,

safety, and reliability of information management in nephrology and other chronic disease

domains [4,7,17].

2.3. Agent-Based Systems for Workﬂow Automation

Agent-based systems provide a principled foundation for automating complex, multi-

step workﬂows by encapsulating autonomy, social ability, reactivity, and proactivity in

software entities that can perceive their environment, plan actions, and collaborate to

achieve organizational goals. In administrative healthcare contexts, agents can coordinate

extraction, validation, and submission tasks across heterogeneous applications while re-

specting local policies, role-based permissions, and exception handling. This aligns with

the evolution of robotic process automation from scripted UI macros toward intelligent

orchestration layers capable of decision-making and resilience to variability in interfaces

and data [

]. Classic agent research established the architectural and coordination

principles that enable such capabilities, including task decomposition, negotiation, and

cooperative problem solving, which remain directly relevant when simulating human

clerks that must navigate legacy web portals and desktop systems [

]. Recent literature

further argues for integrating agent reasoning with analytics and document understand-

ing so that automation not only executes keystrokes but also validates content, detects

anomalies, and triggers human review when conﬁdence is low [12].

Building on these foundations, contemporary “agentic AI” systems extend workﬂow

automation with tool use, planning, and self-monitoring, enabling agents to call OCR

and NLP services, enforce domain rules, and maintain auditable trails under governance

constraints. In healthcare, this makes it possible to operationalize human-in-the-loop

patterns where agents handle routine steps and escalate ambiguous cases to clinicians or

administrators, thereby reducing turnaround time without sacriﬁcing safety. Evidence

from biomedicine demonstrates that agentic approaches can structure complex, multi-

application tasks and coordinate specialized tools, suggesting strong applicability to clerical

data ﬂows in nephrology [

]. At scale, however, secure health-information exchange and

interoperability remain prerequisites; automated agents must comply with cybersecurity

controls and data-sharing policies so that throughput gains do not introduce privacy

or integrity risks [

]. Taken together, the literature supports a layered design in which

agent-based orchestration governs document AI pipelines, integrates with existing portals,

and embeds oversight and auditing, an approach well suited to automating repetitive, rule-bound reporting workflows in dialysis centers.

In the present study, these agent-based principles are operationalized in a fully implemented prototype integrated with the NephroM platform. The proposed virtual-clerk model adopts a three-layer architecture (document ingestion and recognition, data validation and adaptive reasoning, and task-execution automation), all of which were developed and deployed within the working system. The agent-based automation layer, built on Playwright and PyWinAuto, enables the virtual clerk to automatically submit verified records to

external government portals while maintaining compliance and auditability. Accordingly,

the agent-based framework presented here represents not only a guiding architecture but

also an implemented orchestration system that coordinates OCR, validation, and automa-

tion modules in real-world clinical workﬂows. This implementation demonstrates the

feasibility of applying agent-based AI to healthcare administration, bridging conceptual

design with practical deployment in nephrology documentation.

2.4. Digital Health Transformation and Cybersecurity Constraints

Digital health transformation is frequently framed as an API-ﬁrst modernization of

clinical systems in which interoperability standards such as HL7 FHIR and application

frameworks like SMART on FHIR enable secure, modular exchange of health information.

In principle, this architecture allows external applications to retrieve and submit data in a

governable manner while preserving auditability, consent management, and least-privilege

access. In practice, however, many health authorities and public agencies remain reluctant

to expose data-ingest interfaces because operational and legal risks around cybersecurity,

privacy, and data misuse are perceived to outweigh the efﬁciency gains. The result is a

persistent gap between the promise of interoperable, standards-based exchange and the

reality of policy-constrained ecosystems that continue to rely on manual re-entry into legacy

web portals and desktop applications. Prior work highlights both the technical feasibility of

safe, standards-conformant exchange and the governance challenges that limit routine cross-

organizational sharing, underscoring the need for solutions that respect existing controls

while reducing clerical burden. Representative analyses of secure health-information

exchange and interoperable app ecosystems emphasize that successful adoption hinges on

end-to-end security controls, identity management, and rigorous auditing capabilities that

must be demonstrated to regulators before broader API access is permitted [2,22].

Concurrently, the healthcare threat landscape has intensiﬁed, with systematic reviews

documenting escalating risks from ransomware, phishing, credential compromise, and

exploitation of third-party components. These incidents have real operational consequences,

including care delays and data integrity concerns, which further discourage authorities

from opening inbound programmatic channels without robust mitigations [

]. To balance

transformation with risk, emerging approaches combine privacy-preserving computation and verifiable infrastructure, such as blockchain-backed audit trails for provenance and federated learning that keeps raw patient data local while enabling shared model improvement.

Although these technologies do not eliminate risk, they provide concrete mechanisms to

strengthen auditability, reduce data movement, and demonstrate compliance, thereby

making controlled automation more acceptable to oversight bodies [

]. Within such

policy and security constraints, agent-based automation that operates through sanctioned user interfaces, paired with document-AI pipelines and human-in-the-loop review, offers a pragmatic path to efficiency. It can preserve existing governance boundaries while reducing

redundant data entry and improving timeliness in high-frequency domains like nephrology

information management.

Despite the rapid progress of AI in healthcare data management, several research

gaps remain unresolved. Existing studies on electronic health record automation and

OCR pipelines largely focus on general clinical documentation or radiology reports, with

limited exploration of high-frequency, high-volume specialties such as nephrology, where

repeated dialysis sessions generate a signiﬁcant clerical burden. While robotic process

automation and agent-based approaches have been applied in ﬁnance and business process

management, their integration with healthcare-speciﬁc image processing pipelines is still

underdeveloped. Moreover, most implementations emphasize technical accuracy without

sufﬁciently addressing the policy and cybersecurity constraints that prevent the use of open

APIs in government health systems. This leaves a critical gap for research that demonstrates

how intelligent, agent-based automation can operate effectively within restrictive security

environments, reduce redundant manual data entry, and improve timeliness in nephrology

information ﬂows while ensuring compliance with privacy and regulatory requirements.

3. Methodology

3.1. System Architecture

The intelligent virtual clerk was developed as an add-on module to the NephroM

system, an enterprise resource platform (ERP) used for managing dialysis operations and

patient data. Rather than replacing existing infrastructure, the module extends NephroM’s

capabilities by introducing an agent-based automation layer that interfaces with external

government portals such as the National Health Security Ofﬁce (NHSO) and the Health

Service Information Ofﬁce. This integration enables automated data exchange while main-

taining compliance with security and interoperability requirements. The intelligent virtual

clerk is organized into a three-layer system architecture that was fully implemented in

the working prototype, as illustrated in Figure 1. The ﬁrst layer, Document Ingestion and

Recognition, processes inputs from scanned forms, dialysis logs, and laboratory reports

through preprocessing methods such as noise reduction, normalization, and segmentation,

followed by optical character recognition using layout-aware models to generate structured

data. The second layer, Validation and Domain Rules, ensures that extracted values are

clinically plausible and consistent with administrative standards by applying predeﬁned

rules and allowing human-in-the-loop veriﬁcation for ambiguous cases. The third layer,

Agent-Based Automation, was also developed and deployed within the prototype. It

functions as the operational core of the virtual clerk, where intelligent agents automatically

interact with government platforms, handle exceptions, and ensure compliance through

automated logging and monitoring. Functional testing conﬁrmed that the automation

agents can execute submission and veriﬁcation tasks reliably across both web-based and

desktop systems.

Figure 1. Three-layer architecture of the intelligent virtual clerk showing implemented OCR, validation, and automation modules.

In order to provide project-speciﬁc details, the technical implementation of this archi-

tecture is further illustrated in Figure 2, which maps the tools and APIs applied to each

system layer. In the input stage, scanned forms are processed using OpenCV and Tesseract

OCR, while structured digital entries are collected directly from user inputs. In Layer 2, Py-

dantic is employed to enforce administrative and clinical validation rules, while ChatGPT

(GPT-4 API, 1 March 2025) supports anomaly detection and explanatory reasoning, com-

plemented by human veriﬁcation where necessary. In Layer 3, the automation framework

integrates Playwright for web-based workﬂows and PyWinAuto for PC-based government

systems. Both were conﬁgured and tested to perform automated data entry, submission,

and report generation, ensuring adaptability and compliance across heterogeneous in-

tegration environments. This implementation demonstrates the practical application of

the virtual clerk in a real-world project, emphasizing its modularity and showing how

open-source libraries, LLM reasoning, and agent-based automation can be integrated into

a workﬂow that directly addresses the lack of interoperable APIs in government health

information systems.

Figure 2. Technical Implementation Architecture of the Virtual Clerk.

3.2. Agent-Based AI Design

The proposed virtual clerk is designed and implemented as an agent-based AI system

that follows a continuous cycle of Perception, Decision, Action, and Monitoring, as illus-

trated in Figure 3. All four stages were developed within the working prototype, enabling

end-to-end automation of data recognition, validation, and submission. In the perception

stage, the agent acquires data from two main sources: scanned clinical forms processed by

OpenCV and Tesseract OCR, and structured data directly entered by users. These inputs

are transformed into structured representations such as JSON, often accompanied by conﬁ-

dence scores that indicate recognition accuracy. The decision stage combines deterministic

validation with adaptive intelligence. Rule-based checks implemented through Pydantic

enforce administrative and clinical constraints, while LLM reasoning (ChatGPT) supports

anomaly detection, normalization of ambiguous inputs, and generation of explanatory

feedback. Speciﬁcally, the language-model component was implemented using the OpenAI

GPT-4 API (version released on 1 March 2025) under the gpt-4-turbo conﬁguration, se-

lected for its contextual reasoning capability and robust handling of clinical text validation

tasks. Each request had an average latency of approximately 1.6 s per query, which was

considered acceptable for near-real-time operations in clinical data entry.

The agent’s inferential capability resides within this decision stage, which constitutes

the reasoning layer of the architecture. In this layer, symbolic (rule-based) inference and

statistical (LLM-based) inference operate in concert. The rule engine applies 84 determinis-

tic constraints covering ﬁeld types, numeric ranges, logical dependencies, and temporal

consistency. When these rules are violated or OCR confidence falls below 0.85, a constrained GPT-4 reasoning routine is invoked to propose context-consistent corrections while strictly prohibiting data fabrication. The 0.85 threshold is an empirically selected operating point: in pilot evaluations, confidence values below it frequently correlated with character-level ambiguities and schema violations. Each candidate output is then re-validated against the deterministic schema before being accepted. This rule → bounded reasoning → human escalation policy operationalizes bounded rationality, allowing the agent to optimize accuracy and compliance under uncertainty while maintaining reactive efficiency for deterministic cases.
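The rule → bounded reasoning → human escalation policy can be illustrated with a minimal Python sketch. This is not the authors' implementation: `FieldReading`, `decide`, and `decimal_shift_stub` are hypothetical names, and the stub merely stands in for the constrained GPT-4 call so that the example runs offline. Only the 0.85 threshold and the Creatinine range are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.85  # operating point reported in the text


@dataclass
class FieldReading:
    name: str
    value: str         # raw OCR string
    confidence: float  # OCR confidence in [0, 1]


def creatinine_rule(value: str) -> bool:
    """Range rule from the text: Creatinine within 0.3-20.0 mg/dL."""
    try:
        return 0.3 <= float(value) <= 20.0
    except ValueError:
        return False


def decimal_shift_stub(reading: FieldReading) -> Optional[str]:
    """Toy stand-in for the LLM routine (assumption): try one decimal shift."""
    try:
        return str(float(reading.value) / 10)
    except ValueError:
        return None  # nothing sensible to propose; never invent a value


def decide(
    reading: FieldReading,
    rule: Callable[[str], bool],
    propose_correction: Callable[[FieldReading], Optional[str]],
) -> Tuple[str, str]:
    # 1. Deterministic path: rule passes and OCR confidence is high enough.
    if rule(reading.value) and reading.confidence >= CONFIDENCE_THRESHOLD:
        return ("accept", reading.value)
    # 2. Bounded reasoning: ask for a candidate correction.
    candidate = propose_correction(reading)
    # 3. Re-validate the candidate against the same deterministic rule.
    if candidate is not None and rule(candidate):
        return ("accept_corrected", candidate)
    # 4. Otherwise hand the original reading to a human reviewer.
    return ("escalate", reading.value)
```

For instance, a reading of "84" for Creatinine fails the range rule, the stub proposes "8.4", and the proposal is accepted only because it passes re-validation; an unparseable reading such as "B.4" is escalated unchanged.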

Figure 3. Agent cycle design of the virtual clerk.

In terms of technical rule design, the 84 deterministic rules are organized into five categories: (1) type and format rules (e.g., enforcing fixed-length alphanumeric HN identifiers); (2) numeric range rules (e.g., validating that Creatinine lies within 0.3–20.0 mg/dL); (3) unit and normalization rules (e.g., converting Urea from mmol/L to mg/dL when necessary); (4) cross-field dependency rules (e.g., ensuring PreWeight > DryWeight and PostWeight < PreWeight); and (5) temporal consistency rules (e.g., requiring SpecimenDate ≤ ReportDate). Representative examples include: R12 (HGB must be within 6.0–20.0 g/dL); R23 (flag BUN values inconsistent with Creatinine-based physiological ratios); R41 (reject records where laboratory timestamps exceed session timestamps); and R73 (escalate potassium levels above 7.0 mmol/L for human review). These rule categories constitute the system's explicit knowledge base, providing the deterministic inference layer that interfaces with the constrained LLM reasoning routine.
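A few of the rule categories above can be sketched as plain Python checks. The prototype is described as using Pydantic; the standard-library dataclass below is swapped in only to keep the example dependency-free. The field names, the violation labels, and the HN length of 9 are assumptions for illustration, while the numeric bounds come from the rules quoted in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

HN_LENGTH = 9  # hypothetical: the text says "fixed-length" but gives no length


@dataclass
class DialysisLabRecord:
    hn: str
    creatinine_mg_dl: float
    hgb_g_dl: float
    potassium_mmol_l: float
    pre_weight_kg: float
    post_weight_kg: float
    dry_weight_kg: float
    specimen_date: date
    report_date: date


def check_record(rec: DialysisLabRecord) -> Tuple[List[str], List[str]]:
    """Return (violations, escalations) for one record."""
    violations: List[str] = []
    escalations: List[str] = []
    # (1) Type and format rule: fixed-length alphanumeric HN.
    if not (rec.hn.isalnum() and len(rec.hn) == HN_LENGTH):
        violations.append("format:hn")
    # (2) Numeric range rules.
    if not 0.3 <= rec.creatinine_mg_dl <= 20.0:
        violations.append("numeric_range:creatinine")
    if not 6.0 <= rec.hgb_g_dl <= 20.0:  # R12
        violations.append("numeric_range:hgb")
    # (4) Cross-field dependency rules.
    if not rec.pre_weight_kg > rec.dry_weight_kg:
        violations.append("cross_field:pre_weight")
    if not rec.post_weight_kg < rec.pre_weight_kg:
        violations.append("cross_field:post_weight")
    # (5) Temporal consistency rule: SpecimenDate <= ReportDate.
    if rec.specimen_date > rec.report_date:
        violations.append("temporal:specimen_after_report")
    # R73: very high potassium is escalated for human review, not rejected.
    if rec.potassium_mmol_l > 7.0:
        escalations.append("R73:potassium_high")
    return violations, escalations
```

Separating violations from escalations mirrors the distinction in the text between values that break a constraint and values, such as R73, that may be genuine but still warrant human attention.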

To make explicit how these rules formalize domain knowledge, the virtual clerk

follows the three canonical components of an expert system. First, the 84 deterministic rules

form the knowledge base, encoding clinical, administrative, physiological, and temporal

constraints required in dialysis documentation. Second, the inference engine consists of

(a) a deterministic rule evaluator that performs deductive checks, (b) a bounded LLM

reasoning module invoked only under uncertainty, and (c) a deterministic re-validation

layer that enforces schema compliance before accepting any output. This hybrid mechanism

ensures that deductive logic remains primary while uncertainty is tightly controlled. Third,

the user interface layer is implemented within the NephroM platform, providing OCR

upload interfaces, real-time validation feedback, and human-in-the-loop review pathways.

The action stage enables the agent to simulate the role of a clerk by automatically

ﬁlling government forms and submitting records through existing platforms. Web-based

portals are handled with Playwright, while legacy PC applications are managed with Py-

WinAuto, ensuring ﬂexibility across heterogeneous infrastructures. Finally, the monitoring

stage incorporates human-in-the-loop validation, systematic audit logging, and reporting,

which provide transparency, error recovery, and compliance with security requirements.

Collectively, these four stages form a closed-loop cycle that allows the agent to perceive its

environment, reason over data, act autonomously, and adapt based on feedback.
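The closed-loop cycle can be summarized in a short Python skeleton. It is a sketch under stated assumptions rather than the prototype's code: `run_cycle` and its callable parameters are hypothetical, and the real action stage would drive Playwright or PyWinAuto instead of a plain function.

```python
from typing import Callable, Dict, List


def run_cycle(
    raw_record: Dict[str, str],
    perceive: Callable[[Dict[str, str]], Dict[str, str]],
    decide: Callable[[Dict[str, str]], str],
    act: Callable[[Dict[str, str]], str],
    audit_log: List[dict],
) -> str:
    """One pass through Perception -> Decision -> Action -> Monitoring."""
    structured = perceive(raw_record)   # Perception: OCR output to structured fields
    verdict = decide(structured)        # Decision: rules plus bounded reasoning
    if verdict == "accept":
        outcome = act(structured)       # Action: fill and submit the form
    else:
        outcome = "queued_for_human_review"  # human-in-the-loop path
    # Monitoring: every pass leaves an audit entry, whatever the outcome.
    audit_log.append({"record": structured, "verdict": verdict, "outcome": outcome})
    return outcome
```

Keeping the audit write outside the branch means that escalated records are logged in the same way as submitted ones, which is what allows the monitoring stage to support error recovery.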

When compared with conventional robotic process automation (RPA), the agent-based

AI design offers signiﬁcant advantages. RPA workﬂows are typically brittle, failing when

user interfaces change or when unexpected data is encountered. In contrast, intelligent

agents embody autonomy, adaptability, and reasoning [

]. By combining rule-based

validation with LLM reasoning, the virtual clerk does more than execute static scripts:

it proactively identiﬁes anomalies, explains discrepancies, and collaborates with human

reviewers when required. The monitoring layer reinforces accountability through audit

trails and continuous feedback, moving beyond linear automation pipelines. This implementation demonstrates the four canonical properties of intelligent agents (autonomy, reactivity, proactivity, and social ability) within an operational prototype that reduces clerical

burden, improves data accuracy, and enhances trustworthiness compared with traditional

automation approaches.

3.3. Image Processing Pipeline

This section describes the image-processing pipeline that converts printed paper forms

and scanned dialysis logs into machine-readable data ready for validation and automation.

An overview of the pipeline is shown in Figure 4. Preprocessing begins with denoising, de-

skewing, and contrast normalization, followed by binarization to improve text–background

separation. Global thresholding [

] and adaptive binarization methods [

] are applied

depending on illumination and paper artifacts, with morphological operations used to

repair broken strokes and suppress speckle noise. Text regions are localized and recog-

nized using the Tesseract OCR engine, which employs adaptive classiﬁers and language

models to support robust character recognition in printed clinical forms [

]. For semi-

structured layouts such as tables, labels, and key–value zones, the pipeline aligns OCR

results with layout-aware models to preserve spatial relationships, thereby enabling reliable

ﬁeld mapping across variable templates [13,15].

Figure 4. Overview of the image processing pipeline for input and output.

Post-OCR processing further structures and quality-controls the extracted text. Rule-

based parsing standardizes identiﬁers, dates, and units, while conﬁdence scores and

heuristics trigger reprocessing or human review in ambiguous cases. Normalized ﬁelds

are serialized into JSON and passed to the validation module, where schema checks and

range constraints are applied. Such OCR-to-validation pipelines have been shown to

reduce turnaround times and improve registry data usability [

]. As manual data entry is

known to introduce errors, the pipeline is speciﬁcally designed to minimize keystrokes and

surface only exceptions, aligning with evidence that optimized data processing methods

can substantially lower error rates in clinical research [

]. The ﬁnal structured output with

conﬁdence scores, provenance, and audit artifacts feeds the agent’s decision stage and

ultimately the automation layer for secure submission to government systems.
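The post-OCR stage described above can be illustrated with a minimal Python sketch that normalizes a date and a unit string, routes low-confidence fields to human review, and serializes the result to JSON. The field names, the DD/MM/YYYY input format, the unit map, and the reuse of the 0.85 cut-off from Section 3.4 are illustrative assumptions, not the authors' implementation.

```python
import json
import re
from datetime import datetime

REVIEW_THRESHOLD = 0.85  # assumed cut-off, borrowed from the LLM escalation rule in Section 3.4
UNIT_ALIASES = {"mg/dl": "mg/dL", "mmol/l": "mmol/L", "g/dl": "g/dL"}  # hypothetical map


def normalize_date(text):
    """Convert a DD/MM/YYYY string to ISO 8601; return None if it does not parse."""
    try:
        return datetime.strptime(text.strip(), "%d/%m/%Y").date().isoformat()
    except ValueError:
        return None


def normalize_unit(text):
    """Map case and spacing variants of a unit to one canonical spelling."""
    key = re.sub(r"\s+", "", text).lower()
    return UNIT_ALIASES.get(key, text.strip())


def structure_field(name, raw_value, confidence, kind="text"):
    """Return a normalized field record carrying a human-review flag."""
    if kind == "date":
        value = normalize_date(raw_value)
    elif kind == "unit":
        value = normalize_unit(raw_value)
    else:
        value = raw_value.strip()
    # Unparseable values and low-confidence reads are surfaced as exceptions.
    needs_review = value is None or confidence < REVIEW_THRESHOLD
    return {"field_name": name, "raw": raw_value, "value": value,
            "confidence": confidence, "needs_review": needs_review}


# Hypothetical OCR outputs for three fields of one report.
fields = [
    structure_field("collection_date", "17/11/2025", 0.97, kind="date"),
    structure_field("urea_unit", "MG/DL", 0.91, kind="unit"),
    structure_field("patient_name", " EXAMPLE NAME ", 0.62),
]
payload = json.dumps(fields, ensure_ascii=False)  # handed to the validation module
```

Only the third field would be surfaced to a reviewer in this example, which mirrors the stated goal of showing users exceptions rather than every field.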

3.4. OCR Conﬁgurations and Technical Integration

The intelligent virtual clerk employs a dual-conﬁguration OCR pipeline engineered

for high-ﬁdelity extraction and validation of semi-structured clinical records. The baseline

deterministic conﬁguration utilizes Tesseract v5.3.2 in legacy (non-LSTM) mode, executing

rule-driven segmentation and glyph-pattern correlation for character decoding. Input

frames are pre-conditioned through OpenCV-based Gaussian denoising, Hough-transform

de-skewing, and adaptive binarization, followed by morphological opening/closing to


reconstruct stroke continuity and eliminate impulse noise. Post-processing modules implement
deterministic normalization routines that apply regular-expression filters to enforce
field syntax (patient identifiers, timestamps, measurement units) and to correct recurrent
optical ambiguities (e.g., “O → 0”, “I → 1”). The structured output is mapped to the

NephroM schema, which deﬁnes 84 deterministic constraints covering data types, numeric

bounds, and inter-ﬁeld dependencies, providing a transparent baseline for audit-compliant

data ingestion.
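A minimal sketch of this deterministic normalization and constraint checking is given below. The 84 NephroM constraints are not listed in the text, so the three rules shown (one data-type rule, one numeric bound, one inter-field dependency) and the field names `hn`, `pre_weight`, and `post_weight` are hypothetical stand-ins chosen only to illustrate the three constraint kinds.

```python
import re

# Letters that OCR commonly confuses with digits; the text names O→0 and I→1.
AMBIGUOUS = str.maketrans({"O": "0", "I": "1"})
NUMERIC_LIKE = re.compile(r"^[0-9OI]+(\.[0-9OI]+)?$")


def fix_numeric_ambiguities(token):
    """Apply O→0 and I→1 only when the token looks numeric and has a real digit."""
    if NUMERIC_LIKE.match(token) and any(ch.isdigit() for ch in token):
        return token.translate(AMBIGUOUS)
    return token  # leave words such as "Kt/V" or "IO" untouched


def check_record(record):
    """Return the names of violated rules (an empty list means the record passes)."""
    violations = []
    # Data-type rule (hypothetical): hospital number must contain digits only.
    if not re.fullmatch(r"\d+", record["hn"]):
        violations.append("hn_digits_only")
    # Numeric-bound rule (hypothetical plausibility range, in kg).
    for key in ("pre_weight", "post_weight"):
        if not 20.0 <= record[key] <= 250.0:
            violations.append(f"{key}_range")
    # Inter-field rule (hypothetical): post-dialysis weight must not exceed pre-dialysis weight.
    if record["post_weight"] > record["pre_weight"]:
        violations.append("post_weight_le_pre_weight")
    return violations


record = {
    "hn": fix_numeric_ambiguities("1O2I45"),
    "pre_weight": float(fix_numeric_ambiguities("6I.5")),
    "post_weight": float(fix_numeric_ambiguities("59.O")),
}
violations = check_record(record)
```

Restricting the substitution to numeric-looking tokens is a design choice of this sketch: it avoids corrupting free-text fields that legitimately contain the letters O or I.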

The enhanced conﬁguration activates the Long Short-Term Memory (LSTM) recog-

nition engine of Tesseract v5.3.2, enabling contextual sequence modeling across character

windows. Pre-trained English–Thai models are extended with a domain-speciﬁc lexical

set incorporating nephrology terminology such as dialysate, hemodiaﬁltration, and Kt/V.

Tokens with conﬁdence scores below 0.85 invoke a generative reasoning sub-module im-

plemented via the OpenAI GPT-4 Turbo API. The model operates under a constrained,

instruction-based validation prompt that enforces bounded semantic behavior and prohibits

uncontrolled text generation, for example:

System Role: You are an AI-based clinical data validator operating within a rule-constrained data entry system.
Your task is to analyze structured OCR outputs from nephrology forms, identify anomalies, and propose corrections only when they are derivable from contextual or domain-consistent evidence.

Instructions:
1. Input will be provided as a JSON object containing {field_name, value, confidence, data_type, rule_reference}.
2. For each record:
   - Verify that the value conforms to expected type, unit, and range constraints (as indicated by rule_reference).
   - If confidence < 0.85 or rule violation is detected:
     a. Analyze related fields for contextual inference (e.g., Pre_Weight vs. Post_Weight, Urea vs. Creatinine).
     b. If a correction is logically deducible, output the revised value and reasoning note.
     c. If ambiguity remains, flag for human review.
3. NEVER fabricate or infer data outside the observed record set.
4. Return all outputs in strict JSON format:
{
  "field_name": "",
  "original_value": "",
  "suggested_value": "",
  "confidence": "",
  "status": "validated|corrected|flagged",
  "reason": ""
}

All inference transactions are encapsulated with execution metadata—token counts, la-

tency, and probabilistic conﬁdence metrics—to support deterministic replay. The post-LLM

output undergoes Pydantic schema validation, applying rule-based range checks, relational


logic (e.g., dry-weight < post-dialysis weight), and temporal-ordering veriﬁcation before

serialization into JSON with provenance hashes for downstream decision-layer ingestion.
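The post-LLM validation step can be sketched as follows. To keep the example dependency-free, a standard-library dataclass is used here as a stand-in for the Pydantic model mentioned above; the class name `Suggestion`, the `source_image_id` key, and the choice of SHA-256 over a canonical JSON form are assumptions of this sketch rather than details given in the text.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

ALLOWED_STATUS = {"validated", "corrected", "flagged"}  # from the prompt's output schema


@dataclass(frozen=True)
class Suggestion:
    field_name: str
    original_value: str
    suggested_value: str
    confidence: float
    status: str
    reason: str

    def __post_init__(self):
        # Rule-based checks applied to every model reply before it is accepted.
        if self.status not in ALLOWED_STATUS:
            raise ValueError(f"unknown status: {self.status!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        if self.status == "corrected" and self.suggested_value == self.original_value:
            raise ValueError("a correction must change the value")


def parse_llm_reply(reply_text):
    """Parse the model's JSON reply; missing or extra keys raise KeyError/TypeError."""
    data = json.loads(reply_text)
    data["confidence"] = float(data["confidence"])
    return Suggestion(**data)


def with_provenance(suggestion, source_image_id):
    """Attach a SHA-256 digest computed over the canonical JSON form of the record."""
    body = asdict(suggestion)
    body["source_image_id"] = source_image_id
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    body["provenance_sha256"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return body


# Hypothetical reply for a decimal-placement error ("20.2" read as "2.02").
reply = ('{"field_name": "Urea", "original_value": "2.02", "suggested_value": "20.2", '
         '"confidence": "0.91", "status": "corrected", "reason": "decimal shift"}')
suggestion = parse_llm_reply(reply)
record = with_provenance(suggestion, "scan-001")
```

Because the digest is computed over sorted keys with fixed separators, re-running the step on the same reply yields the same hash, which is one simple way to support the replay and audit goals described above.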

The overall implementation follows a hybrid microservice architecture integrating

a .NET (C#) front-end for visualization and supervisory control with Python-based back-

end services executing OpenCV, Tesseract, and GPT-4 operations through RESTful APIs.

Process automation leverages Playwright for web-form orchestration and PyWinAuto for

legacy desktop interfacing. This composite architecture constitutes a hybrid deterministic–

AI pipeline, where rule-based modules guarantee compliance and reproducibility, and

learning-based components contribute adaptive reasoning and contextual correction. The

resulting system achieves an explainable, auditable, and regulation-conformant automation

framework for nephrology information management. The GPT-4 Turbo model was not

retrained or ﬁne-tuned; rather, it was conﬁgured through a constrained instruction schema

and an internal validation wrapper that enforces structured input/output formats, response

length limits, and deterministic key–value alignment.

A key technical characteristic of the proposed architecture lies in its pipeline-level

determinism rather than reliance on the raw OCR engine alone. While the Tesseract legacy

model is algorithmically deterministic, its output can vary when input images differ in

illumination, rotation, or contrast. To address this, the system applies a fixed and repeatable
normalization sequence (grayscale conversion, resolution standardization, global
thresholding, binarization, geometric de-skewing, and noise suppression) before every OCR
operation. These steps ensure that input frames are rendered into a stable canonical form,
enabling reproducible OCR behavior across heterogeneous capture conditions. Furthermore,
downstream components, including the 84-rule deterministic validator, bounded
LLM inference, and deterministic post-validation, act as stabilizing layers that correct residual
variability and constrain uncertainty. This pipeline-level determinism, combined
with the rule → reasoning → re-validation loop, represents the most technically distinctive
aspect of the architecture and differentiates it from conventional OCR workflows that lack
inferential stabilization or safety-layered correction.
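The rule → reasoning → re-validation loop can be sketched in a few lines of Python. The reasoner below is a toy stand-in for the bounded LLM call, and the numeric bounds are hypothetical; the point of the sketch is only that a proposed correction is accepted when it passes the same deterministic rule that rejected the original value.

```python
def run_loop(value, confidence, rule, reasoner, threshold=0.85):
    """Return (final_value, status) after rule -> reasoning -> re-validation."""
    if confidence >= threshold and rule(value):
        return value, "validated"
    proposal = reasoner(value)  # stand-in for the bounded LLM call
    if proposal is not None and rule(proposal):
        return proposal, "corrected"  # the proposal passed re-validation
    return value, "flagged"  # ambiguity remains: hand over to human review


def in_range(low, high):
    """Build a rule that accepts numeric strings within [low, high]."""
    def rule(text):
        try:
            return low <= float(text) <= high
        except ValueError:
            return False
    return rule


def shift_decimal(text):
    """Toy reasoner: propose moving the decimal point one place to the right."""
    try:
        return f"{float(text) * 10:g}"
    except ValueError:
        return None


rule = in_range(10.0, 60.0)  # hypothetical bounds for one laboratory field
results = [
    run_loop("20.2", 0.95, rule, shift_decimal),
    run_loop("2.02", 0.95, rule, shift_decimal),
    run_loop("abc", 0.40, rule, shift_decimal),
]
```

In this arrangement the learning-based component can only suggest values, while acceptance is decided by the deterministic rule, which is the safety property the paragraph above attributes to the architecture.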

3.5. Evaluation Metrics

3.5.1. Error Detection Rate of Automation Accuracy

We evaluated the system’s capability to detect and correct erroneous data ﬁelds

by comparing two conﬁgurations: the basic OCR system, which relies solely on deter-

ministic pattern matching of OCR outputs, and the AI-enhanced OCR system, which

integrates advanced AI models for anomaly detection and normalization alongside basic
rule-based checks, and also suggests the correct value whenever possible. The primary
endpoint was the Error Detection Rate (EDR), equivalent to recall on the error class,
calculated as

$$\mathrm{Recall} = \frac{TP}{TP + FN},$$

where $TP$ represents the number of correctly detected erroneous fields and $FN$ represents
the number of undetected errors. To account for over-flagging,

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

and

$$F_1\text{-score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

the harmonic mean of precision and recall [ ], were also reported, where $FP$ is the number
of correct fields wrongly flagged as errors. The evaluation was performed across three

common error categories: Missing Values, Out-of-Range, and Typo/Free-text, which reﬂect

typical clinical data entry problems [

]. Performance metrics were visualized through

grouped bar charts comparing precision, recall, and F1-score for both systems, as well as

receiver operating characteristic (ROC) curves [

] to illustrate overall detection perfor-

mance, with the area under the curve (AUC) calculated for each system. The evaluation

aimed not only to compare the raw recognition performance between the basic OCR and

the AI-enhanced OCR systems but also to validate how improved text accuracy supports

the agent-based automation layer in achieving reliable data submission. This connection


between recognition accuracy and automation reliability forms a key validation step for

the virtual clerk architecture.
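As a worked illustration of these metrics, the sketch below computes precision, recall, and F1-score from confusion counts and recomputes F1 from the precision and recall values listed in Table 1. The raw counts (TP = 90, FP = 3, FN = 10) are hypothetical, since the paper reports only the derived rates.

```python
def precision(tp, fp):
    """Share of flagged fields that were truly erroneous."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Error Detection Rate: share of erroneous fields that were flagged."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical confusion counts for one error category.
example_p = precision(tp=90, fp=3)
example_r = recall(tp=90, fn=10)

# (precision, recall, reported F1) for each row of Table 1.
TABLE_1 = {
    ("Missing Values", "OCR only"): (0.968, 0.900, 0.933),
    ("Missing Values", "OCR + AI"): (0.990, 0.950, 0.969),
    ("Out-of-Range", "OCR only"): (0.951, 0.967, 0.959),
    ("Out-of-Range", "OCR + AI"): (0.995, 0.999, 0.997),
    ("Typo/Free-text", "OCR only"): (0.922, 0.950, 0.936),
    ("Typo/Free-text", "OCR + AI"): (0.990, 0.977, 0.983),
}

# Largest gap between the recomputed and the reported F1-score.
max_gap = max(abs(f1_score(p, r) - f1) for p, r, f1 in TABLE_1.values())
```

The recomputed F1-scores agree with the reported ones to within 0.001, a difference attributable to rounding of the published precision and recall values.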

3.5.2. Time Efﬁciency

To evaluate operational performance, we measured the processing speed and time

efﬁciency of both systems using two key metrics. The ﬁrst was Average Time per Record,

deﬁned as the mean time required to complete the processing of a single record from initial

capture to ﬁnal conﬁrmation. The second was Total Completion Time, which measured the

total elapsed time required to process all records in a batch. Identical tasks were executed

under controlled conditions using both the basic OCR system and the AI-enhanced OCR

system, and the resulting mean values were recorded and compared. Time reduction was

expressed as both absolute time saved and percentage improvement. Each participant

processed dialysis reports under both conﬁgurations (basic OCR and AI-enhanced OCR)

in a counterbalanced order, and the system automatically recorded timestamps for start

and completion events to ensure objective measurement. LLM latency was computed

from server-side timestamps (request dispatch to response receive), and per-record cost

was derived from API token-usage logs averaged across the test set. This evaluation

was conducted by clinical staff in a real-world setting to ensure that the recorded times

accurately reﬂected the system’s practical performance.

3.5.3. System Usability Evaluation

User acceptance and usability were assessed using the System Usability Scale

(SUS) [

], a standardized instrument consisting of ten items with alternating positive

and negative statements rated on a ﬁve-point Likert scale. Each participant used both the

basic OCR system and the AI-enhanced OCR system and completed the SUS questionnaire

for each conﬁguration after task completion. For each conﬁguration, the mean SUS score

and standard deviation were calculated on a 0–100 scale, where higher scores indicate

better usability. These scores were interpreted against conventional benchmarks [

], with

scores below 50 categorized as Not Acceptable, scores between 50 and 70 as Marginal, and

scores above 70 as Acceptable, including the sub-ranges of Good (70–80) and Excellent

(>80). The results were visualized using an acceptability scale diagram to highlight the

usability difference between the two systems.
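The standard SUS scoring rule (odd-numbered, positively worded items contribute rating − 1; even-numbered, negatively worded items contribute 5 − rating; the sum is multiplied by 2.5) can be sketched as follows. The band boundaries follow the description above; applying the formula to the per-item means of Table 3, rather than to individual questionnaires, is a shortcut of this sketch that works because the formula is linear.

```python
POSITIVE_ITEMS = (1, 3, 5, 7, 9)  # odd-numbered SUS items are positively worded


def sus_score(item_ratings):
    """item_ratings: dict mapping item number (1..10) to a rating on the 1-5 scale."""
    total = 0.0
    for item, rating in item_ratings.items():
        total += (rating - 1) if item in POSITIVE_ITEMS else (5 - rating)
    return 2.5 * total  # rescale the 0-40 sum to 0-100


def acceptability(score):
    """Map a SUS score to the bands described in Section 3.5.3."""
    if score < 50:
        return "Not Acceptable"
    if score <= 70:
        return "Marginal"
    if score <= 80:
        return "Acceptable (Good)"
    return "Acceptable (Excellent)"


# Per-item mean ratings taken from Table 3.
ocr_only = {1: 4.68, 2: 1.92, 3: 4.28, 4: 2.08, 5: 4.40,
            6: 3.18, 7: 3.58, 8: 1.84, 9: 4.32, 10: 2.24}
ocr_ai = {1: 4.74, 2: 1.64, 3: 4.52, 4: 1.78, 5: 4.46,
          6: 2.86, 7: 3.92, 8: 1.66, 9: 4.62, 10: 1.92}

scores = {"OCR only": round(sus_score(ocr_only), 1),
          "OCR + AI": round(sus_score(ocr_ai), 1)}
bands = {name: acceptability(score) for name, score in scores.items()}
```

Run on the Table 3 means, this reproduces the overall scores reported in Section 4.3 (75.0 and 81.0).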

3.6. Experimental Design and Workﬂow

Figure 5 presents the overall experimental design and workflow of the proposed

AI-enhanced OCR system. The process begins with data collection and preprocessing of

clinical laboratory documents obtained from dialysis centers. The baseline OCR system

is ﬁrst applied to extract textual content, followed by the AI-enhanced OCR process that

integrates rule-based validation and an LLM reasoning component to detect anomalies and

correct recognition errors. The processed data are then evaluated using precision, recall, and

F1-score metrics, while user feedback is collected to assess workﬂow efﬁciency and usability.

A human-in-the-loop validation step is included at the ﬁnal stage to ensure the integrity

of the corrected data before reporting. This experimental design provides a structured

framework for comparing both the baseline OCR and AI-enhanced OCR conﬁgurations

and for validating their effectiveness in real-world clinical documentation tasks.


Figure 5. Experimental design and evaluation workﬂow of the AI-enhanced OCR system.

3.7. Data Sources

The data for testing the virtual clerk were obtained from real-world clinical environ-

ments, speciﬁcally a private dialysis center, where routine patient care and administrative

documentation are performed. In this study, most inputs consisted of printed laboratory

reports containing key biochemical parameters transferred from other clinical laboratories,

with the main challenge at the dialysis clinic being the need to manually enter these data

into the system, as shown in Figure 6. The evaluation of the virtual clerk system was

conducted using a dataset consisting of 65 documents of a single type, speciﬁcally printed

laboratory reports containing computer-generated printed text only, with no handwritten

entries or signatures. Each document included approximately 35 data ﬁelds representing

key clinical and administrative information. The data entry tasks were performed by ten

specialized dialysis nurses (ﬁve using the basic OCR system and ﬁve using the AI-enhanced

OCR system), all of whom routinely handle patient care documentation in a real-world

clinical setting.

To realistically simulate operational conditions, scanned images were supplemented

by webcam captures commonly used in clinical settings. This approach introduced natu-

ral variability in resolution, lighting, and perspective, reﬂecting how forms are actually

digitized in practice. As a result, the input data exhibited heterogeneous quality: some

documents were clear and properly aligned, whereas others showed skew, shadowing, or

angled views due to handheld captures. Such diversity in data quality was essential for

testing the robustness of the image-processing pipeline, which must reliably normalize

noisy or distorted inputs before feeding them into the validation and automation layers.


Figure 6. Example of a printed laboratory report used as a data source.

4. Results

4.1. Results of Error Detection Rate of Automation Accuracy

The evaluation was performed on 65 documents, each containing approximately

35 data ﬁelds, totaling 2275 ﬁelds for error detection analysis. The comparison between the

basic OCR system and the AI-enhanced OCR system demonstrated clear improvements in

error detection performance across all three error categories, as presented in Table 1 and

Figure 7. For Missing Values, the AI-enhanced OCR achieved a precision of 0.990, recall of

0.950, and F1-score of 0.969, all higher than those of the basic OCR system (precision 0.968,

recall 0.900, F1-score 0.933). In the Out-of-Range category, the AI-enhanced OCR showed

the greatest improvement with near-perfect recall (0.999) and precision (0.995), yielding an

F1-score of 0.997, compared to the basic OCR system’s precision of 0.951, recall of 0.967,

and F1-score of 0.959. Similarly, for Typo/Free-text errors, the AI-enhanced OCR reached

a precision of 0.990, recall of 0.977, and F1-score of 0.983, outperforming the basic OCR’s

precision of 0.922, recall of 0.950, and F1-score of 0.936. As shown in the grouped bar chart,

the AI-enhanced OCR consistently achieved higher precision, recall, and F1-scores across

all error categories, with the most notable improvement observed in Out-of-Range errors.

These results indicate that the integration of AI with traditional OCR signiﬁcantly enhances

automation accuracy and reduces manual veriﬁcation needs, particularly when handling

complex or ambiguous data ﬁelds.

Table 1. Precision, recall, and F1-score of the basic OCR and AI-enhanced OCR systems across three

error categories.

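| Error Category | Detection Method | Precision | Recall (EDR) | F1-Score |
|---|---|---|---|---|
| Missing Values | OCR only | 0.968 | 0.900 | 0.933 |
| Missing Values | OCR + AI | 0.990 | 0.950 | 0.969 |
| Out-of-Range | OCR only | 0.951 | 0.967 | 0.959 |
| Out-of-Range | OCR + AI | 0.995 | 0.999 | 0.997 |
| Typo/Free-text | OCR only | 0.922 | 0.950 | 0.936 |
| Typo/Free-text | OCR + AI | 0.990 | 0.977 | 0.983 |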

Figure 8 shows the ROC curves comparing the error detection performance of the
basic OCR system and the AI-enhanced OCR system across the three error categories: Missing
Values, Out-of-Range, and Typo/Free-text. The results show that the AI-enhanced OCR
consistently outperformed the basic OCR system in all three categories, with the most notable
improvement observed in the Out-of-Range category, where its performance was
near-perfect. This demonstrates the effectiveness of integrating AI capabilities
with traditional OCR in improving error detection and correction, leading to more reliable
and accurate data processing.

Figure 7. Grouped bar chart comparing precision, recall, and F1-score for the two systems.

Figure 8. ROC curve comparing error detection performance of basic OCR and AI-enhanced OCR systems.

4.2. Results of Efﬁciency of Time

The evaluation of time efﬁciency was conducted using a total of 65 documents, each

containing approximately 35 data ﬁelds, processed under both conﬁgurations: the basic

OCR system and the AI-enhanced OCR system. As shown in Table 2, the average time per

record for the basic OCR system was 85.2 s, whereas the AI-enhanced OCR system reduced

this to 42.1 s, resulting in a time saving of 43.1 s per record. In terms of total completion

time for all 65 documents, the basic OCR system required 92.3 min, while the AI-enhanced

OCR system completed the task in only 45.6 min, representing a reduction of 46.7 min.

These results show that the AI-enhanced OCR system roughly doubled the processing
speed (a reduction of about 50.6% in both metrics), significantly reducing manual effort and
improving overall workflow efficiency in the clinical data entry process.
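The time figures can be cross-checked with a short calculation. The sketch below derives the absolute saving, the percentage improvement, and the speed-up factor from the Table 2 values, and confirms that the batch totals are consistent with 65 records at the reported per-record averages; the helper name `time_saving` is an illustrative choice.

```python
def time_saving(baseline, enhanced):
    """Return (absolute saving, percentage improvement, speed-up factor)."""
    absolute = baseline - enhanced
    percent = 100.0 * absolute / baseline
    speedup = baseline / enhanced
    return absolute, percent, speedup


N_RECORDS = 65

# Values from Table 2: seconds per record and minutes per batch.
per_record = time_saving(85.2, 42.1)
per_batch = time_saving(92.3, 45.6)

# Consistency check: records × seconds per record, converted to minutes.
batch_minutes_ocr_only = N_RECORDS * 85.2 / 60
batch_minutes_ocr_ai = N_RECORDS * 42.1 / 60
```

Both the per-record and the per-batch comparison give the same relative improvement, as expected when the batch total is simply the per-record average multiplied by the number of records.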


Table 2. Comparison of processing times between the basic OCR and AI-enhanced OCR systems.

| Metric | OCR Only | OCR + AI | Average Time Difference |
|---|---|---|---|
| Average Time per Record (s) | 85.2 | 42.1 | 43.1 |
| Total Completion Time (min) | 92.3 | 45.6 | 46.7 |

4.3. Results of System Usability and Adoption

The usability of the two systems was evaluated using the SUS, which consists of ten

standardized questions rated on a five-point Likert scale. As shown in Table 3 and Figure 9,

the AI-enhanced OCR system achieved a higher overall SUS score of 81.0, placing it in

the “Excellent” category, whereas the basic OCR system received a score of 75.0, which

falls within the “Good” range. Across individual questions, the AI-enhanced OCR system
consistently scored slightly better than the basic OCR system, particularly on ease of use (Q3)
and user confidence (Q9), with a smaller gain in willingness to use the system regularly (Q1).
However, both systems scored less favorably on Q6 and Q7, indicating that users perceived
some inconsistency in the system
and recognized the need for improvement in training and learning speed. These results

suggest that while both systems are generally acceptable for clinical use, the integration

of AI signiﬁcantly enhances user satisfaction and system adoption by reducing perceived

complexity and improving workﬂow integration.

Table 3. Mean SUS scores for individual questionnaire items comparing the basic OCR and AI-

enhanced OCR systems.

| No. | Type | Question | N | OCR Only, Mean (SD) | OCR + AI, Mean (SD) |
|---|---|---|---|---|---|
| 1 | Positive | I think I would like to use the OCR system/AI-enhanced OCR system regularly for completing healthcare data entry tasks. | 5 | 4.68 (0.47) | 4.74 (0.42) |
| 2 | Negative | I found the OCR system/AI-enhanced OCR system unnecessarily complex. | 5 | 1.92 (0.58) | 1.64 (0.51) |
| 3 | Positive | I thought the OCR system/AI-enhanced OCR system was easy to use. | 5 | 4.28 (0.52) | 4.52 (0.46) |
| 4 | Negative | I think I would need support from a technical expert to effectively use the OCR system/AI-enhanced OCR system. | 5 | 2.08 (0.63) | 1.78 (0.55) |
| 5 | Positive | I found the functions of the OCR system/AI-enhanced OCR system to be well integrated into the existing workflow. | 5 | 4.40 (0.49) | 4.46 (0.48) |
| 6 | Negative | I thought there was too much inconsistency in the OCR system/AI-enhanced OCR system. | 5 | 3.18 (0.81) | 2.86 (0.77) |
| 7 | Positive | I imagine most healthcare staff would learn to use the OCR system/AI-enhanced OCR system very quickly. | 5 | 3.58 (0.69) | 3.92 (0.66) |
| 8 | Negative | I found the OCR system/AI-enhanced OCR system cumbersome to use. | 5 | 1.84 (0.54) | 1.66 (0.53) |
| 9 | Positive | I felt confident using the OCR system/AI-enhanced OCR system. | 5 | 4.32 (0.51) | 4.62 (0.47) |
| 10 | Negative | I needed to learn many things before I could start using the OCR system/AI-enhanced OCR system. | 5 | 2.24 (0.60) | 1.92 (0.56) |


Figure 9. SUS acceptability scale indicating overall usability levels of the two systems.

5. Discussion

5.1. Summary of Key Findings

The evaluation revealed that the AI-enhanced OCR system signiﬁcantly improved

its ability to detect and correct errors compared to the basic OCR system, as evidenced

by higher precision, recall, and F1-scores across all three error categories, as shown in

Figures 10 and 11. In the Out-of-Range category, the AI-enhanced OCR effectively ad-

dressed common decimal placement errors, such as when a laboratory value like “20.2” was

misread as “2.02.” This improvement not only enhanced data accuracy but also reduced

the risk of misinterpretation in clinical decision-making. For Typo/Free-text errors, the

AI-enhanced OCR accurately matched hospital numbers (HN) with patient names, even

when names were misspelled or inconsistently recorded, a task at which the basic OCR

system often failed. In the case of Missing Values, the system actively ﬂagged data ﬁelds

that were expected to contain information but were left empty, providing notiﬁcations to

users to manually verify and input the correct data. These capabilities illustrate how AI

integration enhances both detection accuracy and error resolution, enabling the system

to handle complex, real-world clinical data challenges that basic OCR approaches alone

cannot effectively manage.

Figure 10. Example of the Virtual Clerk detecting and highlighting Typo/Free-text errors and

Missing Values.


Figure 11. Example of the Virtual Clerk detecting and highlighting Out-of-Range.

Beyond error detection, the AI-enhanced OCR system demonstrated substantial im-

provements in workﬂow efﬁciency and user satisfaction. The average time per record was

reduced by nearly half, and total completion time decreased signiﬁcantly, showing the

potential to accelerate routine clinical documentation tasks. Usability testing using the SUS

indicated that users rated the AI-enhanced system as “Excellent”, compared to the “Good”

rating for the basic OCR system [

]. Open-ended feedback highlighted that users valued

the system’s ability to reduce manual veriﬁcation and improve accuracy, though some areas,

such as consistency and ease of learning, still require enhancement. Together, these ﬁndings

demonstrate that integrating AI into OCR systems not only increases technical performance

but also supports more efﬁcient, user-friendly workﬂows, aligning with previous research

on AI-driven health informatics solutions [31,35].

5.2. Comparison with Previous Studies

The ﬁndings of this study are consistent with previous research that highlights the

limitations of traditional OCR systems and the beneﬁts of integrating AI for improving

accuracy in clinical documentation. Prior studies have shown that conventional rule-based

OCR systems are prone to common errors such as decimal misplacements and misinterpretation
of numeric values, which can lead to clinically significant inaccuracies [36,37]. Our

results demonstrate that by incorporating AI-driven anomaly detection, the system was

able to correct errors automatically, particularly in the Out-of-Range category, reducing

risks to patient safety and improving data quality. Similar improvements were reported

by [

], who emphasized the role of machine learning models in identifying and correcting

inconsistent or erroneous health data in electronic health records (EHRs). These outcomes

also align with the broader literature on data quality management in healthcare, which

identiﬁes completeness, accuracy, and consistency as key dimensions for reliable EHR

data [31].

In terms of usability, our ﬁndings align with previous studies that have validated the

System Usability Scale (SUS) as a reliable measure of user acceptance in clinical systems.

The AI-enhanced OCR system substantially reduced the manual workload required for data

entry, allowing healthcare staff to spend more time focusing on direct patient care rather

than administrative tasks. Ref. [

] established interpretation thresholds for SUS scores,

categorizing them into levels such as Good and Excellent, which guided the interpretation

of our results. Although the AI-enhanced OCR system in our study achieved an Excellent


usability rating, the handling of healthcare data requires a higher level of reliability due

to its direct impact on patient safety and clinical decision-making. Therefore, even highly

accurate and user-friendly systems must maintain a human-in-the-loop (HITL) process at

the ﬁnal stage to verify critical information before it is entered into electronic health records.

This approach has been recommended by several researchers as a safeguard to mitigate

residual risks and ensure accountability, particularly when AI systems are deployed in

high-stakes healthcare environments [

]. Similar to prior studies, our results indicate

that while AI automation can signiﬁcantly reduce manual workload, human oversight

remains essential for ﬁnal veriﬁcation to maintain data integrity and protect patient safety.

5.3. Practical Implications for Clinical Workﬂow

The implementation of the AI-enhanced OCR system has substantial implications for

optimizing clinical workﬂows, particularly in high-volume settings such as dialysis centers.

By automating routine data entry tasks and intelligently detecting common errors, this sys-

tem signiﬁcantly reduces the administrative workload placed on nurses and administrative

staff, enabling them to devote more time to direct patient care and clinical decision-making

rather than repetitive clerical work. This mirrors recent ﬁndings in generative AI research

showing that large language models (LLMs) embedded within EHRs can improve docu-

mentation quality and reduce editing burden, leading to more efﬁcient note-taking and

summarization workﬂows [

]. Moreover, by correcting decimal placement errors and

ensuring accurate linkage of hospital numbers with patient names even when names are

misspelled, the system enhances the completeness and accuracy of patient records. Such im-

provements directly address long-standing issues with electronic health record (EHR) data

quality, which is critical for safe and reliable clinical decision-making [

]. Early studies on

GenAI-driven clinical documentation also emphasize its potential to make records more

comprehensive and organized, though privacy, bias, and accuracy remain active concerns

that must be continuously managed [

]. From a computational perspective, the integra-

tion of the LLM (GPT-4 API, version released on 1 March 2025) introduced an average

response latency of approximately 1.6 s per query, primarily during the anomaly-detection

and normalization steps. This delay was considered acceptable for near-real-time clinical

workﬂows, as most data-entry tasks occur asynchronously with patient encounters. The

average GPU-equivalent cost of each API call was estimated at $0.002 per record, resulting

in minimal operational expense for batch processing. System-level optimization through

prompt-truncation, caching of frequent templates, and asynchronous request handling

further mitigated latency and ensured that end-to-end throughput remained compatible

with daily dialysis-unit workloads.
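To put these figures in proportion, the following back-of-the-envelope sketch combines the reported cost and latency with the batch size and per-record time from Section 4.2. It assumes exactly one LLM query per record, which the text does not state, so the latency share should be read as indicative only.

```python
COST_PER_RECORD_USD = 0.002   # reported estimate per record
LLM_LATENCY_S = 1.6           # reported average latency per query
N_RECORDS = 65                # size of the evaluation batch
AI_TIME_PER_RECORD_S = 42.1   # AI-enhanced average time per record (Table 2)

# Cost of processing the whole evaluation batch.
batch_cost_usd = COST_PER_RECORD_USD * N_RECORDS

# Assumption: one LLM query per record, executed synchronously.
latency_share = LLM_LATENCY_S / AI_TIME_PER_RECORD_S


def monthly_cost_usd(records_per_day, days=30):
    """Projected API cost for a hypothetical daily workload."""
    return COST_PER_RECORD_USD * records_per_day * days
```

Under this assumption the LLM call accounts for a small fraction of the per-record time, which is consistent with the statement that the added latency is acceptable for asynchronous data entry.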

Beyond immediate workﬂow efﬁciency, the system provides a model for integrating

generative AI into healthcare in a manner that balances automation with safety. Recent

global guidance from the World Health Organization (WHO) highlights that large multi-

modal models must include governance mechanisms and a human-in-the-loop process to

ensure transparency, accountability, and patient safety, particularly in high-stakes environ-

ments [

]. Even though the AI-enhanced OCR system in this study demonstrated excellent

usability and substantial reductions in manual workload, ﬁnal veriﬁcation by human ex-

perts remains essential to mitigate residual risks and safeguard against potential errors

that may arise from automated processing [

]. This approach is consistent with

current evidence that AI systems should be viewed as augmenting rather than replacing

human expertise, supporting clinicians by streamlining documentation while maintaining

professional oversight. In the long term, widespread adoption of such systems could

transform healthcare operations by reducing administrative costs, improving regulatory


compliance, and ultimately allowing clinicians to spend more time on patient-centered

care, while adhering to global standards for ethical AI deployment [45,46].

Beyond usability and governance considerations, the quantitative ﬁndings further

highlight the operational impact of the system. The improvements in precision, recall,

and processing time observed in Tables 1 and 2 directly support the functionality of the

agent-based automation layer (Layer 3). Higher OCR accuracy ensures that the virtual clerk

can perform automated submission with minimal human correction, reducing propagation

of data errors into national reporting platforms. The latency measurements conﬁrm that

the LLM-based reasoning process introduces negligible delay, maintaining near real-time

responsiveness required in clinical documentation workﬂows. Together, these outcomes

demonstrate that enhanced recognition reliability and operational efﬁciency are critical

enablers of safe and effective automation, validating the design of the virtual clerk as a

practical agent system rather than a standalone OCR tool.

Comparatively, previous studies on clinical documentation automation have explored

a range of approaches, including conventional rule-based systems, convolutional neural

networks (CNNs) for image recognition, and transformer-based models for text normal-

ization [

]. Rule-based frameworks, while transparent and interpretable, often lack

scalability across different form formats and require frequent manual updates when insti-

tutional templates change. Deep-learning methods achieve higher recognition accuracy

but typically demand large annotated datasets and extensive computational resources,

which may not be feasible in smaller healthcare facilities. The agent-based OCR framework

proposed in this study seeks to balance these trade-offs by combining deterministic rule

validation with adaptive LLM reasoning, enabling ﬂexible data interpretation without

retraining for each new layout. Nonetheless, this approach also inherits limitations related

to LLM latency, dependency on API availability, and the absence of domain-speciﬁc ﬁne-

tuning [

]. Understanding these comparative strengths and weaknesses helps clarify

the methodological landscape and provides a foundation for selecting suitable approaches

in future research and system deployment.

5.4. Limitations and Future Work

While the results of this study demonstrate the potential of the AI-enhanced OCR system, several limitations must be acknowledged. The dataset was limited to 65 documents from a single dialysis center and consisted solely of computer-printed text, excluding handwritten notes and mixed-content scanned forms. This narrow dataset restricts the generalizability of the findings, as real-world clinical workflows often involve diverse document formats and varying image quality. Moreover, the study primarily focused on error detection and workflow efficiency without assessing the downstream clinical impact of these improvements, such as whether enhanced data accuracy contributes to better patient outcomes or operational decision-making. Although the AI-enhanced OCR system substantially reduced manual workload, a HITL verification step remained necessary to ensure data integrity and patient safety. The usability evaluation also involved a relatively small group of participants, limiting the representativeness of user experience findings across different clinical roles and institutional contexts. In addition, the experiment compared only two configurations, basic OCR and AI-enhanced OCR, without including other control conditions such as commercial OCR systems, human-only data entry, or hybrid approaches, which could provide a more comprehensive evaluation. Another limitation concerns algorithmic optimization; the current system employs general-purpose pretrained models for image processing and language reasoning without domain-specific fine-tuning, which may constrain its ability to capture the linguistic and contextual nuances of dialysis data. Furthermore, the study did not conduct formal sensitivity analyses of key parameters such as OCR confidence thresholds or preprocessing settings, and uncertainty handling was evaluated only qualitatively. Common failure cases (e.g., ambiguous numeric characters or low-contrast regions) were observed during pilot testing but were not quantified systematically, representing another limitation of the present evaluation.
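As an illustration of what a formal sensitivity analysis of the OCR confidence threshold might look like, the following Python sketch sweeps candidate thresholds over a set of labelled fields and reports the trade-off between reviewer workload and the share of true errors caught. The function name, the flagging rule (review any field whose confidence falls below the threshold), and the sample values are assumptions made for this example; they are not drawn from the study.

```python
from typing import Dict, List, Sequence, Tuple


def sweep_thresholds(samples: Sequence[Tuple[float, bool]],
                     thresholds: Sequence[float]) -> List[Dict[str, float]]:
    """Evaluate each threshold on (confidence, is_error) pairs.

    A field is flagged for human review when its OCR confidence is
    strictly below the threshold.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    total_errors = sum(1 for _, is_error in samples if is_error)
    rows = []
    for t in thresholds:
        flagged = [(c, e) for c, e in samples if c < t]
        caught = sum(1 for _, e in flagged if e)
        rows.append({
            "threshold": t,
            # Share of all fields a reviewer would have to check.
            "review_rate": len(flagged) / len(samples),
            # Share of true errors that end up in the review queue.
            "error_recall": caught / total_errors if total_errors else 1.0,
        })
    return rows


# Made-up (confidence, is_error) pairs for demonstration only.
demo = [(0.95, False), (0.90, False), (0.60, True), (0.85, True),
        (0.40, True), (0.99, False), (0.70, False), (0.88, False)]

for row in sweep_thresholds(demo, [0.5, 0.75, 0.9]):
    print(row)
```

On this toy data, raising the threshold from 0.5 to 0.9 increases error recall from one third to 1.0 while the review rate grows from 12.5% to 62.5%, which is the kind of trade-off a systematic analysis would quantify on real documents.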

Future work should address these limitations through several strategic directions. First, expanding the dataset to include a larger and more diverse sample from dialysis centers of different sizes, ownership types, and geographic regions will improve external validity and ensure robust performance across varied operational contexts. Second, comparative studies involving multiple control groups such as commercial OCR software or human-only workflows will help benchmark the relative advantages of AI-assisted approaches. Third, future versions of the system should focus on customizing and fine-tuning models using dialysis-specific terminology, parameter ranges, and common error patterns to improve anomaly detection and contextual correction. Fourth, enhancements to the user interface should emphasize quick-operation features such as one-click filling, batch confirmation, and smart auto-completion to further streamline workflow efficiency. Fifth, the design of charts and illustrations can be optimized for greater intuitiveness and clarity, supporting real-time decision-making by clinical staff. Finally, large-scale deployment studies should examine governance mechanisms, explainability features, and privacy safeguards to ensure the ethical and safe integration of AI-enhanced OCR systems into national healthcare infrastructures. By addressing these directions, future iterations of the system could evolve into highly reliable and adaptive tools that improve clinical efficiency, data quality, and patient-centered care.

6. Conclusions

This study introduced and evaluated an AI-enhanced OCR system designed to improve the accuracy and efficiency of clinical data entry in dialysis care settings. By integrating advanced anomaly detection and normalization capabilities, the system successfully addressed common and critical errors, such as decimal placement issues and mismatches between hospital numbers and patient names, while also flagging missing values for manual review. The evaluation demonstrated that the AI-enhanced OCR system significantly reduced error rates and processing times compared to a basic OCR system, while achieving excellent usability ratings among clinical users. These improvements indicate that the system can serve as a valuable tool for enhancing data quality and streamlining administrative workflows, ultimately allowing healthcare professionals to dedicate more time to direct patient care.

Although the system delivered substantial benefits, human oversight remained a crucial component to ensure patient safety and data integrity. The findings support the concept of a human-in-the-loop approach, where automation assists with high-volume, repetitive tasks while final verification remains under professional supervision. Looking ahead, the expansion of this system to include diverse document types, integration with electronic health record platforms, and the incorporation of more advanced AI technologies could further transform clinical documentation practices. By continuing to refine both technical capabilities and governance frameworks, AI-driven OCR systems have the potential to become trusted, scalable solutions that not only reduce administrative burden but also contribute to safer, more efficient, and patient-centered healthcare delivery.

Author Contributions: Conceptualization, S.C. and K.I.; methodology, P.J. and K.I.; software, P.W.;

validation, P.W. and K.I.; formal analysis, K.P. and K.I.; investigation, P.W.; resources, K.I.; data

curation, K.I.; writing—original draft preparation, P.W. and K.I.; writing—review and editing, K.P.;

visualization, P.W.; supervision, K.I.; project administration, K.P.; funding acquisition, K.I. All authors

have read and agreed to the published version of the manuscript.

Funding: This research was partially supported by Chiang Mai University and the National Research Council of Thailand (NRCT).

Institutional Review Board Statement: The study was conducted in accordance with the Declaration

of Helsinki and approved by the Institutional Review Board of Committee of Research Ethics, Faculty

of Public Health, Chiang Mai University (ET031/2024).

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The data presented in this study are available upon request from the

corresponding author due to restrictions. The data are not publicly available.

Conflicts of Interest: The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI Artificial Intelligence

APIs Application Programming Interfaces

EHR Electronic Health Records

OCR Optical Character Recognition

RPA Robotic Process Automation

SUS System Usability Scale

References

1. Satirapoj, B.; Tantiyavarong, P.; Thimachai, P.; Chuasuwan, A.; Lumpaopong, A.; Kanjanabuch, T.; Ophascharoensuk, V. Thailand Renal Replacement Therapy Registry 2023: Epidemiological Insights into Dialysis Trends and Challenges. Ther. Apher. Dial. 2025, 29, 721–729. [CrossRef]

2. Spanakis, E.G.; Sfakianakis, S.; Bonomi, S.; Ciccotelli, C.; Magalini, S.; Sakkalis, V. Emerging and Established Trends to Support Secure Health Information Exchange. Front. Digit. Health 2021, 3, 636082. [CrossRef]

3. Garza, M.Y.; Williams, T.; Ounpraseuth, S.; Hu, Z.; Lee, J.; Snowden, J.; Walden, A.C.; Simon, A.E.; Devlin, L.A.; Young, L.W.; et al. Error Rates of Data Processing Methods in Clinical Research: A Systematic Review and Meta-Analysis of Manuscripts Identified through PubMed. Int. J. Med. Inform. 2025, 195, 105749. [CrossRef]

4. Zhou, X.; Zeng, T.; Zhang, Y.; Liao, Y.; Smith, J.; Zhang, L.; Wang, C.; Li, Q.; Wu, D.; Chong, Y.; et al. Automated Data Collection Tool for Real-World Cohort Studies of Chronic Hepatitis B: Leveraging OCR and NLP Technologies for Improved Efficiency. New Microbes New Infect. 2024, 62, 101469. [CrossRef]

5. Budd, J. Burnout Related to Electronic Health Record Use in Primary Care. J. Prim. Care Community Health 2023, 14, 21501319231166921. [CrossRef]

6. Gao, S.; Fang, A.; Huang, Y.; Giunchiglia, V.; Noori, A.; Schwarz, J.R.; Ektefaie, Y.; Kondic, J.; Zitnik, M. Empowering Biomedical Discovery with AI Agents. Cell 2024, 187, 6125–6151. [CrossRef]

7. Rajkomar, A.; Dean, J.; Kohane, I. Machine Learning in Medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [CrossRef] [PubMed]

8. Beam, A.L.; Kohane, I.S. Big Data and Machine Learning in Health Care. JAMA 2018, 319, 1317–1318. [CrossRef] [PubMed]

9. Shickel, B.; Tighe, P.J.; Bihorac, A.; Rashidi, P. Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis. IEEE J. Biomed. Health Inform. 2018, 22, 1589–1604. [CrossRef] [PubMed]

10. Topol, E.J. High-Performance Medicine: The Convergence of Human and Artificial Intelligence. Nat. Med. 2019, 25, 44–56. [CrossRef]

11. Sinsky, C.; Colligan, L.; Li, L.; Prgomet, M.; Reynolds, S.; Goeders, L.; Westbrook, J.; Tutty, M.; Blike, G. Allocation of Physician Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties. Ann. Intern. Med. 2016, 165, 753–760. [CrossRef]

12. Patrício, L.; Varela, L.; Silveira, Z. Integration of Artificial Intelligence and Robotic Process Automation: Literature Review and Proposal for a Sustainable Model. Appl. Sci. 2024, 14, 9648. [CrossRef]

13. Wang, X.-F.; He, Z.-H.; Wang, K.; Wang, Y.-F.; Zou, L.; Wu, Z.-Z. A Survey of Text Detection and Recognition Algorithms Based on Deep Learning Technology. Neurocomputing 2023, 556, 126702. [CrossRef]

14. Liu, Z.; Song, R.; Li, K.; Li, Y. From Detection to Understanding: A Systematic Survey of Deep Learning for Scene Text Processing. Appl. Sci. 2025, 15, 9247. [CrossRef]

15. Xu, Y.; Li, M.; Cui, L.; Huang, S.; Wei, F.; Zhou, M. LayoutLM: Pre-Training of Text and Layout for Document Image Understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; ACM: New York, NY, USA, 2020.

16. Chen, X.; Jin, L.; Zhu, Y.; Luo, C.; Wang, T. Text Recognition in the Wild: A Survey. ACM Comput. Surv. 2022, 54, 1–35. [CrossRef]

17. Nitayavardhana, P.; Liu, K.; Fukaguchi, K.; Fujisawa, M.; Koike, I.; Tominaga, A.; Iwamoto, Y.; Goto, T.; Suen, J.Y.; Fraser, J.F.; et al. Streamlining Data Recording through Optical Character Recognition: A Prospective Multi-Center Study in Intensive Care Units. Crit. Care 2025, 29, 117. [CrossRef]

18. van der Aalst, W.M.P.; Bichler, M.; Heinzl, A. Robotic Process Automation. Bus. Inf. Syst. Eng. 2018, 60, 269–272. [CrossRef]

19. Syed, R.; Suriadi, S.; Adams, M.; Bandara, W.; Leemans, S.J.J.; Ouyang, C.; ter Hofstede, A.H.M.; van de Weerd, I.; Wynn, M.T.; Reijers, H.A. Robotic Process Automation: Contemporary Themes and Challenges. Comput. Ind. 2020, 115, 103162. [CrossRef]

20. Jennings, N.R.; Sycara, K.; Wooldridge, M. A Roadmap of Agent Research and Development. Auton. Agent. Multi. Agent. Syst. 1998, 1, 7–38. [CrossRef]

21. Maes, P. Agents That Reduce Work and Information Overload. Commun. ACM 1994, 37, 30–40. [CrossRef]

22. Mandel, J.C.; Kreda, D.A.; Mandl, K.D.; Kohane, I.S.; Ramoni, R.B. SMART on FHIR: A Standards-Based, Interoperable Apps Platform for Electronic Health Records. J. Am. Med. Inform. Assoc. 2016, 23, 899–908. [CrossRef]

23. Kruse, C.S.; Frederick, B.; Jacobson, T.; Monticone, D.K. Cybersecurity in Healthcare: A Systematic Review of Modern Threats and Trends. Technol. Health Care 2017, 25, 1–10. [CrossRef]

24. Kuo, T.-T.; Kim, H.-E.; Ohno-Machado, L. Blockchain Distributed Ledger Technologies for Biomedical and Health Care Applications. J. Am. Med. Inform. Assoc. 2017, 24, 1211–1220. [CrossRef]

25. Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated Learning in Medicine: Facilitating Multi-Institutional Collaborations without Sharing Patient Data. Sci. Rep. 2020, 10, 12598. [CrossRef] [PubMed]

26. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [CrossRef]

27. Sauvola, J.; Pietikäinen, M. Adaptive Document Image Binarization. Pattern Recognit. 2000, 33, 225–236. [CrossRef]

28. Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; IEEE: Piscataway, NJ, USA, 2007; Volume 2.

29. Hsu, E.; Malagaris, I.; Kuo, Y.-F.; Sultana, R.; Roberts, K. Deep Learning-Based NLP Data Pipeline for EHR-Scanned Document Information Extraction. JAMIA Open 2022, 5, ooac045. [CrossRef]

30. Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv 2020. [CrossRef]

31. Weiskopf, N.G.; Weng, C. Methods and Dimensions of Electronic Health Record Data Quality Assessment: Enabling Reuse for Clinical Research. J. Am. Med. Inform. Assoc. 2013, 20, 144–151. [CrossRef] [PubMed]

32. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [CrossRef]

33. Brooke, J. SUS—A Quick and Dirty Usability Scale. Ahrq.gov. Available online: https://digital.ahrq.gov/sites/default/files/docs/survey/systemusabilityscale%2528sus%2529comp%255B1%255D.pdf (accessed on 26 September 2025).

34. Bangor, A.; Kortum, P.; Miller, J. Determining What Individual SUS Scores Mean: Adding an Adjective Rating Scale. J. Usability Stud. 2009, 4, 114–123.

35. Lewis, J.R. The System Usability Scale: Past, Present, and Future. Int. J. Hum. Comput. Interact. 2018, 34, 577–590. [CrossRef]

36. Wu, Y.; Dalianis, H.; Velupillai, S. Errors in Clinical Text Processing and Their Impact on Decision-Making: A Review. Artif. Intell. Med. 2020, 104, 101833.

37. Nguyen, P.A.; Shim, J.S.; Ho, T.B.; Li, W. Machine Learning-Based Approaches for Clinical Text Error Detection: A Systematic Review. J. Biomed. Inform. 2022, 127, 104018.

38. Luo, Y.; Thompson, W.K.; Herr, T.M.; Zeng, Z.; Berendsen, M.A.; Jonnalagadda, S.R.; Carson, M.B.; Starren, J. Natural Language Processing for EHR-Based Pharmacovigilance: A Structured Review. Drug Saf. 2017, 40, 1075–1089. [CrossRef]

39. Amann, J.; Blasimme, A.; Vayena, E.; Frey, D.; Madai, V.I.; Precise4Q consortium. Explainability for Artificial Intelligence in Healthcare: A Multidisciplinary Perspective. BMC Med. Inform. Decis. Mak. 2020, 20, 310. [CrossRef] [PubMed]

40. Kelly, C.J.; Karthikesalingam, A.; Suleyman, M.; Corrado, G.; King, D. Key Challenges for Delivering Clinical Impact with Artificial Intelligence. BMC Med. 2019, 17, 195. [CrossRef]

41. Small, W.R.; Wang, L.; Horng, S. EHR-Embedded Large Language Models for Hospital-Course Summarization. JAMA Netw. Open 2025, 8, e250112. [CrossRef]

42. Kernberg, A.; Gold, J.A.; Mohan, V. Using ChatGPT-4 to Create Structured Medical Notes from Audio Recordings of Physician-Patient Encounters: Comparative Study. J. Med. Internet Res. 2024, 26, e54419. [CrossRef]

43. World Health Organization. Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multi-Modal Models. Who.int. Available online: https://www.who.int/publications/i/item/9789240084759 (accessed on 26 September 2025).

44. Howell, M.D. Generative Artificial Intelligence, Patient Safety and Healthcare Quality: A Review. BMJ Qual. Saf. 2024, 33, 748–754. [CrossRef] [PubMed]

45. Reddy, S. Generative AI in Healthcare: An Implementation Science Informed Translational Path on Application, Integration and Governance. Implement. Sci. 2024, 19, 27. [CrossRef] [PubMed]

46. Bakken, S. AI in Health: Keeping the Human in the Loop. J. Am. Med. Inform. Assoc. 2023, 30, 1225–1226. [CrossRef] [PubMed]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual

author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to

people or property resulting from any ideas, methods, instructions or products referred to in the content.
