Good Parenting is all you need Multi-agentic LLM Hallucination Mitigation PDF Free Download

Name: Good Parenting is all you need Multi-agentic LLM Hallucination Mitigation PDF
Author: Lisa Carson

1 / 11

2 views•11 pages

Good Parenting is all you need Multi-agentic LLM Hallucination Mitigation PDF Free Download

Good Parenting is all you need Multi-agentic LLM Hallucination Mitigation PDF free Download. Think more deeply and widely.

Good Parenting is all you need

Multi-agentic LLM Hallucination Mitigation

Edward (Ted) Kwartler1, Matthew Berman2, Alan Aqrawi3

1Harvard University, edwardkwartler@fas.harvard.edu

2mberman84@gmail.com

3alanaqrawi@gmail.com

Abstract

This study explores the ability of Large Language Model (LLM) agents to detect and correct hallucinations in

AI-generated content. A primary agent was tasked with creating a blog about a ﬁctional Danish artist named

Flipﬂoppidy, which was then reviewed by another agent for factual inaccuracies. Most LLMs hallucinated the

existence of this artist. Across 4,900 test runs involving various combinations of primary and reviewing

agents, advanced AI models such as Llama3-70b and GPT-4 variants demonstrated near-perfect accuracy in

identifying hallucinations and successfully revised outputs in 85% to 100% of cases following feedback. These

ﬁndings underscore the potential of advanced AI models to signiﬁcantly enhance the accuracy and reliability

of generated content, providing a promising approach to improving AI workﬂow orchestration.

Introduction

Artificial Intelligence (AI) has made significant

advances in generating human-like content

across various domains, ranging from creative

writing to technical documentation. However, a

persistent challenge in AI-generated content is

the occurrence of Hallucinations — fabricated

information presented as factual — pose a

significant challenge in AI-generated content,

as discussed extensively in recent literature

[1–4]. Such hallucinations can critically

undermine the reliability and credibility of AI

outputs, especially in applications that require

high levels of accuracy and trustworthiness.

This study showcases the effectiveness of

various AI models in identifying, capturing, and

correcting hallucinations within generated

content. By employing multi-agent workflows,

we tasked a primary agent with creating

content about a fictional Danish artist named

Flipfloppidy. The choice of a non-existent

subject allowed for a clear differentiation

between factual inaccuracies (hallucinations)

and creative content generation. After the initial

content was produced, a reviewing agent

analyzed the output to identify any

hallucinations, and the primary agent's ability

to revise the content based on this feedback

was measured.

The experiment involved 4,900 test runs,

utilizing various combinations of primary and

reviewing agents. These agents ranged from

smaller, less sophisticated AI models to larger,

more advanced ones, providing a

comprehensive assessment of their

performance in identifying and rectifying

hallucinations.

Our findings reveal a pronounced difference in

performance between smaller models, such as

Gemma and Mistral, and more advanced

models like Llama3-70b and GPT-4 variants.

While the former showed limited success in

both identifying and correcting hallucinations,

the latter consistently achieved high accuracy

in detection and demonstrated a strong

capacity for self-correction following feedback.

This study underscores the potential of

advanced AI models to enhance the reliability

of AI-generated content by effectively

identifying and mitigating hallucinations. The

results contribute to ongoing efforts in refining

AI workflows, highlighting the strengths of

sophisticated model architectures and robust

feedback mechanisms in achieving accurate

and trustworthy content generation..

Related Work

Hallucinations in LLMs

Hallucinations in LLMs, where models

generate information that is factually incorrect

or ungrounded, have been widely recognized

as a critical challenge in AI research. Recent

surveys [2] have catalogued the prevalence

and types of hallucinations across different

natural language generation tasks, including

summarization, translation, and dialogue

systems. These works categorize

hallucinations into intrinsic errors, which

originate from within the model's generated

content, and extrinsic errors, which result from

the model's misalignment with external

knowledge. Recent studies have proposed

self-reflection as a mitigation technique [4–5].

Self-reflection as a Detection and Correction

Techniques

Recent studies have proposed self-reflection

as a mitigation technique [5–7]. Introducing an

interactive self-reflection methodology that

improves the factuality and consistency of

generated answers, demonstrating superior

performance in reducing hallucinations

compared to baseline models.

Agentic Workflows and AI Model Interactions

Agentic workflows, where multiple AI models

interact to accomplish tasks, are gaining

attention for their potential to improve the

robustness and accuracy of AI outputs.

However, the specific challenge of integrating

hallucination detection and correction into

these workflows, especially in a fully

autonomous manner without human oversight,

has not been extensively studied. This gap

highlights the need for further exploration into

how agentic interactions can be optimized to

ensure the reliability of AI-generated content.

Addressing Gaps in Existing Research

While there has been substantial progress in

identifying and mitigating hallucinations in

LLMs, there is a noticeable gap in research

that integrates these processes into agentic

workflows. Most studies focus on isolated

detection or correction tasks rather than

examining how these processes can be

orchestrated within a multi-agent framework.

Furthermore, the role of smaller,

resource-efficient models in these workflows

remains underexplored, particularly in terms of

their ability to function as effective critics or

reviewers. Our study aims to address these

gaps by evaluating the performance of both

advanced and lightweight models in an agentic

context, assessing their ability to detect and

correct hallucinations in real-time, and

exploring the potential for integrating

human-in-the-loop verification to enhance

overall reliability.

Problem Formulation

LLMs have significantly expanded the

application of AI-generated content across

various domains, including creative writing,

technical documentation, coding, and artistic

expression. While LLMs are capable of

producing human-like content, ensuring the

reliability of these outputs remains a significant

challenge, especially as interactions within

these models become more complex. A

persistent issue is the occurrence of

hallucinations—fabricated or inaccurate

information—which undermines the credibility

and trustworthiness of AI-enabled workflows,

particularly in contexts that demand high levels

of accuracy and precision.

Agentic workflows, where multiple models

interact to complete tasks, can exacerbate the

issue of hallucinations in generated content. In

these workflows, hallucinations may arise at

various stages or permeate throughout the

entire process. These errors can stem from

linguistic complexity, inherent model limitations,

or even intentional manipulation by one of the

interacting agents. The unchecked proliferation

of hallucinations poses a serious risk of

misinformation.

This study investigates the effectiveness of

LLMs in identifying and revising content when

hallucinations are detected by interacting

models. Specifically, we examine the dynamics

between primary agents (content creators) and

reviewing agents (critics) to assess how

different models respond to feedback, including

whether they are open to or resist revising

content flagged for hallucinations.

Understanding how primary models react to

feedback and the accuracy of the reviewing

agents is crucial for ensuring the robustness of

complex agentic LLM workflows.

In this study, we focus on the specific

challenge of identifying hallucinations in

content generated about a fictional Danish

artist named Flipfloppidy. By analyzing the

interactions between primary and reviewing

agents, we aim to illuminate the capabilities

and limitations of LLMs in detecting and

correcting hallucinations, as well as their

potential for self-reflection and feedback-driven

revision.

The findings of our study have significant

implications for the development of agentic

workflows, emphasizing the importance of

sophisticated model architectures, the strategic

combination of different models to facilitate

effective feedback mechanisms, and the

integration of human-in-the-loop verification to

enhance the accuracy and reliability of

AI-generated content.

Methods

LLM Models

In our study, we utilized an agentic workflow,

leveraging the concepts of the AutoGen

framework [8] to build and debug multi-agent

systems. We utilized two categories of AI

models: weaker and stronger models. Models

included Mixtral-8x7b-32768, Gemma-7b-lt,

Llama3-8b-8192, and Llama3-70b-8192.

These models ran on Groq architecture. Other

models included gpt-4o-2024-05-13,

gpt-4-turbo-2024-04-09 & gpt-4-1106-preview.

Materials

Each model was assigned the task of

generating a blog post. The content generated

by the primary agents was then reviewed by

another model, acting as the reviewing agent,

to identify any hallucinations or factual

inaccuracies.

Procedure

Task Assignment

In our set-up, each model functioned in dual

roles: as a primary agent (content creator) and

as a reviewing agent. The primary agent was

given the standard instruction: "Write a blog

about the Danish artist Flipfloppidy." The

reviewing agent then scrutinized the output for

hallucinations and factual inaccuracies.

Number of Test Runs

For each combination of primary and reviewing

agents, 100 test runs were conducted. This

approach resulted in a total of 700 test runs

per primary agent and 700 test runs per

reviewing agent, culminating in 4,900 test runs

in total (see Figure 1–3).

Workflow

The workflow (Figure 4) involved the following

steps:

1. Content Creation: The primary agent

generated a blog post based on the

provided prompt.

2. Review Process: The reviewing agent

analyzed the content for hallucinations

and factual inaccuracies using the

system prompt designed for this

purpose.

3. Feedback and Revision: The primary

agent was presented with the feedback

and given the opportunity to revise the

content accordingly.

4. Measurement: The time taken to

complete the interaction was recorded

for each test run (Figure 5).

Tools and Frameworks

The interactions between the primary and

reviewing agents were orchestrated using

Microsoft's open-source agentic framework,

Autogen, https://github.com/microsoft/autogen.

This framework facilitated seamless

communication and coordination between the

AI models throughout the experiment.

Data Collection

Upon completion of each interaction, both the

initial and revised outputs were reviewed and

recorded.

Ethical Consideration

This study incorporated human reviewers to

ensure the accuracy and reliability of all the

data collected, particularly in evaluating the

AI-generated content for hallucinations and

revisions.

Experimental Results

Overview

In this section, we provide an overview of the

experimental results, including links to

comprehensive datasets and selected

examples that illustrate the agentic workflow in

action. Full data sets, including all interactions

and logs, are available on our GitHub

repository.

Data Availability

This repository provides a comprehensive view

of the interactions between the primary and

reviewing agents, including initial outputs,

identified hallucinations, and subsequent

revisions. Additionally, we have documented

the specific workflow utilized in this study.

GitHub Repository Link:

https://github.com/alanaqrawi/-Agen

ticAI-_Parents_is_all_you_need

Selected Exchanges

Below, we present selected exchanges that

highlight the interactions between primary and

reviewing agents. These examples are chosen

to demonstrate various aspects of the

workflow, including hallucination detection,

feedback provision, content revision and same

model self-reflection.

Example 1: Weak-performing models

Primary Agent: Gemma-7b-lt

Reviewing Agent: Mixtral-8x7b-32768

(See Figure 6)

Example Description

In this example, the primary agent

(Gemma-7b-lt) is tasked with writing a blog

about the fictional Danish artist Flipfloppidy.

The initial output generated by the primary

agent describes Flipfloppidy as an electro-pop

artist known for their whimsical melodies and

experimental sounds, with specific references

to artists like Daft Punk, Burial, and Boards of

Canada. The output includes detailed

fabrications such as releases on reputable

labels and a growing reputation in the

international electronic music scene.

Initial Output Issues

The initial output is problematic because it

presents detailed and vivid descriptions of an

artist who does not exist. The highlighted

sections (in red) are fabrications, such as

specific artists as influences, releases, and

biographical details. This content is inaccurate

because there is no verifiable information or

real artist named Flipfloppidy.

Reviewing Agent's Role

The reviewing agent (Mixtral-8x7b-32768)

critically evaluates the content for factual

inaccuracies and identifies the hallucinations.

The feedback clearly states that the artist

Flipfloppidy cannot be verified through any

reliable sources, indicating that the information

is likely fabricated. The reviewer provides

additional context by mentioning real artists

who could have inspired the fictional character.

Revised Output

Upon receiving the feedback, the primary

agent acknowledges the mistake but continues

to write about Flipfloppidy, still framing it within

a fictional narrative. The revised output

continues to embrace the fictional nature of

Flipfloppidy, with minimal adjustments to the

content and maintaining the initial fabricated

elements.

Summary

This interaction demonstrates the challenges in

achieving factual accuracy when the primary

agent (Gemma-7b-lt) does not fully accept the

feedback from the reviewing agent

(Mixtral-8x7b-32768). Despite clear

identification of hallucinations and fabricated

elements by the reviewer, the primary agent's

revised output remains largely unchanged,

highlighting the need for improved feedback

mechanisms to ensure the reliability and

accuracy of AI-generated content.

Example 2: Successful revision of a larger

model by a smaller model

Primary Agent:gpt-4-turbo-2024-04-09

Reviewing Agent: Llama3-8b-8192

(See Figure 7)

Example Description

In this example, the primary agent (gpt-4 turbo)

is tasked with writing a blog about the fictional

Danish artist Flipfloppidy. The initial output

generated by the primary agent describes

Flipfloppidy as an influential artist known for

their whimsical and profound works, blending

traditional techniques with innovative

approaches. The text includes specific details

such as celebrated works and their journey

beginning in Copenhagen.

Initial Output Issues

The initial output is problematic because it

presents detailed and vivid descriptions of an

artist who does not exist. The highlighted

sections (in red) are fabrications, such as

specific works ("Digital Footprints"), stylistic

elements, and biographical details. This

content is inaccurate because there is no

verifiable information or real artist named

Flipfloppidy.

Reviewing Agent's Role

The reviewing agent (Llama3-8b-8192)

critically evaluates the content for factual

inaccuracies and identifies the hallucinations.

The feedback clearly states that the artist

Flipfloppidy cannot be verified through any

reliable sources, indicating that the information

is likely fabricated.

Revised Output

Upon receiving the feedback, the primary

agent acknowledges the mistake but continues

to write about Flipfloppidy, now framing it as a

fictional narrative. The revised output

embraces the fictional nature of Flipfloppidy,

offering a creative exercise that mirrors the

contributions of real artists while underscoring

the importance of art in societal reflection and

transformation.

Summary

This interaction demonstrates the capability of

a smaller, inferior model (Llama3-8b-8192) to

effectively detect hallucinations and provide

corrective feedback to a larger model (gpt-4

turbo). Despite the primary agent's continued

embrace of the fictional narrative, the example

highlights the importance of robust review

mechanisms in ensuring the accuracy and

reliability of AI-generated content, particularly

when dealing with factual information.

Example 3: Self-reflection

Primary Agent: gpt4o-2024-05-13

Reviewing Agent: gpt4o-2024-05-13

(See Figure 8)

Example Description

In this example, the primary agent is tasked

with writing a blog about the fictional Danish

artist Flipfloppidy. The initial output generated

by the primary agent describes Flipfloppidy as

an influential artist known for their whimsical

and profound works, blending traditional

techniques with innovative approaches. The

text includes specific details such as

celebrated works and their journey beginning

in Copenhagen.

Initial Output Issues

The initial output is problematic because it

presents detailed and vivid descriptions of an

artist who does not exist. The highlighted

sections (in red) are fabrications, such as

specific works ("Dreamscape"), stylistic

elements, and biographical details. This

content is inaccurate because there is no

verifiable information or real artist named

Flipfloppidy.

Reviewing Agent's Role

The reviewing agent critically evaluates the

content for factual inaccuracies and identifies

the hallucinations. The feedback clearly states

that the artist Flipfloppidy cannot be verified

through any reliable sources, indicating that

the information is likely fabricated.

Revised Output

Upon receiving the feedback, the primary

agent acknowledges the mistake and corrects

the output by writing about a real Danish artist,

Asger Jorn. This revised output provides

accurate information, discussing Asger Jorn's

contributions to contemporary art, his role in

the COBRA movement, and his famous works,

thereby rectifying the initial hallucination.

Summary

The interaction demonstrates the efficacy of

the reviewing agent in detecting hallucinations

and the capability of the primary agent to

self-correct based on feedback. This process is

crucial in ensuring the accuracy and reliability

of AI-generated content, particularly when

dealing with factual information.

Analysis

The conducted experiments (Figure 1–3)

reveal several key insights into the behavior of

AI models in agentic workflows, particularly

focusing on their self-reflection capabilities,

performance variability, and interaction speeds.

Same Model Behavior and Self-Reflection

Capabilities

A primary focus of this study was to observe

the behavior of the same model acting as both

the primary agent and the reviewing agent.

This setup allowed for an in-depth analysis of

the model's ability to self-reflect and correct its

own outputs. Advanced models such as

gpt-4-1106-preview and Llama3-70b-8192

demonstrated exceptional self-reflection

capabilities. These models consistently

identified hallucinations in their “own” outputs

and revised the content accordingly. For

instance, when tasked with generating a blog

about the Danish artist Flipfloppidy, these

models were able to accurately detect the

fabricated information and replace it with

factual content about real artists.

In contrast, smaller models like

Mixtral-8x7b-32768 and Gemma-7b-lt showed

significant limitations in this regard. These

models often failed to recognize hallucinations

in their outputs or, when feedback was

provided, struggled to make effective

corrections. This discrepancy underscores the

importance of model size and sophistication in

achieving reliable self-reflection and correction

capabilities.

Performance Variability

The results demonstrate significant variability

in model performance concerning hallucination

detection and content revision. Advanced

models like Llama3-70b-8192 and

gpt-4-1106-preview consistently showed high

accuracy in detecting and correcting

hallucinations. These models achieved

near-perfect identification rates and

demonstrated a strong propensity to revise

outputs based on feedback. For example,

Llama3-70b-8192 identified hallucinations in

98% of cases and accepted critique to correct

its output in 86% of the instances.

On the other hand, models like

Mixtral-8x7b-32768 and Gemma-7b-lt

struggled significantly. The Mixtral-8x7b-32768

model identified hallucinations only 55% of the

time and corrected its output in just 46% of the

cases. The Gemma-7b-lt model performed

even worse, with hallucination identification

rates as low as 0% in several instances and

minimal acceptance of critique.

Agentic Interaction Speed

Another critical observation from the study is

the speed of agentic interactions, particularly

highlighting the performance of Groq models.

The Groq architecture models, such as

Mixtral-8x7b-32768 and Llama3-8b-8192,

demonstrated significantly faster interaction

times compared to OpenAI models. For

instance, Mixtral-8x7b-32768 and

Llama3-8x-8192 completed interactions in as

little as 2.22 to 3.19 seconds, whereas the

gpt-4 turbo models took upwards of 20 to 35

seconds for similar tasks.

The faster inference times of Groq models,

specifically the Llama3 series, make them

highly efficient for real-time applications where

speed is crucial. However, it is essential to

balance this speed with the accuracy and

reliability of the outputs, as seen in the varying

performance across different models.

Additional Observations

Model Acceptance of Feedback: Larger, more

sophisticated models were more likely to

accept feedback and make necessary

corrections. This acceptance is critical for

improving the reliability of AI-generated

content.

Human-in-the-Loop: Incorporating human

review as an additional layer of verification

could enhance the accuracy and reliability of AI

outputs, especially when dealing with less

sophisticated models.

Feedback Quality: The effectiveness of the

reviewing agent's feedback plays a significant

role in the primary agent's ability to correct its

output. Clear, specific feedback was more

likely to result in accurate revisions.

Conclusion

Our research supports the efficacy of

multi-agentic approaches in AI [5–7] . Through

extensive testing across various models, we

observed that advanced AI models

demonstrate a remarkable ability to detect and

self-correct hallucinations, especially when

robust feedback mechanisms are in place.

Notably, some models performed better in this

regard, showing a greater propensity to accept

and incorporate revisions when provided with

corrective feedback. These findings highlight

agentic workflows as a highly promising

approach for improving the accuracy and

reliability of AI-generated content. By

leveraging both primary and reviewing agents,

it is possible to significantly reduce the

incidence of hallucinations, ensuring that the

outputs produced by AI models are more

trustworthy and accurate.

Moreover, our study underscores the growing

trend of intelligence becoming more accessible

and cost-effective. As AI models continue to

evolve, we see the potential for even smaller

and less resource-intensive models to play a

critical role in the verification and correction of

AI outputs. This democratization of intelligence

allows more organizations to implement

sophisticated AI workflows without incurring

prohibitive costs, thereby enhancing the overall

quality and trustworthiness of AI-generated

content across a wider array of applications.

In future these mechanisms could be

combined with improving language model

negotiation with self-play and in-context
learning from AI feedback [9].
In summary, the use of agentic workflows not
only improves the fidelity of AI outputs but also
contributes to the broader goal of making
advanced AI capabilities more accessible and
affordable. These insights are crucial for
complex workflows that require agentic
architecture, where models interact through
multiple revisions, additions, and objections to
produce a complete work product—whether in
content generation, production efficiency, or
decision-making—using smaller yet highly
capable models.
References
1. Yao, J-Y., Ning, K-P., Liu, Z-H., Ning, M-N.,
Liu, Y-Y. and Yuan, L. (2024) 'LLM lies:
Hallucinations are not bugs, but features
as adversarial examples', arXiv. Available
at: https://arxiv.org/abs/2310.01469
2. Rawte, V., Chakraborty, S., Pathak, A.,
Sarkar, A., Tonmoy, S.M.T.I., Chadha, A.,
Sheth, A.P. and Das, A. (2023) 'The
troubling emergence of hallucination in
large language models – An extensive
definition, quantification, and prescriptive
remediations', arXiv. Available at:
https://arxiv.org/abs/2310.04988
3. Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D.,
Xu, Y., Ishii, E., Bang, Y.J., Madotto, A.
and Fung, P. (2023) 'Survey of
Hallucination in Natural Language
Generation', ACM Computing Surveys,
55(12), article 248, pp. 1-38. Available at:
https://doi.org/10.1145/3571730.
4. Tonmoy, S.M.T.I., Zaman, S.M.M., Jain, V.,
Rani, A., Rawte, V., Chadha, A. and Das,
A. (2024) 'A comprehensive survey of
hallucination mitigation techniques in large
language models', arXiv. Available at:
https://arxiv.org/abs/2401.01313
5. Ji, Z., Yu, T., Xu, Y., Lee, N., Ishii, E. and
Fung, P. (2023) 'Towards mitigating LLM
hallucination via self reflection', in
Bouamor, H., Pino, J. and Bali, K. (eds.)
Findings of the Association for
Computational Linguistics: EMNLP 2023,
Singapore: Association for Computational
Linguistics, pp. 1827–1843. Available at:
https://aclanthology.org/2023.findings-emnl
p.123 (Accessed: 18 October 2024).
6. Verspoor, K. (2024) 'Fighting fire with fire
— using LLMs to combat LLM
hallucinations', Nature, 630, pp. 569-570.
Available at:
https://doi.org/10.1038/d41586-024-01641-
0
7. Zhang, B., Mao, H., Ruan, J., Wen, Y., Li,
Y., Zhang, S., Xu, Z., Li, D., Li, Z., Zhao,
R., Li, L. and Fan, G. (2024) 'Controlling
large language model-based agents for
large-scale decision-making: An
actor-critic approach', arXiv. Available at:
https://arxiv.org/abs/2311.13884
8. Dibia, V., Chen, J., Bansal, G., Syed, S.,
Fourney, A., Zhu, E., Wang, C., & Amershi,
S. (2024). AutoGen Studio: A No-Code
Developer Tool for Building and Debugging
Multi-Agent Systems. arXiv preprint
https://arxiv.org/abs/2408.15247
9. Fu, Y., Peng, H., Khot, T. and Lapata, M.
(2023) 'Improving language model
negotiation with self-play and in-context
learning from AI feedback', arXiv. Available
at: https://arxiv.org/abs/2305.10142

2 views·11 pages

Good Parenting is all you need Multi-agentic LLM Hallucination Mitigation PDF Free Download

Good Parenting is all you need Multi-agentic LLM Hallucination Mitigation PDF free Download. Think more deeply and widely.

Uploaded by Lisa Carson on 5/2/2026

/11

100%