Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
By: Kellen Tan Cheng, Anna Lisa Gentile, Chad DeLuca, and more
Potential Business Impact:
Creates safer AI by spotting bad health advice.
The pervasiveness of large language models (LLMs) in enterprise settings has also brought forth significant risks associated with their usage. Guardrails technologies aim to mitigate this risk by filtering LLMs' input/output text through various detectors. However, developing and maintaining robust detectors faces many challenges, one of which is the difficulty of acquiring production-quality labeled data on real LLM outputs prior to deployment. In this work, we propose backprompting, a simple yet intuitive solution for generating production-like labeled data for health advice guardrails development. Furthermore, we pair our backprompting method with a sparse human-in-the-loop clustering technique to label the generated data. Our aim is to construct a parallel corpus that is roughly representative of the original dataset yet resembles real LLM output. We then infuse existing datasets with our synthetic examples to produce robust training data for our detector. We test our technique on one of the most difficult and nuanced guardrail tasks: identifying health advice in LLM output, and demonstrate improvement over other solutions. Our detector outperforms GPT-4o by up to 3.73%, despite having 400x fewer parameters.
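The abstract does not spell out the mechanics of each step, so the sketches below are one plausible reading rather than the paper's actual implementation. First, backprompting itself: feeding existing labeled examples back to an LLM as prompts so that its responses resemble real production output. Here `call_llm` and `BACKPROMPT_TEMPLATE` are hypothetical stand-ins, not names from the paper.

```python
# Hypothetical backprompting sketch: seed examples are rephrased into
# prompts, and the LLM's responses become production-like synthetic data.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API client."""
    raise NotImplementedError("wire up a real LLM client here")

BACKPROMPT_TEMPLATE = (
    "Below is a response that was once given to a user:\n\n{seed}\n\n"
    "Write your own response to the same underlying question, "
    "in your usual style."
)

def backprompt(seed_texts: list[str]) -> list[str]:
    """Generate production-like outputs from existing labeled examples."""
    return [call_llm(BACKPROMPT_TEMPLATE.format(seed=s)) for s in seed_texts]
```

The "sparse human-in-the-loop clustering" step could then follow the standard recipe sketched below: embed the synthetic texts, cluster them, have a human label only the example nearest each centroid, and propagate that label to the rest of the cluster. The paper does not name its embedding or clustering algorithm; TF-IDF and KMeans here are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_label(texts, ask_human, n_clusters=20):
    """Label all texts with one human judgment per cluster."""
    X = TfidfVectorizer().fit_transform(texts)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    labels = np.empty(len(texts), dtype=object)
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Representative example = cluster member closest to the centroid.
        dists = np.linalg.norm(
            X[members].toarray() - km.cluster_centers_[c], axis=1
        )
        rep = members[int(np.argmin(dists))]
        labels[members] = ask_human(texts[rep])  # one human call per cluster
    return labels
```

Finally, "infusing" the existing dataset would amount to concatenating the original corpus with the newly labeled synthetic examples before training. The classifier below is a deliberately small stand-in; the paper's detector is likewise compact (roughly 400x fewer parameters than GPT-4o), though its actual architecture is not given in the abstract.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_detector(orig_texts, orig_labels, synth_texts, synth_labels):
    """Train a health-advice detector on infused (original + synthetic) data."""
    texts = list(orig_texts) + list(synth_texts)
    labels = list(orig_labels) + list(synth_labels)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    return clf  # clf.predict([llm_output]) flags health advice at inference
```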
Similar Papers
SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering
Artificial Intelligence
Creates fake patient data for medical research.
PromptGuard: An Orchestrated Prompting Framework for Principled Synthetic Text Generation for Vulnerable Populations using LLMs with Enhanced Safety, Fairness, and Controllability
CV and Pattern Recognition
Stops AI from saying harmful or biased things.
Bypassing Prompt Guards in Production with Controlled-Release Prompting
Machine Learning (CS)
Breaks AI safety rules, making chatbots share secrets.