10 Powerful Prompts to Use LLMs as Effective Judges for AI Evaluation
To effectively use a large language model (LLM) as a judge, the prompts need to be carefully crafted to evaluate various aspects of AI outputs. Below are some prompt examples designed to make the LLM act as a judge for different evaluation criteria:
1. Evaluating Text Coherence and Structure
Prompt:
“Assess the coherence and structure of the following text. Does it have a logical flow? Are the ideas clearly connected? Identify any gaps in reasoning or unclear transitions.”
This prompt guides the LLM to evaluate whether the generated text is logically organized and easy to follow. It works well for essays, summaries, or story generation.
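As a minimal sketch of how this prompt could be used programmatically, the template below fills in the text under evaluation and passes it to any chat-completion client you supply. The `call_llm` parameter is a hypothetical stand-in, not a specific library's API:

```python
# Template for the coherence-and-structure judge prompt.
COHERENCE_PROMPT = (
    "Assess the coherence and structure of the following text. "
    "Does it have a logical flow? Are the ideas clearly connected? "
    "Identify any gaps in reasoning or unclear transitions.\n\n"
    "Text:\n{text}"
)

def build_coherence_prompt(text: str) -> str:
    """Fill the judge template with the candidate text to evaluate."""
    return COHERENCE_PROMPT.format(text=text)

def judge_coherence(text: str, call_llm) -> str:
    """Send the filled prompt to an LLM via a caller-supplied client function.

    `call_llm` is assumed to take a prompt string and return the model's
    reply as a string; wire in whichever provider SDK you actually use.
    """
    return call_llm(build_coherence_prompt(text))
```

Keeping the template separate from the client call makes it easy to swap judge models or run the same prompt across several candidates.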
2. Evaluating Factual Accuracy
Prompt:
“Review the following text and verify if the factual claims are correct. If there are any factual inaccuracies, point them out and provide corrections.”
For fact-checking tasks, this prompt instructs the LLM to verify factual correctness and supply corrections where needed. This is useful in applications like question answering or news summarization.
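For automated pipelines, it can help to ask the judge for a structured verdict rather than free text. The sketch below extends the fact-checking prompt with a JSON response format (my own addition, not part of the original prompt) and parses the result; as before, `call_llm` is a hypothetical client function:

```python
import json

# Fact-checking judge prompt, extended to request a machine-readable verdict.
FACT_CHECK_PROMPT = (
    "Review the following text and verify if the factual claims are correct. "
    "If there are any factual inaccuracies, point them out and provide "
    "corrections. Respond in JSON with keys \"accurate\" (true or false) "
    "and \"corrections\" (a list of strings).\n\n"
    "Text:\n{text}"
)

def judge_facts(text: str, call_llm) -> dict:
    """Run the fact-checking prompt and parse the judge's JSON verdict.

    `call_llm` is assumed to take a prompt string and return the raw
    model reply; a production version should handle malformed JSON.
    """
    raw = call_llm(FACT_CHECK_PROMPT.format(text=text))
    return json.loads(raw)
```

The structured verdict lets downstream code branch on `accurate` and log the `corrections` list without scraping prose.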