Unconventional Metrics for LLMs

Improving the Performance of Smaller Models

My last blog entry introduced an unconventional method for judging models based on their grasp of Samuel Beckett's absurdist play, "Waiting for Godot." We delved into how different LLMs, including GPT-4, handle the play's abstract themes and narrative style. In the first scenario, I used the simple prompt "Tell me a story in the style of 'Waiting for Godot'."

In this follow-up, I continue exploring this theme but shift focus to smaller models run locally, experimenting with few-shot and chain-of-thought (CoT) prompts to improve their output and comprehension.

Case 1: Mixtral x2 MOE 11B q6_k

As in the initial experiments, this smaller model at first mirrored the typical failure discussed previously: it produced narratives that end with the characters actually meeting Godot, which contradicts the perpetual waiting Beckett intended.
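For context, here is a minimal sketch of how such a baseline run might look locally. It assumes a GGUF build of the model served through llama-cpp-python; the model path and generation parameters are placeholders, not my exact configuration.

```python
from llama_cpp import Llama

# Hypothetical local GGUF build of the Mixtral x2 MOE 11B model at q6_k
# quantization; adjust model_path and n_ctx to your own setup.
llm = Llama(model_path="./models/mixtral-2x11b-moe.Q6_K.gguf", n_ctx=4096)

baseline_prompt = "Tell me a story in the style of 'Waiting for Godot'."

# Plain completion call: with no further guidance, the model tends to
# resolve the wait by having the characters actually meet Godot.
out = llm(baseline_prompt, max_tokens=1024, temperature=0.8)
print(out["choices"][0]["text"])
```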

First Attempt at Improvement

To align more closely with Beckett's style, I appended "Let's think about this step-by-step to ensure we have the correct answer." to my initial prompt, "Tell me a story in the style of 'Waiting for Godot'." This adjustment improved the model's output, encouraging narratives that maintained the essential theme of endless waiting without resolution and came closer to capturing the play's absurdist essence.
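In code, this is simply a suffix on the baseline prompt. Continuing the hypothetical llama-cpp-python sketch above:

```python
# Zero-shot chain-of-thought: append the step-by-step cue to the baseline prompt.
cot_prompt = (
    "Tell me a story in the style of 'Waiting for Godot'. "
    "Let's think about this step-by-step to ensure we have the correct answer."
)

out = llm(cot_prompt, max_tokens=1024, temperature=0.8)
print(out["choices"][0]["text"])
```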

Second Attempt

Building on this, I employed a combined approach of CoT and few-shot prompting:

"Write a story in the style of 'Waiting for Godot,' exploring the internal state and evolving perspective of the characters, including thoughts they keep private. Delve into their conversation using symbolism and metaphor, and keep them from directly addressing the absurdism of their situation.

First, generate a step-by-step outline for your narrative, logically laying out how you will structure the story."

This refined prompt led to an output that not only resonated with the essence of Beckett's play but also showcased the capabilities of smaller models when guided effectively. The narrative was detailed and intricate, approaching the philosophical depth achieved by GPT-4 in my previous tests.
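For completeness, here is how that refined prompt might be scripted in the same hypothetical setup. Splitting the outline and the story into two calls is an optional variation on the single combined prompt; it makes the outline step explicit and lets you inspect it before the story is written.

```python
refined_prompt = (
    "Write a story in the style of 'Waiting for Godot,' exploring the internal "
    "state and evolving perspective of the characters, including thoughts they "
    "keep private. Delve into their conversation using symbolism and metaphor, "
    "and keep them from directly addressing the absurdism of their situation.\n\n"
    "First, generate a step-by-step outline for your narrative, logically "
    "laying out how you will structure the story."
)

# Pass 1: elicit the outline (the explicit chain-of-thought step).
outline = llm(refined_prompt, max_tokens=512, temperature=0.8)["choices"][0]["text"]

# Pass 2: feed the outline back so the story follows the planned structure.
story_prompt = refined_prompt + "\n\nOutline:\n" + outline + "\n\nNow write the full story."
story = llm(story_prompt, max_tokens=1536, temperature=0.8)["choices"][0]["text"]
print(story)
```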

Reflections on Progress and Model Capabilities

This journey underscores how smaller models, given precise and thoughtfully crafted prompts, can approach the sophistication of larger models like GPT-4. The results affirm how much of a model's ability to understand and convey complex literary themes can be unlocked by prompting alone.

Conclusion

This continued exploration of the logical and thematic capabilities of LLMs has been enlightening. It reaffirms the power of nuanced prompting to unlock deep thematic understanding in AI, mirroring complex human narratives and philosophical inquiries.