AI can write without hallucinating

But what it takes to make it behave is another story.

I spent 18 months supporting a project inside a huge corporation. A team of 20+ engineers, data scientists, subject matter experts, analysts, and testers built a tool for generating derivative content out of sample input.

And I can say with full confidence that repeating this for 90% of business scenarios makes absolutely zero sense.

I will walk you through the conceptual components that made it possible.

Rigorously reviewed input

The subject matter core used for creating the content was pristine every time. The input base consisted only of peer-reviewed scientific articles that then underwent additional internal reviews by a team of experts. A chain of rigorous processes ran before a single word was ever generated.

In addition to providing a really solid factual base, this also meant that all of the input followed a specific, organized, and predictable structure and very clear reasoning paths: facts and conclusions, with little room for interpretation.

? Does your internal source content (meeting notes, ticket descriptions, designs, decision matrices, product specs) go through a similarly rigorous review process?

? Is your internal content better reviewed than what goes out to the public?

? Does it provide clear, unambiguous, and non-contradictory information every time?

One-to-many outputs

The idea behind this content generation machine was to use the same core of input material to generate many outputs in many versions.

The same 10 or 15 articles resulted in potentially infinite combinations of output, organized by audience, purpose, and format. Even a single article could be turned into hundreds of emails, blog posts, or websites.

This setup makes sense in scenarios where you want unlimited amounts of content. Highly customizable marketing campaigns, multi-channel content, repeatable messaging, all centered around the same core of facts.

? What is your target input to output document ratio?

A clear plan for ROI

This wasn’t a random “here’s AI, now go do something with it” project. This initiative had a very specific goal and a precisely defined purpose. We operated with very clear data: the revenue driven by this type of content had been measured and monitored closely before anyone ever wrote a line of code.

Effectively, the exercise of 20+ people working together for a year and a half changed the scale of a costly, manual review process. It moved the precious time and energy of technical and legal reviewers from looking at individual outputs to reviewing the input, and unlocked new options to experiment as a result.

? Do you have baseline metrics in place to measure against?

Day-to-day operations

These types of initiatives are not a one-and-done project. After the initial investment, you’re still incurring costs:

  • A smaller crew to provision any new project iteration. The framework is there, but every new internal client needs their data structured and loaded onto the platform the right way.

  • Token and infrastructure costs.

  • The time and effort of project contributors and reviewers, albeit reduced.

The final equation must take into account both the initial lump-sum investment and the day-to-day operating costs, compared to running things the old way.
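That cost comparison can be sketched as a quick calculation. Every number below is a made-up illustration, not a figure from the project; the point is the shape of the equation, not the values.

```python
# Hypothetical cost comparison: AI pipeline vs. the old manual process.
# All figures are illustrative assumptions, not data from the project.

def total_cost(initial_investment: float, monthly_operating: float, months: int) -> float:
    """Lump-sum build cost plus ongoing operating costs over a period."""
    return initial_investment + monthly_operating * months

# Old way: no build cost, but every output is written and reviewed by hand.
old_way = total_cost(initial_investment=0, monthly_operating=100_000, months=24)

# AI pipeline: large upfront build, then a smaller crew plus token and infra costs.
ai_pipeline = total_cost(initial_investment=1_500_000, monthly_operating=30_000, months=24)

print(f"old way: {old_way:,.0f}  AI pipeline: {ai_pipeline:,.0f}")
```

With these (invented) numbers the pipeline only comes out ahead because the horizon is long enough to amortize the build; shorten the timeline or raise the operating costs and the old way wins.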

? Do you know how your efficiency gains affect ROI?

You can, but should you?

The question is no longer “Can you make AI write content and not hallucinate?” as much as it is “What does it take, and is it worth it?”

In this case, it made sense. High reliability times outputs times the revenue driven by these outputs equaled measurable wins.

In others, the math may not be as promising.

  1. If you’re operating with inputs of varied format, completeness, and correctness, that alone eats away the biggest efficiency gain. If every second or third output needs to be not only reviewed but also corrected by a person, it will be much more difficult to see a return on your investment.

  2. If you’re producing limited output, the effort put into perfecting your input will not be spread across tens or hundreds of pieces of resulting content. For example, if you’re creating user-facing documentation, the goal is not to write as much content as possible; quite the opposite. Working on a new feature usually involves several pieces of internal content (product descriptions, specs, designs) to produce one or two user-facing documents. If you end up cleaning up 7 pieces of input to produce 2 outputs, you may not see gains from this exercise.

  3. If you’re just trying to save money rather than earn it, the ceiling of your ROI is fixed, with no room to grow. Once you hit it, that’s it.
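The input-to-output ratio from the documentation scenario above can be turned into a simple break-even check. Again, the hours below are invented for illustration only.

```python
# Hypothetical break-even check on the input-to-output ratio.
# hours_to_clean_input and hours_saved_per_output are illustrative assumptions.

def net_hours_saved(inputs_cleaned: int, hours_to_clean_input: float,
                    outputs_generated: int, hours_saved_per_output: float) -> float:
    """Hours saved by generation minus hours spent perfecting the inputs."""
    return outputs_generated * hours_saved_per_output - inputs_cleaned * hours_to_clean_input

# Documentation scenario: 7 inputs cleaned to produce 2 outputs.
print(net_hours_saved(7, 4.0, 2, 3.0))    # negative: cleanup outweighs the gain

# Marketing scenario: the same 7 inputs fan out into 200 outputs.
print(net_hours_saved(7, 4.0, 200, 3.0))  # positive: the cleanup effort is amortized
```

The cleanup cost is fixed per input, so the equation only turns positive once enough outputs are generated to amortize it.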

The good news is: you can make an LLM produce factually correct, nicely formatted content at a reliable rate. The not-so-great news is: you have to build an intricate scaffolding out of top-grade steel to achieve this result. If your scaffolding supports a skyscraper build, it makes perfect sense to invest. If you’re standing it up to repaint two walls, a ladder may be a better choice.

See if you should: let’s talk goals, tools, and projects.