Legal ethics

Generating more Generative AI content

It has somehow been a minute since I’ve written any updates on anything in the world of Generative AI issues. That hasn’t, of course, been because things haven’t been happening. They have. And even today I found myself as part of yet another panel presentation on the ethics issues surrounding the rise of the use of GAI products.

Of the things that have happened most recently though, I want to touch on two. One I will touch on quickly as really an update on a prior post. One merits a bit more, and a bit deeper, discussion.

Thing 1: You will recall that I’ve been pretty vocal about my frustrations with a lot of the court orders that have been implemented attempting to address lawyer usage of (G)AI in court filings. Most recently, I wrote about the fact that the Fifth Circuit was considering becoming the first federal appeals court to engraft such a requirement into its local rules. In surprisingly good news, the Fifth Circuit — in the face of overwhelmingly negative public comment — has decided not to adopt such a rule. Instead, the Fifth Circuit has simply said:

The court, having considered the proposed rule, the accompanying comments, and the use of artificial intelligence in the legal practice, has decided not to adopt a special rule regarding the use of artificial intelligence in drafting briefs at this time. Parties and counsel are reminded of their duties regarding their filings before the court under Federal Rule of Appellate Procedure 6(b)(1)(B). Parties and counsel are responsible for ensuring that their filings with the court, including briefs, shall be carefully checked for truthfulness and accuracy as the rules already require. “I used AI” will not be an excuse for an otherwise sanctionable offense.

That is a good development for lawyers and for the growth of lawyer use of GAI products.

Thing 2, however, is not a great development for lawyers or for the growth of lawyer use of GAI products, at least not in the realm of legal research. You may recall that I’ve also been outspoken in saying that instances of lawyers trying to use ChatGPT for legal research were an indictment of those lawyers’ decision-making rather than of the GAI product. They were using a product that was not well-suited for legal research because it did not have access to case law and legal databases.

But large swaths of the legal profession have been hopeful that the partnering of companies that do have access to such things (companies such as LEXIS and Westlaw for example) with GAI products would lead to improvements in the ability to perform legal research. A study that has come out in the last week or so from Stanford provides every reason for lawyers to slam on the brakes about those expectations for now. Candidly, it also makes some points, as part of unpacking its findings, which raise significant questions about whether legal research will ever be something that lawyers are better off doing with any GAI in the mix.

The report, titled “Hallucination Free? Assessing the Reliability of Leading AI Legal Research Tools,” can be read in full here.

For convenience though, here are a few of the highlights:

While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time.


Our article makes four key contributions. First, we conduct the first systematic assessment of leading AI tools for real-world legal research tasks. Second, we manually construct a preregistered dataset of over 200 legal queries for identifying and understanding vulnerabilities in legal AI tools. We run these queries on LexisNexis (Lexis+ AI), Thomson Reuters (Ask Practical Law AI), Westlaw (AI-Assisted Research), and GPT-4 and manually review their outputs for accuracy and fidelity to authority. Third, we offer a detailed typology to refine the understanding of “hallucinations,” which enables us to rigorously assess the claims made by AI service providers. Last, we not only uncover limitations of current technologies, but also characterize the reasons that they fail. These results inform the responsibilities of legal professionals in supervising and verifying AI outputs, which remains an important open question for the responsible integration of AI into law.


Now, importantly for context, the paper explains that these kinds of programs are different from general GAI chatbots because they involve “retrieval augmented generation” (RAG). In other words, they do not just answer prompts based on training data but can also go retrieve data from databases (such as all of the published case law in a LEXIS or Westlaw collection).
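
For readers who want a more concrete picture, here is a toy sketch of the RAG idea in Python. Everything in it is an assumption made for illustration: the three "cases" are hypothetical, the word-overlap scoring is a stand-in for whatever retrieval a real product uses, and nothing here reflects how Lexis+ AI, Westlaw, or any other tool is actually built.

```python
import re

# Hypothetical mini "database" of case summaries (invented for illustration).
CORPUS = {
    "Case A (hypothetical)": "The court held that the contract was unenforceable for lack of consideration.",
    "Case B (hypothetical)": "The plaintiff argued that the statute of limitations had been tolled.",
    "Case C (hypothetical)": "The court affirmed summary judgment on the negligence claim.",
}

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, corpus, k=2):
    """Return the k passages sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: len(query_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, passages):
    """Combine the retrieved passages and the question into one prompt.

    In a real RAG system this prompt would then be sent to a language model;
    that step is omitted here.
    """
    sources = "\n".join(f"[{name}] {text}" for name, text in passages)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"

question = "Was the contract unenforceable for lack of consideration?"
top_passages = retrieve(question, CORPUS, k=1)
prompt = build_prompt(question, top_passages)
print(prompt)
```

The point of the sketch is only that the model's answer is shaped by whatever the retrieval step hands it. If the retrieval step surfaces the wrong passage, the generation step starts from the wrong material.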

We expand the framework of legal hallucinations to two primary dimensions: correctness and groundedness. Correctness refers to the factual accuracy of the tool’s response (Section 4.1). Groundedness refers to the relationship between the model’s response and its cited sources (Section 4.2). Decomposing factual hallucinations in this way enables a more nuanced analysis and understanding of how exactly legal AI tools fail in practice. For example, a response could be correct but improperly grounded. This might happen when retrieval results are poor or irrelevant, but the model happens to produce the correct answer, falsely asserting that an unrelated source supports its conclusion. This can mislead the user in potentially dangerous ways.
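
To make the two dimensions concrete, here is a minimal sketch of how a human reviewer's two judgments about a response could be combined into a single label. It is a simplification for illustration, not the study's actual coding scheme: the `Review` class, the `label` function, and the three label strings are all hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Review:
    """A reviewer's two yes/no judgments about one AI response (hypothetical)."""
    correct: bool   # Is the statement of law or fact accurate?
    grounded: bool  # Do the cited sources actually support it?

def label(review):
    """Combine the two judgments into one descriptive label."""
    if review.correct and review.grounded:
        return "correct and grounded"
    if review.correct:
        # The case the quoted passage warns about: right answer, wrong support.
        return "correct but improperly grounded"
    return "incorrect"

print(label(Review(correct=True, grounded=False)))
```

The middle category is the one a lawyer is most likely to miss, because the answer itself checks out and only reading the cited source reveals the problem.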


And, along the way, the authors make a few compelling points about why legal research may not be an endeavor where RAG tools can ever fully root out hallucinations. Those points involve some of the important aspects of reading and understanding case law that, at least at the time of the study, the leading tools were having trouble with, such as:

  • Being able to distinguish when the court is speaking for itself from when it is merely recounting what the parties have argued
  • Being able to distinguish between a court’s holding and other aspects of a court opinion
  • Being able to respect the order of authority in terms of courts and court structure in federal and state courts

All in all, it is a fascinating and important read. It also raises real questions about just how much more efficient using the upscaled versions of traditional legal research products might ever be.

Inevitably, the companies in the crosshairs of the study’s findings will take issue with them or indicate that the issues have all been fixed with updates. Other companies will also likely claim their products are better and do not suffer from the same problems.

But given the inherently “black box” nature of many such endeavors, it will remain very difficult for lawyers to know exactly whom to trust.