Another GAI ethics opinion & more

Among the topics I have been regularly addressing in presentations during 2024, including here in the end-of-year rush for CLE credits, have been the ethics issues associated with the rise of Generative AI. The core presentation, which I have now given almost 10 times, has been in constant evolution as things have rapidly changed. The appropriate ethics guidance has certainly stabilized since the issuance of ABA Formal Opinion 512. As loyal readers know, I am now in the middle of a three-year term on the Committee that issues such opinions. I mention that here in passing because I was extremely surprised to read one of the most recent state-level opinions and see that it makes not even one reference to the ABA opinion.

In September of this year, New Mexico issued an ethics opinion addressing how GAI impacts lawyers in that state. You can read the full New Mexico opinion here.

Now, other than the guidance that opinion gives on confidentiality obligations, it appears to adequately address the landscape. It even offers an important insight, not highlighted in many other opinions, into the way a lawyer’s duty to preserve the confidential information of former clients could be implicated by the use of GAI.

But as to client confidentiality, perhaps in part because it acts as if the guidance from the ABA does not exist, the opinion gets things very wrong by failing to state plainly that if a lawyer is going to input confidential client information into a GAI tool, then the lawyer must first obtain informed client consent.

The opinion manages to do this in two ways. First, it entertains the idea that there might be tools that could sufficiently protect the confidentiality of information input into them that a client would not have to be told (there are not). Second, it renders the rest of its discussion of confidentiality pointless by saying this:

Third, lawyers should always anonymize client information and refrain from inputting details that could lead to the discovery of the client’s identity into Generative AI tools.

If the opinion were going to land on that solution, then that is pretty much all it should have said. Keeping that language in amongst the rest of the discussion is unhelpful at best.

Discussions about whether GAI will truly be as “transformative” for the practice of law as some claim, however, have to stay ongoing, in part because there are persistent indications that there are certain things GAI simply is not currently very good at doing and may well never be sufficiently capable of doing. Another indication of the nature of these limitations can be found at the following link, as well as in the study it references here.

The summary offered in the article explains:

The paper’s main finding: LLMs, the technology powering popular AI chatbots such as ChatGPT, aren’t capable of formal reasoning. To show this, the researchers started with a popular benchmark that tests state-of-the-art LLMs with grade-school mathematical word problems. Then they made two modifications to the problem set.

First, the researchers generated numerous variations of each problem, each with changed names and numerical values. For example, “Emily picked 4 apples” might become “John picked 7 apples,” “Lee picked 19 apples,” and so on. If LLMs were capable of formal reasoning, they should perform identically well on each problem variant. Trouble was, they didn’t—the models’ performance dropped slightly.

Second, the researchers modified the wording of each problem by adding minor details that seemed relevant, but that didn’t change the problem’s conclusion or the reasoning required to reach it. For example, a problem that involved counting apples might have the added detail “five of the apples were a bit smaller than average.” The result was a “catastrophic” collapse in performance, affecting even the best LLMs available today. Worse yet, the researchers found that even when given multiple examples of modified problems to train their answers on, LLMs still failed to improve their performance.
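Because the study’s method is easy to misread, here is a minimal, purely illustrative sketch (not code from the researchers; the template, names, and numbers are all hypothetical) of the kind of variant generation described above: the same underlying arithmetic, with names and values swapped and an irrelevant detail optionally appended, so that any system capable of formal reasoning should answer every variant the same way.

```python
import random

# Hypothetical grade-school-style template in the spirit of the problems the study used.
# The correct answer depends only on the numbers, never on the names or extra details.
TEMPLATE = ("{name} picked {picked} apples and gave {given} of them away. "
            "How many apples does {name} have left?")

# An added clause that sounds relevant but changes nothing about the required reasoning.
IRRELEVANT = " Note that {smaller} of the apples were a bit smaller than average."

NAMES = ["Emily", "John", "Lee", "Priya", "Omar"]


def make_variant(add_irrelevant_detail: bool) -> tuple[str, int]:
    """Build one problem variant and its ground-truth answer."""
    name = random.choice(NAMES)
    picked = random.randint(5, 30)
    given = random.randint(1, picked - 1)
    problem = TEMPLATE.format(name=name, picked=picked, given=given)
    if add_irrelevant_detail:
        problem += IRRELEVANT.format(smaller=random.randint(1, given))
    return problem, picked - given


if __name__ == "__main__":
    for flag in (False, True):
        text, answer = make_variant(add_irrelevant_detail=flag)
        print(text)
        print(f"  expected answer: {answer}\n")
```

The point of the sketch is simply that the ground-truth answer is identical across every variant; the study’s finding was that model performance nonetheless degraded, in some cases dramatically, when only these surface details changed.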

This demonstration of the problem GAI products have with logical reasoning adds to prior flags indicating that, even when paired with trusted databases, such products still struggle to recognize when a court opinion is addressing the court’s actual ruling rather than dicta or the parties’ arguments, and to understand the hierarchical structure of the court system when it comes to federalism and other issues.

If all of those issues continue to plague even retrieval-augmented GAI products, then there will at some point have to be some kind of reckoning about exactly how useful it will be for lawyers to expand the ways they use such products, and about whether it makes sense for businesses to continue trying to develop specialized products targeting lawyers rather than simply having lawyers use the basic products (such as ChatGPT) that everyone else uses for more generic writing tasks.

And given the sheer volume of energy resources consumed by all of the infrastructure necessary for these products to be developed and to exist, some existential questions really ought to be addressed.

Given the state of my part of the world and what the immediate future now portends with respect to the likelihood of rational and reasonable approaches to addressing important issues, however, it feels pretty unlikely that such a conversation will occur any time soon.