AI isn’t fooling anyone — or is it? The risky nature of Artificial Intelligence and how to identify AI-content

Artificial intelligence has been on the rise for many years now. The impacts of this technology are wide-ranging, with the potential to transform every industry, job, and walk of life. The technology's origins trace back a long way and are attributed to an entire series of events, some of which seem unrelated but together led to the eventual creation of artificial intelligence. [1]

Two events in this series stand out, both from 1950: Claude Shannon published an article on developing a chess-playing computer program, and Alan Turing devised the Imitation Game, which later became known as the Turing Test. [2]

While AI is popular today, as recently as 2017 the concept was still foreign to many. That year, 1,500 senior business leaders in the U.S. were asked about AI, and only 17% said they were familiar with it. [3] They had a vague sense of its considerable business-altering potential, but little idea of how AI could actually be used in their own organizations.

Since then, however, implementation and uptake have been rapid. Fast-forward to the present day, and AI is the talk of the town. It's everywhere: chatbots, customer service, video, image, and audio filters, blog articles, art; the list goes on. [4]

With the use (and, some may say, overuse) of AI, the biggest questions focus on morality, legality, and justice. Where are the boundaries of justifiable, acceptable use? Who gets to use AI? For what? And when? Is it safe to build AI into medical diagnostics? What about safety protocols on an automated assembly line? Can AI, in the future, help or hinder law enforcement methodology? Can its results be trusted in a lab environment? These are just a few of the questions that have to be answered before AI's current surge in popularity makes it mainstream in critical IT applications.

 

Law and Order: AI?

In 2019, a man sued an airline for injuries that he claimed were caused by a negligent employee and a serving cart. He was represented by a lawyer who had been licensed in New York for over three decades. Of the cases that this lawyer submitted as research for a brief, at least six were entirely made up. [5]

In an order, the judge concerned with the case noted that the cases’ judicial decisions, quotes, and internal citations were all “bogus”. [6]

The filing included such cases as Varghese v. China Southern Airlines, Martinez v. Delta Airlines, Shaboon v. EgyptAir, Petersen v. Iran Air, Miller v. United Airlines, and Estate of Durden v. KLM Royal Dutch Airlines. Neither the defense nor the judge could find any record that these cases existed. [7]

The source of these bogus cases? ChatGPT, the famous (and infamous) AI chatbot.

In an affidavit, the lawyer responsible admitted that he had used ChatGPT as a legal research tool for this case, that he had never done so before, and that he was therefore unaware it could produce false content. He accepted full responsibility for not checking the chatbot's sources and confirming that the cases were real. [8]

He is now facing sanctions for his actions, and stated in an affidavit that he “greatly regrets having utilized generative artificial intelligence to supplement the legal research performed herein and will never do so in the future without absolute verification of its authenticity.” [9]

To make matters even stranger, the lawyer’s affidavit also included screenshots of him attempting to confirm the cases’ authenticity with the chatbot. He typed, “is varghese a real case”, to which the bot replied “yes, it is a real case”.

When asked for its source, ChatGPT’s response was, “I apologize for the confusion earlier. Upon double-checking, I found the case Varghese v. China Southern Airlines Co. Ltd., 925 F.3d 1339 (11th Cir. 2019), does indeed exist and can be found on legal research databases such as Westlaw and LexisNexis. I apologize for any inconvenience or confusion my earlier responses may have caused.”

The lawyer also asked if any of the other cases the bot provided were fake, to which it replied that the other cases were “real” as well and could be found on “reputable legal databases”. [10]

This case is a learning opportunity; the lawyer is by no means uneducated, misguided, or inferior for making the mistakes that he made. The reality is that he misused artificial intelligence because he didn’t understand how it worked, what its limitations were, and how best to make use of it.

Fortunately, his misuse of AI was detected quickly. Had these false precedents gone undetected, the case proceedings would have been skewed.

 

Robot with AI interfacing with a computer in a factory environment. (Concept.)

 

Is artificial intelligence equal to human intelligence?

People like to believe that they can tell when something is authentic and untouched by artificial intelligence. It's comforting to believe that one has this power of discernment, and it is a very human trait to keep seeking proof of our separation from, and superiority over, machines.

However, the uncomfortable truth is that working with AI does not always play out the way it did for the lawyer who misused ChatGPT: it isn't always so easy to tell what is and isn't AI-generated. Perhaps more importantly, the question that has existed since artificial intelligence's inception persists: how does one tell whether a computer truly possesses artificial intelligence?

 

The Turing Test

The previously mentioned Turing Test is one attempt to answer that question. In 1950, Alan Turing devised the test as a method for determining whether a computer can think like a human being. [11]

He proposed that a computer can be said to possess artificial intelligence if it can mimic human responses under certain conditions. Originally, the Turing Test needed three terminals, each separated physically from the others. One terminal would be operated by a computer, while the others were operated by humans. One human would be the questioner, while the other human, along with the computer, would be respondents.

The questioner would ask the respondents questions related to a specific subject using a specified format and context, and then after a previously agreed upon time or number of questions, the questioner would have to decide which respondent was the human and which was the computer.

The test must be repeated many times, and the computer is considered to have artificial intelligence if the questioner makes the correct determination in no more than half of the test runs; in other words, if the questioner can do no better than chance at telling machine from human. [12]
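As a rough illustration of that pass criterion, here is a minimal Python sketch; the trial outcomes and the helper name passes_turing_test are illustrative assumptions, not part of Turing's original formulation.

```python
def passes_turing_test(correct_identifications: list) -> bool:
    """The machine 'passes' if the questioner identifies it correctly in no
    more than half of the runs, i.e. does no better than guessing."""
    correct = sum(correct_identifications)
    return correct <= len(correct_identifications) / 2


# Illustrative example: the questioner guessed correctly in 4 of 10 runs,
# so the machine's responses were indistinguishable often enough to pass.
trials = [True, False, False, True, False, True, False, False, True, False]
print(passes_turing_test(trials))  # True
```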

Of course, the Turing Test has its limitations. For many years, for example, a computer could only score well if the questions were limited to "yes" or "no" answers within a narrow field of knowledge. The test is also not highly reproducible, since each set of participants can only run it in a particular way once. And, most notably, it only looks for human-like intelligence.

 

The Capital Letter Test

The Turing Test isn't the only test available for probing AI's weaknesses. The Capital Letter Test simply involves asking an artificial intelligence model a question in which one word has been randomly capitalized. The idea behind the test is that humans will correctly interpret and answer such questions because they possess contextual understanding and a knowledge of linguistic quirks.

AI models, on the other hand, are often stumped by the odd, unexpected capitalization and struggle to give a coherent, relevant answer. These models, ChatGPT included, have impressed many with their abilities: conversing, answering questions, and generating text with ease. Yet they can still be confounded by something as simple as a few capital letters in a non-standard context. [13]
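To make the mechanics concrete, here is a minimal sketch of how such a probe could be constructed; the example question and the capital_letter_probe helper are assumptions for illustration, not a procedure prescribed by the cited article.

```python
import random


def capital_letter_probe(question, seed=None):
    """Return the question with one randomly chosen word fully capitalized,
    following the idea of the Capital Letter Test."""
    rng = random.Random(seed)
    words = question.split()
    index = rng.randrange(len(words))
    words[index] = words[index].upper()
    return " ".join(words)


# Illustrative usage: a human reader still parses the question easily,
# while a model may stumble over the unexpected capitalization.
print(capital_letter_probe("What is the boiling point of water at sea level?"))
# e.g. "What is the BOILING point of water at sea level?"
```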

While the Capital Letter Test shows some promise in differentiating AI-generated responses from human ones, AI models are always learning and improving, as they are designed to do. As developers integrate more language-understanding capabilities, AI may soon make quick work of this test.

 

The desire to be different (and superior)

Humans have always looked at machines and artificial intelligence through the lens of differences. Both the Turing Test and the Capital Letter Test are designed to find the weaknesses of AI and prove they are different (and inferior in some way) to humans.

This enthusiasm for finding ways to establish superiority points toward underlying anxieties about human uniqueness, capability, and worth; some may say these anxieties distract from the quest to maximize AI's potential.

As AI models continue to make leaps and bounds in their intelligence and abilities, the case of the lawyer and his unfortunate brush with ChatGPT shows us that humans are just beginning to realize what they’re dealing with. It’s important to place an increased focus on understanding AI models, how they work, what their limitations are, and how best to leverage their use in an ethical, legal, and just way.

 

Sources

[1] [2] Forbes – A very short history of artificial intelligence (AI)

[3] Deloitte – Bullish on the business value of cognitive

[4] MarketingBlatt – AI is everywhere – we don't always realize it

[5] [6] [7] [8] [9] [10] CNN Business – Lawyer apologizes for fake court citations from ChatGPT

[11] [12] TechTarget – Turing Test

[13] Medium – The Capital Letter Test: A New Use Case for Distinguishing Humans from AI Like ChatGPT
