By Stephen DeAngelis
For the most part, scientists and researchers have been waiting with bated breath for the creation of artificial intelligence (AI) systems that can help them make new discoveries, create new materials, and move in new directions. As technologist Anirudh Mantha writes, “The rise of superhuman AI … could revolutionize content creation, code generation, and scientific discovery.”[1] At a conference held last fall in Bellevue, WA, futurist Ray Kurzweil doubled down on his prediction that the singularity — the point at which computer intelligence far surpasses human intelligence and becomes sentient — is not far off. He told the audience, “By 2029 AI will pass the Turing test, and by 2045 it will reach a ‘singularity’.”[2] According to Kurzweil, “What we’ve seen so far from AI ain’t nothin’.” He told conference participants that once AI reaches a “general human capability” it will have already “surpassed us in every way.” Rather than being concerned about this development, Kurzweil believes humans and AI are “going to move into the future together.” The type of artificial intelligence envisioned by Kurzweil falls under the umbrella heading of artificial general intelligence (AGI) and encapsulates what we currently call generative AI (i.e., systems capable of autonomously generating new insights, knowledge, or products). Even though the outlook for generative AI appears bright, a number of lawsuits are clouding its future. Before looking at the merits of those lawsuits, a quick primer in generative AI is in order.
The Value of Generative AI
Currently, the closest thing we have to the kind of AI systems envisioned by Kurzweil are large language models (LLMs). Vincent Caruana, Senior Digital Marketing Manager for Search Engine Optimization at Algolia, explains, “At their core, LLMs are made up of a huge number of trainable variables, or parameters. An LLM is first trained — fattened up on vast portions of training data (input text). The parameters imbibe the essence of language through exposure to enormous datasets that comprise text from the various sources. Each parameter ultimately adjusts and aligns itself through iterative learning processes. Reinforcement learning from human feedback is applied, and the model’s proficiency is gradually enhanced. The trained model utilizes complex algorithms to learn patterns, relationships, and semantic meanings within language to ensure expert text generation. Over time, it not only recognizes syntax and grammar but gains insight on nuanced relationships and semantic intricacies embedded in the language.”[3] Although they are called large language models, the same principle can be used to create images, voices, and videos. And it should come as no surprise that these capabilities can be used for nefarious purposes. As a result, numerous governments are looking at regulating generative AI content.
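To make the training process Caruana describes a bit more concrete, below is a minimal sketch, in Python with PyTorch, of the core next-token prediction loop. It is purely illustrative: the toy corpus, vocabulary, model dimensions, and the TinyLM class are invented for this example, the model is a simple bigram-style predictor rather than a transformer, and the reinforcement learning from human feedback stage Caruana mentions is omitted. What it does show is the essential mechanic of “trainable variables, or parameters” being adjusted through iterative learning.

```python
# A minimal, illustrative next-token training loop (PyTorch assumed).
# NOT how any production LLM is actually built; the corpus, sizes, and
# TinyLM class are invented for this sketch.
import torch
import torch.nn as nn

corpus = "the model learns patterns in language from data".split()
vocab = {word: i for i, word in enumerate(sorted(set(corpus)))}
ids = torch.tensor([vocab[word] for word in corpus])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # trainable parameters
        self.proj = nn.Linear(dim, vocab_size)      # more trainable parameters

    def forward(self, tokens):
        # Map each token to a vector, then score every word in the
        # vocabulary as a candidate for the *next* token.
        return self.proj(self.embed(tokens))

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Iterative learning: each step nudges every parameter so the model
# assigns higher probability to the token that actually came next.
for step in range(200):
    logits = model(ids[:-1])         # predict token t+1 from token t
    loss = loss_fn(logits, ids[1:])  # compare with the real next tokens
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Production LLMs run essentially this same loop at vastly larger scale, with billions of parameters trained on enormous text corpora, which is why the composition and provenance of the training data matter so much in the lawsuits discussed below.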
Whether the prospect of the singularity excites you or frightens you, a truly useful AI system needs to train on all available data — especially in medical and scientific fields. For example, when developing new drugs or making a medical diagnosis, one would hope that all pertinent research on the subject would be included in an AI system’s training material. If critical information is withheld, the results could be questionable. Bill Gates sees a bright future for generative AI. He believes 2023 “gave us a glimpse of how AI will shape the future.”[4] He adds, “We are just at the beginning of this transition right now. This is an exciting and confusing time, and if you haven’t figured out how to make the best use of AI yet, you are not alone. … There’s no question these are challenging times, but I remain optimistic about the future.” Why is he so optimistic? He writes, “AI is about to supercharge the innovation pipeline.” However, lawsuits are clouding the picture.
What Impact Will Lawsuits Have?
Not everyone or every organization is excited to have their work scanned and incorporated into LLMs. Last September, a number of authors filed a lawsuit against OpenAI, creator of the best-known LLM. Correspondent Aimee Picchi reported, “OpenAI, the creator of ChatGPT, is facing a lawsuit from bestselling writers including George R.R. Martin, John Grisham and Elin Hilderbrand that claims the company fed their books into its ‘large language models’ allegedly violating their copyrights and engaging in ‘systematic theft on a mass scale.’ The suit was filed in the Southern District of New York … on behalf of the Authors Guild and 17 noted writers, including Scott Turow, Jodi Picoult, David Baldacci, Michael Connelly, and George Saunders. … The complaint is the latest legal challenge facing OpenAI over the data it collects and uses to create the algorithm that underpins ChatGPT, an artificial intelligence tool that can answer questions and write text in sophisticated language that mimics how a human would respond.”[5]
More recently, the New York Times filed a lawsuit against Microsoft and OpenAI accusing them of copyright infringement and abusing the newspaper’s intellectual property. Tech correspondent Ryan Browne reports, “The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the ‘billions of dollars in statutory and actual damages’ it believes it is owed for the ‘unlawful copying and use of The Times’s uniquely valuable works.’ The Times said in an emailed statement that it ‘recognizes the power and potential of GenAI for the public and for journalism,’ but added that journalistic material should be used for commercial gain with permission from the original source. ‘These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise,’ the Times said.”[6] Countering this argument, Aswin Prabhakar, a Policy Analyst at the Center for Data Innovation, argues, “Throughout history, emerging technologies have faced resistance. The initial resistance to the printing press, for instance, mirrors The New York Times’ apprehensions about AI. Yet, just as the printing press revolutionized information dissemination and led to societal progress, AI also promises similar transformative potential. … The New York Times’ lawsuit mischaracterizes the nuanced dynamics of AI development and the principles of fair use for news articles available online. While it is crucial for policymakers to address legitimate copyright infringement concerns, such as rampant pirated content on the Internet, training AI models on information freely available on the Internet is not one of those.”[7]
In both of the cases mentioned above, there are clearly legitimate monetary and creative concerns involved. But as journalist Eray Eliaçık reports, “The lawsuit has broader implications for both the media and AI industries. The court’s decisions in this high-profile case may set a precedent for future copyright claims and legal battles between media organizations and AI companies. The outcome could influence how AI firms source and use content for training their models, potentially reshaping industry practices.”[8] Personally, I’m much more concerned about how the court’s decision could affect scientific and medical advances that benefit from GenAI capabilities. Lawsuits in those areas could also involve copyright and privacy issues. Regina Sam Penti, a law partner at Ropes & Gray who specializes in technology and intellectual property, notes there are “way too many cases to count” centered on privacy concerns.[9] Penti adds, “While it’s not clear how legal threats will affect the development of generative AI, they could force creators of AI systems to think more carefully about what data sets they train their models on. More likely, legal issues could slow down adoption of the technology as companies assess the risks.”
Concluding Thoughts
Most of the world is hoping that AI systems will introduce a new age of innovation and advancement. There is no doubt people and organizations trying to address global challenges need all the help they can get. Hopefully, governments, court systems, and technology companies will be able to sort out the training data dilemma they currently face. This will be particularly critical in scientific and medical fields. I’m optimistic rules will eventually be worked out; however, the process is likely to be painful and prolonged. Science fiction writers mostly skip over the painful details about how futuristic AI systems were able to overcome legal issues to become so powerful, useful, and, sometimes, threatening.
Footnotes
[1] Anirudh Mantha, “5 most exciting developments in artificial intelligence for 2023–2024 and beyond,” Medium, 13 December 2023.
[2] Casey Luskin, “Ray Kurzweil Predicts: The ‘Singularity’ by 2045,” Evolution News, 8 November 2023.
[3] Vincent Caruana, “Top examples of some of the best large language models out there,” Algolia Blog, 1 November 2023.
[4] Bill Gates, “The road ahead reaches a turning point in 2024,” Gates Notes, 19 December 2023.
[5] Aimee Picchi, “George R.R. Martin, John Grisham and other major authors sue OpenAI, alleging ‘systematic theft’,” CBS Moneywatch, 20 September 2023.
[6] Ryan Browne, “New York Times sues Microsoft, ChatGPT maker OpenAI over copyright infringement,” CNBC, 27 December 2023.
[7] Aswin Prabhakar, “The New York Times’ Copyright Lawsuit Against OpenAI Threatens the Future of AI and Fair Use,” Center for Data Innovation, 12 January 2024.
[8] Eray Eliaçık, “NYT sues OpenAI and wants billions of dollars,” Dataconomy, 28 December 2023.
[9] Dylan Walsh, “The legal issues presented by generative AI,” MIT Sloan School of Management, 28 August 2023.