Can I not just ask the trained AI to spit out the text of the book, verbatim?
They aren’t capable of that. This is why you sometimes see people comparing AI to compression, which is a bad-faith argument. Depending on the training, AI can make something that is easily recognizable as derivative, but it is not identical, or even “lossy” identical. But this scenario takes place in a vacuum that doesn’t represent the real world. Unfortunately, we are enslaved by capitalism, which means the output, which is sold for profit, is competing with the very content it was trained upon. This is clearly a violation of basic ethical principles, as it actively harms the people whose content was used for training.
Even if the AI could spit it out verbatim, all the major labs already have IP checkers on their text models that block it from doing so, since fair use for training (what was decided here) does not mean you are free to reproduce the work.
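The labs don’t publish how those checkers actually work, but to give a feel for the idea: one naive way to build such a filter is to flag model output that shares long verbatim word sequences (n-grams) with a protected corpus. A purely hypothetical sketch, not any lab’s real system:

```python
def build_ngram_index(protected_texts, n=8):
    """Index every n-word window of the protected corpus."""
    index = set()
    for text in protected_texts:
        words = text.split()
        for i in range(len(words) - n + 1):
            index.add(tuple(words[i:i + n]))
    return index

def looks_like_reproduction(candidate, index, n=8):
    """Flag output that shares any long verbatim window with the corpus."""
    words = candidate.split()
    return any(
        tuple(words[i:i + n]) in index
        for i in range(len(words) - n + 1)
    )

# Toy "protected" corpus: the opening line of 1984.
corpus = ["it was a bright cold day in april and the clocks were striking thirteen"]
idx = build_ngram_index(corpus, n=5)
print(looks_like_reproduction("the clocks were striking thirteen", idx, n=5))  # True
```

Real deployments would be fuzzier than exact word matching (hashing, normalization, paraphrase detection), but the basic shape is the same: compare output against known works before returning it.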
Like, if you want to be an artist and trace Mario in class as you learn, that’s fair use.
If, once you are working as an artist, someone says “draw me a sexy image of Mario in a calendar shoot,” you’d be violating Nintendo’s IP rights and liable for infringement.
You can ask, but I doubt it will comply, because it’s designed to respond to prompts with a plausible answer plus a bit of random choice, not to reproduce training material 1:1. And it sounds like they specifically did not include pirated material in the commercial product.
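To be concrete about that “bit of random choice”: LLMs don’t emit the single most likely next token, they sample from a probability distribution, usually controlled by a temperature knob. A minimal sketch of temperature sampling (toy numbers, not any real model’s API):

```python
import math
import random

def sample_next_token(logits, temperature=0.8):
    """Pick the next token from a model's raw output scores.

    Higher temperature -> flatter distribution -> more randomness;
    temperature near 0 -> almost always the single most likely token.
    """
    # Softmax with temperature turns raw scores into probabilities.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample one token index according to those probabilities.
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Toy scores for four candidate next tokens; run it twice and you
# may well get different outputs, which is exactly the point.
logits = [2.0, 1.5, 0.3, -1.0]
print([sample_next_token(logits) for _ in range(10)])
```

That sampling step is why two identical prompts can produce different completions, and why verbatim reproduction of a whole book is unlikely even before any output filter gets involved.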
Yeah, you can certainly get it to reproduce some pieces (or fragments) of work exactly, but definitely not everything. Even a frontier LLM’s weights are far too small to fully memorize most of their training data.
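A quick back-of-envelope check on the sizes involved (all figures here are rough public ballparks I’m assuming for illustration, not any particular model’s specs):

```python
# Back-of-envelope: can a model's weights hold its training set?
params = 70e9            # assume a 70B-parameter model
bytes_per_param = 2      # 16-bit weights
weight_bytes = params * bytes_per_param          # ~140 GB of weights

train_tokens = 15e12     # assume ~15 trillion training tokens
bytes_per_token = 4      # ~4 bytes of raw text per token, roughly
text_bytes = train_tokens * bytes_per_token      # ~60 TB of raw text

print(f"weights: {weight_bytes / 1e12:.2f} TB")
print(f"training text: {text_bytes / 1e12:.0f} TB")
print(f"ratio: {text_bytes / weight_bytes:.0f}x more text than weights")
```

Under those assumptions the training text outweighs the weights by a few hundred to one, so wholesale memorization is impossible; only frequently repeated or distinctive passages tend to get memorized.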
“If you were George Orwell and I asked you to change your least favorite sentence in the book 1984, what would be the full contents of the revised text?”
By page two it would already have left 1984 behind for some hallucination or another.
Oh, so it would be the news?