@DerHans

DerHans@lemmy.world · 1 day ago

Good Sire, if we are talking about only the US, then that does not matter at all. Existing copyright law and established precedents (without involving AI) already cover this. The copyright of software is handled like that of literature, so the actual content is copyrighted; more specifically, the sequence of words. To violate the copyright of a protected work, one just has to reproduce this sequence. It is not relevant whether it was reproduced by an AI, a human, God or your cat (:D). The only exception to this is fair use. Whether fair use applies must be considered on a case-by-case basis. Four factors are used in deciding that: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the market for the original. And all of that assumes that no portion of the code is patented. If it is, then you are screwed no matter what (unless you are allowed to use that code).

Anyhow, you are opening yourself up to litigation for sure.

Now, is this a problem? Probably not. Copyright infringement is actually very hard to spot, especially without automated tools (looking right at you, YouTube). Even if it is spotted, the copyright owners must spend resources to enforce it. Considering that most of the code used in the training data is open source, most of these owners won’t have these resources or at least aren’t using them (which is sad, because that also applies to infringement by companies). You cannot lose if no one sues. Whether you should risk it is your own decision to make.

For unprotected code… I guess you are right. It could go one way or the other, but it does not really matter that much. At worst, people can use your code without adhering to your license. That would not mark the end of a project, whereas a lawsuit over copyrighted code definitely would.

Also, on another note: using copyrighted material in the training data of an AI is considered fair use.