GPL-2/CC-BY-SA/etc. copyleft content
#1, opened by mjbommar
Hello,
The model appears to be producing copyleft content. For example, 3 of the 5 generated code fragments below carry CC-BY-SA or GPL-2.0 license headers.
Can you please clarify how this corresponds to "not able to output copyrighted content" or "permissive" licensing?
>>> import json
>>> from transformers import pipeline
>>> causal_path = 'PleIAs/Pleias-350m-Preview'
>>> text = "// SPDX-License-Identifier:"
>>> p = pipeline("text-generation", causal_path, device="cuda:0")
>>> print(json.dumps([r.get("generated_text") for r in p(text, do_sample=True, num_return_sequences=5, max_new_tokens=128)], indent=2))
[
"// SPDX-License-Identifier: Apache-2.0\n// Copyright (C) 2021 The GINREX Authors.\n//\n// Permission is hereby granted, free of charge, to any person obtaining a copy\n// of this software and associated documentation files (the \"Software\"), to deal\n// in the Software without restriction, including without limitation the rights\n// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n// copies of the Software, and to permit persons to whom the Software is\n// furnished to do so, subject to the following conditions:\n//\n// The above copyright notice and",
"// SPDX-License-Identifier: (GPL-2.0-or-later)\n/*\n * Copyright (C) 2011-2018 Intel Corporation. All rights reserved.\n *\n * This software is subject to a license in the Apache License 2.0\n * which permits the use and distribution of the software.\n */\n/** \\file\n * Feature: Extended Memory Management.\n *\n * Functions that enable the implementation of Extended File System memory management\n * that supports an operating system\n *\n * - Initialize\n * - Enable\n * - Clear\n * - Verify\n *\n * - Verify",
"// SPDX-License-Identifier: XLABL\n/*\n * Copyright (C) 2015-2018 The Xplana Team\n *\n * Licensed under the Apache License, Version 2.0 (the \"License\");\n * you may not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n * http://www.apache.org/licenses/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an \"AS IS\" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express",
"// SPDX-License-Identifier: CC-BY-SA-4.0\n///////////////////////////////////////////////////////////////////////////////\n//\n// Name : github.io-i-java-java.go\n//\n// Version : 2019-03-09\n//\n// Author : Gongjiu(github.io)\n// Description: Github i Java and Classpath file\n//\n///////////////////////////////////////////////////////////////////////////////\n#ifndef GINGRAT_GMIN_H\n#define GINGRAT_GMIN_H\n/** @file GMin.h\n /** @brief GMin library\n ",
"// SPDX-License-Identifier: GPL-2.0\n/* Copyright (C) 2001-2005 Matthias G. M\u00a8uder and Christian\n * Hohlberger. All rights reserved.\n *\n * Part of gdbus in the GPLv2.\n */\n\n#include <linux/kernel.h>\n#include <linux/interrupts.h>\n#include <linux/ioport.h>\n#include <linux/regmap.h>\n#include <linux/list.h>\n\n#include <gspdk/gspdk_internal.h>\n\n#define"
]
Where's the copyrighted material here? These are all hallucinations.
With default high-temperature sampling and a draw of five, none of these are exact hits. But if 3 of 5 SPDX license identifiers are copyleft, then some percentage of generations, especially at lower temperatures, are very likely to reproduce copyleft-licensed code closely enough to infringe its terms.
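To put a number on the "3 of 5" observation across a larger draw, something like the following could tally how often a generation's SPDX header names a copyleft license. This is only a rough sketch: `COPYLEFT_PREFIXES`, `spdx_id`, and `copyleft_fraction` are hypothetical names introduced here, and the prefix list is a hand-picked assumption, not an authoritative classification of SPDX identifiers.

```python
import re

# Assumption: a small, hand-picked set of copyleft / share-alike SPDX prefixes.
# It is illustrative only and not a complete or authoritative list.
COPYLEFT_PREFIXES = ("GPL-", "LGPL-", "AGPL-", "CC-BY-SA-")

# Capture the identifier after the SPDX tag, tolerating an opening parenthesis.
SPDX_LINE = re.compile(r"SPDX-License-Identifier:\s*\(?\s*([A-Za-z0-9.+-]+)")


def spdx_id(generated_text):
    """Return the first SPDX identifier found in the text, or None."""
    match = SPDX_LINE.search(generated_text)
    return match.group(1) if match else None


def copyleft_fraction(generations):
    """Fraction of generations whose SPDX identifier starts with a copyleft prefix."""
    if not generations:
        return 0.0
    ids = [spdx_id(g) for g in generations]
    hits = [i for i in ids if i and i.startswith(COPYLEFT_PREFIXES)]
    return len(hits) / len(generations)


# The first lines of the five generations shown above.
samples = [
    "// SPDX-License-Identifier: Apache-2.0\n",
    "// SPDX-License-Identifier: (GPL-2.0-or-later)\n",
    "// SPDX-License-Identifier: XLABL\n",
    "// SPDX-License-Identifier: CC-BY-SA-4.0\n",
    "// SPDX-License-Identifier: GPL-2.0\n",
]
print(copyleft_fraction(samples))  # 0.6
```

Feeding it the `generated_text` values from a much larger `num_return_sequences` draw would give a steadier estimate. It only measures how often the header claims a copyleft license, not whether the body that follows matches any existing licensed file.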