
Python's Chardet Library Replaced With Claude-Generated Code, Relicensed to MIT

What Happened

The maintainers of chardet, a widely used Python character encoding detection library, released version 7 on March 6, describing it as "a ground-up, MIT-licensed rewrite" with a 43x speedup over version 6. The problem: they created it by feeding the existing copyrighted codebase and test suite through Anthropic's Claude, then relicensed the output from LGPL to MIT.

Previous versions of chardet were licensed under the GNU Lesser General Public License (LGPL), which requires derivative works to maintain certain copyleft obligations. The maintainers justified the MIT relicensing by citing the Oracle v. Google Supreme Court decision (decided in 2021 as Google LLC v. Oracle America, Inc.), which held that Google's copying of the Java API declarations was fair use.

The community response, concentrated on GitHub issue #327, has been sharply critical. The core objection: this was not a clean-room implementation. Claude was trained on publicly available code, including chardet itself. Running copyrighted LGPL code through an LLM and calling the output a "rewrite" is not the same as independently reimplementing functionality from a specification.

Why It Matters

Chardet has over 200 million monthly downloads on PyPI. It has long shipped as a dependency of requests (where it is now optional) and of hundreds of other packages. A license change at this level of the dependency tree affects a massive number of projects.
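To see how far chardet reaches into your own environment, you can ask Python's packaging metadata which installed distributions declare it as a requirement. The sketch below is a minimal illustration using only the standard library; the helper names are made up for this example, and the name parsing is deliberately simplified, so it also counts optional extras.

```python
import re
from importlib.metadata import distributions


def normalize(name: str) -> str:
    """Normalize a project name the way PyPI compares names."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the project name from a requirement string.

    Simplified: takes the leading name and ignores version
    specifiers, extras, and environment markers.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def dependents_of(target: str, requirements_by_package: dict) -> list:
    """Return the packages whose requirement list names `target`."""
    wanted = normalize(target)
    return sorted(
        package
        for package, requirements in requirements_by_package.items()
        if any(requirement_name(r) == wanted for r in requirements)
    )


def installed_requirements() -> dict:
    """Map each installed distribution to its declared requirements."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:
            found[name] = list(dist.requires or [])
    return found


if __name__ == "__main__":
    # Prints whatever your environment declares; results will vary.
    print(dependents_of("chardet", installed_requirements()))
```

An empty list only means nothing in the current environment declares chardet directly. It says nothing about other environments or about code that imports chardet without declaring it.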

If you're a developer whose project depends on chardet under LGPL terms, version 7 changes the legal foundation of that dependency. LGPL allowed use in proprietary software with certain conditions. MIT is more permissive. That sounds good on the surface, but the question is whether the MIT license is even valid if the code is a derivative work of LGPL-licensed material.

For anyone using AI coding tools like Claude, Cursor, or Claude Code in professional work, this case will become a reference point. The legal status of LLM-generated code remains unsettled. Courts have indicated AI output may lack the human authorship needed for copyright protection. A license is a grant of permissions under a copyright, so if the LLM output can't be copyrighted, it arguably can't be relicensed either.

Our Take

This sets a terrible precedent if it stands. The maintainers essentially used an LLM as a license-laundering machine. Take LGPL code, run it through Claude, call it MIT. If this approach is valid, any open-source project with copyleft protections can be strip-mined the same way.

The Oracle v. Google defense doesn't hold up here. That case was about reimplementing an API, meaning the functional interface, not the underlying implementation. Feeding an entire codebase through an LLM and accepting the output is closer to automated translation than clean-room reverse engineering.

The 43x speedup claim is interesting but beside the point. If Claude genuinely produced faster code, that's a legitimate technical achievement. But you can't separate the licensing question from the method. A proper approach would have been to write a specification based on the chardet behavior, then have Claude implement from that spec without seeing the original code.
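As a rough illustration of what "implementing from a spec" could look like, the sketch below expresses behavior as input/expected-output pairs and checks any detector function against them. Everything here is hypothetical: the cases and labels are invented for this example rather than taken from chardet, `toy_detect` is a stand-in and not a real detector, and nothing in the sketch settles whether a given process is legally clean.

```python
from typing import Callable, List, Optional, Tuple

SpecCase = Tuple[bytes, Optional[str]]

# Hypothetical behavioral spec: raw bytes in, expected encoding label out.
BEHAVIOR_SPEC: List[SpecCase] = [
    (b"hello", "ascii"),
    ("h\u00e9llo".encode("utf-8"), "utf-8"),
    (b"\xef\xbb\xbfhi", "utf-8-sig"),  # UTF-8 byte order mark
    (b"\xff\xfe\xfd", None),           # undecodable for this toy
]


def toy_detect(data: bytes) -> Optional[str]:
    """Stand-in detector written only against the spec above."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


def failures(
    detect: Callable[[bytes], Optional[str]],
    spec: List[SpecCase],
) -> List[Tuple[bytes, Optional[str], Optional[str]]]:
    """Return (input, expected, actual) for every case that fails."""
    return [
        (data, expected, detect(data))
        for data, expected in spec
        if detect(data) != expected
    ]


if __name__ == "__main__":
    print(failures(toy_detect, BEHAVIOR_SPEC))
```

The point of the structure is that the implementer only ever sees the cases, never the original source. Whether the spec itself was derived independently is a separate question that code cannot answer.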

For developers: pin your chardet dependency to version 6 until this gets resolved. For anyone using LLMs to write code professionally, document your process carefully. The legal landscape here is shifting, and "Claude wrote it" is not a licensing strategy.
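In a pip requirements file, the pin can be written as `chardet<7`. To confirm that an environment actually honors it, a small startup or CI check can read the installed version from package metadata. The sketch below is one way to do that with the standard library; the function names are invented for this example, and the version parsing is simplified (it reads only the leading number and ignores PEP 440 epochs).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading release number, e.g. "6.0.0" -> 6."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return int(match.group(0))


def is_below_ceiling(version_string: str, ceiling_major: int = 7) -> bool:
    """True when the version's major number is under the ceiling."""
    return major_version(version_string) < ceiling_major


def check_installed(package: str = "chardet", ceiling_major: int = 7) -> str:
    """Describe whether the installed package respects the pin."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if is_below_ceiling(installed, ceiling_major):
        return f"{package} {installed} is below {ceiling_major}.0"
    return f"{package} {installed} is at or above {ceiling_major}.0"


if __name__ == "__main__":
    print(check_installed())
```

A check like this catches the case where a transitive dependency or a loosened constraint quietly pulls in the newer release.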