A shadow library known for archiving books and academic papers has announced a massive new project targeting music. The group, Anna’s Archive, claims to have scraped the metadata for nearly the entire Spotify catalog, amassing data on approximately 256 million tracks. The group describes the effort as part of a mission to preserve cultural artifacts, arguing that existing music collections are skewed toward popular artists or rely on impractically large file sizes.

According to a blog post from the group, it discovered a method to scrape Spotify at large scale. The scraped data covers over 15 million artists and 58 million albums. The group states that it has archived around 86 million actual song files so far, which it estimates account for about 99.6 percent of all listens on the platform. The total size of the full dataset, including files yet to be archived, is close to 300 terabytes.

Anna’s Archive, which operates as an open-source search engine, typically focuses on text-based materials, citing their high information density. However, the group asserts that its goal of preserving humanity’s knowledge and culture extends to all media types. It intends to make the music files available for download in stages, ordered by popularity, for anyone with sufficient storage space.

The legality of the operation is unquestionably problematic. Scraping, storing, and distributing copyrighted music files without authorization is a clear violation of intellectual property laws. In response to the incident, a Spotify spokesperson stated that the company identified and disabled the user accounts involved in the scraping activity. Spotify says it has implemented new safeguards against such attacks and is monitoring for suspicious behavior. The company emphasized its stance against piracy and its work with industry partners to protect creators’ rights.

The group behind Anna’s Archive contends that its new collection is the largest publicly available music metadata database. It sees the Spotify scrape as a foundational step toward a comprehensive preservation archive for music, acknowledging that while Spotify does not contain every song, it provides a significant starting point. The archive still has millions of files left to process, with the current 86 million songs representing only about 37 percent of the total identified tracks.

This move highlights ongoing tensions between open-access preservation movements and the digital rights management frameworks of major content platforms. It also raises complex questions about cultural heritage, accessibility, and copyright enforcement in the digital age. As the group prepares to release the data, the music industry and digital rights observers are likely to watch closely.


