Textbook publisher John Wiley & Sons is not exactly a stranger to controversy, having been on the losing side in the precedent-setting Kirtsaeng v. John Wiley & Sons Supreme Court case, which determined that importing inexpensive foreign editions of textbooks is protected under the First Sale doctrine. Now Wiley has stumbled into another controversy.
Nature reports that the publisher has come under fire for using “trap URLs” that point to fictitious papers on its web site to fight unauthorized bulk downloads of its content, just as cartographers used to use “trap streets” to protect their maps from being copied by others. Wiley reasoned that anyone who accessed these fake articles couldn’t have been sent there by any legitimate source and so must be a bulk-downloading bot. It therefore revoked their access to its legitimate content.
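To make the idea concrete, here is a minimal sketch of how a trap-URL check could work in principle. It is not Wiley's actual implementation, which has not been published; the paths, the `TrapUrlGate` class, and the client identifiers are all hypothetical and chosen only for illustration.

```python
# Hypothetical trap paths: pages that no legitimate page links to.
TRAP_PATHS = {"/doi/10.0000/trap-0001", "/doi/10.0000/trap-0002"}


class TrapUrlGate:
    """Toy access gate that blocks any client that requests a trap path."""

    def __init__(self, trap_paths):
        self.trap_paths = set(trap_paths)
        self.blocked = set()  # client identifiers that have hit a trap

    def handle(self, client_id, path):
        # Requesting a trap path marks the client as a suspected bot.
        if path in self.trap_paths:
            self.blocked.add(client_id)
        # Once marked, the client is denied everything, real content included.
        return "denied" if client_id in self.blocked else "ok"


gate = TrapUrlGate(TRAP_PATHS)
print(gate.handle("reader", "/doi/10.1002/real-paper"))  # a normal request
print(gate.handle("miner", "/doi/10.0000/trap-0001"))    # hits a trap
print(gate.handle("miner", "/doi/10.1002/real-paper"))   # now locked out
```

The sketch also shows the weakness critics pointed to: the gate cannot tell a bulk-downloading bot from a researcher who reached a trap path through a list of DOIs gathered elsewhere.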
This practice was discovered by researcher Richard Smith-Unna, who was running a data mining project to compare how easily data could be extracted from different formats of scholarly papers. The project used digital object identifiers (DOIs) obtained from online sources.
On May 29, Smith-Unna posted a Google Doc that included a list of more than 150 trap URLs hosted by Wiley. “Never suspecting anyone would be so arrogant as to pollute a scientific corpus with fake data, I attempted to use these DOIs for a legitimate academic mining project in good faith (one which I had pre-informed the library about),” he wrote in the Google Doc.
Eric Hellman of the Free Ebook Foundation added that Wiley’s trap URL method was actually rather clumsy and unsophisticated. Further, since it was possible to regain access by switching browsers, it appears Wiley simply used browser cookies to enforce the block, so anyone who cleared their browser cookies could regain access.
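The following toy model illustrates why a block that lives only in a browser cookie behaves this way. It assumes, following Hellman's inference rather than any confirmed detail, that the server marks a visitor by setting a cookie; the `respond` function, the `Browser` class, the cookie name, and the paths are all made up for the example.

```python
TRAP_PATHS = {"/doi/10.0000/trap-0001"}  # hypothetical trap path


def respond(path, cookies):
    """Return (status, cookies_to_set) for one request in this toy server."""
    if path in TRAP_PATHS:
        # Mark the visitor by asking the browser to store a cookie.
        return 403, {"blocked": "1"}
    if cookies.get("blocked") == "1":
        # The only record of the block is the cookie the browser sends back.
        return 403, {}
    return 200, {}


class Browser:
    """Toy browser that keeps its own cookie jar."""

    def __init__(self):
        self.jar = {}

    def get(self, path):
        status, set_cookies = respond(path, self.jar)
        self.jar.update(set_cookies)
        return status


first = Browser()
first.get("/doi/10.0000/trap-0001")          # the trap sets the cookie
print(first.get("/doi/10.1002/real-paper"))  # blocked in this browser

second = Browser()                           # a different browser, empty jar
print(second.get("/doi/10.1002/real-paper"))  # not recognized as blocked
```

Because the server in this sketch keeps no record of its own, a second browser or an emptied cookie jar looks like a brand-new visitor, which matches the behavior described above.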
Wiley claims to have stopped blocking access for people who clicked on the “trap URLs,” though there’s some dispute about whether it actually has. But it’s not the only academic publisher to have used trap URLs, and there’s some concern that the practice might escalate into an “evolutionary arms race between legacy publishers and researchers.”
With the shift to digital, it’s become a lot easier to download, copy, and disseminate copyrighted material, including academic studies. (The game of whack-a-mole Elsevier has been playing with the operator of Sci-Hub is proof of that.) Naturally, publishers are going to be eager to prevent that kind of thing from happening. However, it’s important that whatever method publishers use for that should not impede the efforts of researchers looking to make legitimate use of that data. So far, Wiley’s method fails that criterion.