Our Revised News
Jan 19th, 2009 by sherri

Sign on the old historical archive in Santa Fe, New Mexico.
One chilly day last September, United Airlines’ stock temporarily crashed more than $1 billion due to an accidental re-release of an old news report about its 2002 bankruptcy. The New York Times reported that “shares of United traded at one cent… down 99.92 percent, or $12.29.” Other news sites and blogs quoted or linked to the NY Times story.
Shortly afterwards, the NY Times article changed.
Today, the New York Times article from Sept 8, 2008 instead reads “United Airlines shares fell to about $3 from more than $12 in less than an hour before trading was halted… Its shares closed at $10.92, down 11.2 percent.” There is no record of that earlier statement on the NYTimes site. There is no indication in the article that a correction or previous release was made. It’s almost impossible to find the earlier version online, except in a few personal reports and isolated quotes on random sites. Months ago there were blogs with comments that referred to the $.01 low point, which have now mostly disappeared. The statement they refer to does not seem to exist in public archives.
Fifty years ago, physically published mainstream newspaper articles provided a fairly high degree of reliability: physical copies were distributed throughout the country, and then locally archived. Corrections necessarily left an audit trail. Readers could go to trusted custodians at their local libraries to verify that certain information had been released by a major central news source.
Nowadays, the fox is guarding the henhouse. Major publishers offer their own global public archives, and a decreasing number of local libraries are archiving printed news articles. “[N]ews libraries have stopped clipping newspapers because so much of the information is available online,” write Christine Malesky and Richard Geiger in “News Media Libraries.” Unlike librarians, publishers do not have strong incentives to retain comprehensive records of revisions, errors and corrections. Instead, news publishers want to preserve the very “best” article possible.
At the New York Times, the online editors run a “continuous news desk,” which is “kind of an in-house re-write desk that feeds the Web site,” said Toby Usnik, director of public relations for the Times. “As we know new information, we add it. As information changes, we update it. If we misspell a name we spell it right and update the story again.” (OJR) History is routinely rewritten.
With respect to the United stock crash, Kim Zetter of Wired wrote “the problem wasn’t the market, it was the newspaper’s archive, which stored the story without a publication date attached to it — not a completely uncommon occurrence.”
As publishers, not librarians, increasingly store and provide access to their own media archives, readers lose the ability to independently verify the source, date and original content of news articles. If the world economy hinges on verifiable information, why not cryptographically sign articles as soon as they’re published? Ironically, the same unreliability that caused the United stock crash also manifested itself in the NYTimes article which reported it.
The Great Firewall of Britain
The United Airlines stock crash was really just a tremor, the symptom of a profound global shift. Last month, millions of people in the UK were suddenly blocked from editing Wikipedia after the Internet Watch Foundation (IWF) blacklisted a single page. This was possible because “95 per cent of British residential internet” traffic is reportedly routed through only six ISPs, which “voluntarily” send traffic through a centralized content filtering system called Cleanfeed at the request of the IWF. (Wikipedia) This week, the point was underscored when another IWF blacklist suddenly left many UK residents without access to the Internet Archive (aka the Wayback Machine).
In both the recent Wikipedia and Wayback Machine cases, end users quickly detected the blocks, public outcry ensued, and most access was restored. However, now that traffic filtering in the UK has become automated and centralized, future blocks could certainly go unnoticed by end readers. The current Cleanfeed implementation has been rather crude, in that it has been used to block entire pages and web sites in response to a single objectionable image. However, it is technically possible to quietly drop (or replace) “questionable” images and text much more subtly.
The “voluntary” British ISP filtering has more in common with China’s censorship than many Westerners realize. In China, “the ISPs and other service providers are restricting customers’ actions for fear of being found legally liable for customers’ conduct. The service providers have assumed an editorial role with regard to customer content… Although the government does not have the physical resources to monitor all Internet chat rooms and forums, the threat of being shut down has caused Internet content providers to… stop and remove forum comments which may be politically sensitive.” (Wikipedia)
East and West, a little fear goes a long way.
What can we do?
The September United Airlines stock crash resulted in one good thing: there is now demonstrated financial incentive for technology that would allow businesses to instantly verify the source, publication date and content of news articles. Perhaps economics will help spur the tools of democracy.
Geeks can also take matters into our own hands. Technically speaking, the tools to cryptographically sign and verify web pages are within reach. For instance, publishers could voluntarily embed PGP markers and signatures as comments inside web page source code. A Firefox plugin could search for PGP beginning and ending markers within web page source code, grab only the static ASCII text between these markers, automatically verify signatures, and present the publisher, date, etc. in a browser toolbar. Browsers could store signatures locally, or check them against independent online repositories. There’s already a beta Firefox plugin (FireGPG) which facilitates the use of PGP signatures with web pages. That’s a good start.
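To make the marker-scanning step concrete, here is a minimal Python sketch of what such a plugin might do first: locate clearsigned PGP blocks in page source and separate the signed text from the signature. The function name `find_signed_blocks` and the returned dictionary layout are my own illustration, not part of FireGPG or any existing tool. The sketch does not verify anything and ignores OpenPGP dash-escaping; actual verification would mean handing the armored block to an OpenPGP implementation such as GnuPG.

```python
import re

BEGIN_MSG = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIG = "-----BEGIN PGP SIGNATURE-----"
END_SIG = "-----END PGP SIGNATURE-----"

# Non-greedy match from the start of a clearsigned message to the end of
# its signature, so several signed blocks on one page are found separately.
CLEARSIGN_RE = re.compile(
    re.escape(BEGIN_MSG) + r"(.*?)" + re.escape(BEGIN_SIG) + r"(.*?)" + re.escape(END_SIG),
    re.DOTALL,
)


def find_signed_blocks(page_source):
    """Return one dict per clearsigned block found in the page source.

    Hypothetical helper: it only extracts text, it does not check signatures.
    """
    source = page_source.replace("\r\n", "\n")  # normalize line endings
    blocks = []
    for match in CLEARSIGN_RE.finditer(source):
        # Armor headers such as "Hash: SHA256" end at the first blank line.
        _headers, _sep, text = match.group(1).partition("\n\n")
        blocks.append({
            "armored": match.group(0),            # what a verifier would receive
            "text": text.strip("\n"),             # the signed article text
            "signature": match.group(2).strip(),  # signature body, unverified
        })
    return blocks
```

A toolbar would then pass each `armored` block to the verifier and display the signer and signing date only if verification succeeds.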
Philosecurity will be regularly releasing PGP signed versions of articles from now on. Check the bottom of each article from here on out for a link. I’m sure this system will be a little clunky at first, but I hope it will evolve to be more user-friendly. Feel free to send feedback and suggest better methods, tools, etc.
Censorship and silent corrections to online news archives are two sides of the same coin. Whether an article has been modified by the publisher, the ISP or the government, readers and journalists deserve to know. Unfortunately, our current system of online news distribution does not allow readers to independently verify publication dates and sources, or identify retroactive changes and omissions.
We have the technology to provide a verifiable audit trail as news articles are published, modified or retracted. We have the ability to make this accessible to everyday readers. Ultimately, readers can and should demand that our professional media sources cryptographically sign articles upon release. In a world where knowledge is power, verifiably accurate information is as important as running water.
| Sherri Davidoff |
| PGP-signed text: 2009-01-18 (current) |







I was just thinking about a similar issue with scientific integrity when using digital journals:
http://mit.edu/bnewbold/thesis/journal/16jan2009.html
Of course git’s hash algorithm isn’t crypto-quality (nothing is?), it’s just good enough to differentiate file blobs, but I’m pretty sure there’s a way to sign your commits properly.
This is a really good idea, but as far as I can see, it would never work fully. The problem is, if an ISP controls your pipe, and can do content blocking / replacement as they feel they need to, they could just as easily replace your PGP key with their own, and replace your signature of the article with their own. At that point, they could replace the contents of the article and you’d never know the difference. In fact, you may falsely trust the source more than you otherwise would BECAUSE it is signed.
It is possible you would NEVER be able to see the true certificate for a site/user, as it could always be replaced – at least through any official channels. And if you find a certificate for “philosecurity.org” on some random other website, you still have the trust issue: Is this a real certificate, was it replaced by the ISP’s filter, or is someone trying to spread their own faked certificate? More likely, you’d never even suspect something was up, since the certificate you get is the one you’ve always gotten (even though it is an illegitimate certificate).
You’d need something that doesn’t go through your ISP in order to verify the source. On the paranoid side, EVERYTHING could be compromised – every VPN tunnel, SSL connection you make could be the victim of a man-in-the-middle attack, unless you’ve got a certificate that you can fully trust.
Realistically, this is probably not going to happen, and there will likely always be ways around the ISP, to get past their filtering (VPNs being the big one).
However, for the majority of users, if their browser tells them the PGP signature matches, then they’d never bother to check into it further. In short, they’d have a faked PGP certificate and never realize it. It is totally within the realm of a large government content filter to do this attack (or just filter out all the PGP stuff entirely), and quite likely, if it ever became very popular.
I agree to some extent with gregmac that if they can modify everything, there are bigger problems. Of course, this is why public-key cryptography was designed the way it is: it does not matter if your ISP can modify the signature in transit, because you can use the website’s public key to verify whether it matches or not. Then again, if the public key is hosted on the website, how can it be trusted? This is why your web browser comes with root certificates pre-loaded.
The current SSL infrastructure is most appropriate here. We already have a list of certificate authorities in our browsers that begin the chain of trust. ISPs can do whatever they want with an encrypted channel, but any modification or attempted eavesdropping will be worthless or detectable short of a complete overhaul of the system designed to grant ISPs eavesdropping powers. If my web server’s certificate is falsified by your ISP, it simply will not work. The false certificate will not correlate with my web server’s private key, nor will the false certificate be properly trusted by a root authority, causing your web browser to spit out errors about an invalid certificate. I will not go deeply into man-in-the-middle attacks here; however, rest assured that SSL/TLS has been designed with them in mind.
So in short, if you want verifiable content that cannot be modified in transit just force SSL across your entire website. This is not a common practice because of the additional server load, but it accomplishes exactly what you want without any new research or infrastructure required.
I don’t think self-signed content is a solution to the “continuous newsdesk” or to the United Airlines crash problem, where a story was recirculated. In the first case, the publisher would update the signature at the same time as the content, and in the second, the publisher would sign the content as it’s publishing it (signature valid, but still no date on the article).
I think what is needed is for an external “librarian” to date, sign and take a copy of your article. Then in your publication, you’d show that timestamp, verify the librarian’s signature, and you can show the diffs from previous authenticated copies of the same content. The librarian would be someone you can trust, verifiably, such as the library of a university, or a city public library. But still you need to store multiple copies of the content, or devise some other protocol in which you store only diffs, in such a way that you can reconstruct the original source.
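A rough Python sketch of the librarian idea, under loud assumptions: the `Librarian` class, its method names and its record layout are invented for illustration, and an HMAC with a secret key stands in for the signature. An HMAC only lets the librarian itself re-check a record; a real service would use a public-key signature (PGP, for example) so that anyone could verify it.

```python
import difflib
import hashlib
import hmac
from datetime import datetime, timezone


class Librarian:
    """Hypothetical third-party archive that dates, signs and stores copies."""

    def __init__(self, secret_key):
        self._key = secret_key   # stand-in for a private signing key
        self._copies = {}        # story_id -> list of deposited records

    def _sign(self, story_id, timestamp, text):
        # Bind the story id, the timestamp and a hash of the text together.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        message = "\n".join([story_id, timestamp, digest]).encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def deposit(self, story_id, text, now=None):
        """Store a full copy of the text with a timestamp and signature."""
        timestamp = (now or datetime.now(timezone.utc)).isoformat()
        record = {
            "story_id": story_id,
            "timestamp": timestamp,
            "text": text,
            "signature": self._sign(story_id, timestamp, text),
        }
        self._copies.setdefault(story_id, []).append(record)
        return record

    def verify(self, record):
        """Check that a record's text and timestamp match its signature."""
        expected = self._sign(record["story_id"], record["timestamp"], record["text"])
        return hmac.compare_digest(expected, record["signature"])

    def diff(self, story_id, old_index, new_index):
        """Unified diff between two archived versions of the same story."""
        versions = self._copies[story_id]
        old, new = versions[old_index], versions[new_index]
        return list(difflib.unified_diff(
            old["text"].splitlines(), new["text"].splitlines(),
            fromfile=old["timestamp"], tofile=new["timestamp"], lineterm="",
        ))
```

This version keeps every full copy, which is the simple option mentioned above. Storing only diffs would save space but would require replaying them in order to reconstruct any given version.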
The remaining problem will be how to identify that a story a librarian signs is not new, and is in fact linked to previous signed versions of the same copy (the home of the story can change locations during the life of the content, such as initially appearing on page 1, but then revised and moved to page 2, etc., or initially appearing at /2009/01/19/, then at /2009/02/10, etc.)
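One possible approach to that linking problem, sketched in Python: compare an incoming story against the archived texts by word-level similarity and treat a close match as a revision of the earlier story, whatever its URL. The function name `find_earlier_version` and the 0.6 threshold are arbitrary assumptions for illustration; a real archive would need something more robust against heavy rewrites.

```python
import difflib


def find_earlier_version(new_text, archived, threshold=0.6):
    """Return (story_id, score) for the closest archived text.

    `archived` maps a story id (for example its original URL path) to text.
    story_id is None when nothing reaches the similarity threshold.
    """
    new_words = new_text.split()
    best_id, best_score = None, 0.0
    for story_id, old_text in archived.items():
        # ratio() is 2 * matches / total words, between 0.0 and 1.0.
        matcher = difflib.SequenceMatcher(None, old_text.split(), new_words, autojunk=False)
        score = matcher.ratio()
        if score > best_score:
            best_id, best_score = story_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

A story first archived under /2009/01/19/ and later resubmitted from /2009/02/10/ with light edits would score close to 1.0 against its earlier copy, so the librarian could file it as a new version rather than as a new story.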
I thoroughly enjoy your blog, thank you!
Radu– I believe what I am suggesting is what you’ve described (though your description of an “external librarian” is clearer and more eloquent). I entirely agree that self-signed content alone cannot solve the “continuous newsdesk” or recirculation problems. Readers have learned that we can’t trust publishers themselves to accurately record corrections or mark timestamps. Either the end user or third parties still need to collect and maintain external archives of articles, signatures and/or diffs in order to reliably detect and track changes.
The point of signing the articles is so that either the end users, or “independent online repositories” can store articles and/or signatures and still prove later that it is the original author’s work. Also, the signature clearly marks beginning and ending content, which facilitates external verification.
As an author, my goal is to create content that is easily verifiable should anyone wish to archive it. I figure, if small bloggers can be responsible publishers, perhaps someday larger media sources will pick up the same habits.
Thank you all for your insightful comments!