Anne Laurie asked me about Strongbox, the New Yorker’s new secure system for sources to submit documents, developed by Aaron Swartz. Read the link for more info, but having looked through the architecture, I think it could be secure. Since the system is open source and built on well-known software (Tor, PGP), security geeks will give it the in-depth examination and write-ups that you’d expect.
That said, the other risk of trying to be an anonymous source with digital documents is metadata–the little bits and pieces of identifying information in the digital document that the source is unable to see. A few years ago, one of our area reporters, who’s now a famous national reporter, posted a Microsoft Office document that he had received from a source. I looked at the document details and they included the source’s name. Presumably the New Yorker wouldn’t publish documents that hadn’t been inspected, but there’s always the chance that the original documents get subpoenaed and that law enforcement is able to identify the source from digital fingerprints in the document.
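To make the metadata risk concrete, here is a minimal sketch of reading the author fields that a modern Word file carries. It assumes the newer .docx (Office Open XML) format, which is a ZIP package whose docProps/core.xml part stores properties such as dc:creator and cp:lastModifiedBy; older binary .doc files store this differently and are not handled here. The function names and the "Jane Source" demo value are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in an Office Open XML (.docx) package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_author_fields(docx):
    """Return the author-identifying core properties of a .docx package.

    `docx` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as pkg:
        root = ET.fromstring(pkg.read("docProps/core.xml"))
    fields = {}
    for tag in ("dc:creator", "cp:lastModifiedBy"):
        elem = root.find(tag, NS)
        fields[tag] = elem.text if elem is not None else None
    return fields


def make_demo_docx():
    """Build a tiny stand-in package in memory (a real .docx has many more parts)."""
    core = (
        "<cp:coreProperties"
        ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:creator>Jane Source</dc:creator>"
        "<cp:lastModifiedBy>Jane Source</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("docProps/core.xml", core)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_author_fields(make_demo_docx()))
```

Pointed at a real .docx path, the same function would show whatever name Word recorded when it was first set up, which is exactly the kind of detail a reporter (or a subpoena) can find.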
The other big risk is that Strongbox lets the source and the reporter keep communicating, and over that back-and-forth the reporter could figure out who the source is and later be compelled to give the source up. It wasn’t Wikileaks that outed Bradley Manning, after all.
Just Some Fuckhead
Maybe you could have emailed her?
Just Some Fuckhead
I’m not suggesting this isn’t wowzer front page stuff, mind you. *eyes glaze over*
liberal
Metadata…just convert to good ole ASCII.
Belafon (formerly anonevent)
@liberal: In other words, all anonymous communication should be done via notepad.
mistermix
@Just Some Fuckhead: In contrast to the content of this post, your two comments are a scintillating diversion.
Chet
The metadata isn’t inaccessible or uneditable – after all, you were able to examine it – it’s just that people carelessly ignore it, and forget that they told Microsoft Word their full name (and initials) the first time they opened it.
The more people realize metadata is a “thing”, that it’s a part of the files they’re sending around, the more they’ll take the necessary steps to sanitize it. I really don’t have any more sympathy for a source who outs himself via .doc metadata than one who outs himself by including a copy of his driver’s license in the file. It’s nobody’s responsibility to keep your identity secure but yours.
negative 1
I thought that people had determined that Tor wasn’t as ironclad as previously thought? Any techies want to weigh in on this?
catclub
So be sure to change the metadata to incriminate Dick Cheney.
I looked at the New Yorker article. Many steps could be short-circuited by lazy security practice; luckily, that never happens.
(Actually, I suspect that the New Yorker will be VERY good on security, since trust is really the only asset they have, relative to someone who wants to give them secrets.)
It seems that essentially it encrypts your documents with the New Yorker’s public key. Then the New Yorker can access them with their private key.
I guess the Tor part is to make it harder to find the IP address of your uploading computer.
It might be easier to publish the New Yorker public key, then anonymously mail them a USB stick with the encrypted files. Or mail them a keylogger ;)
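The public-key idea in that comment (anyone can encrypt with the published key, only the holder of the private key can decrypt) can be shown with toy "textbook" RSA. This is an illustration of the concept only, not Strongbox’s code: the post says the system relies on PGP, and real keys are thousands of bits long, use padding, and normally wrap a symmetric key. The tiny primes below are the classic classroom example and offer no security.

```python
# Toy "textbook" RSA with tiny primes. NOT secure: real keys are thousands of
# bits long, use padding, and usually wrap a symmetric key (hybrid encryption).
p, q = 61, 53
n = p * q                 # public modulus (3233)
phi = (p - 1) * (q - 1)   # Euler's totient of n (3120)
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

PUBLIC_KEY = (e, n)       # safe to publish
PRIVATE_KEY = (d, n)      # kept by the recipient


def encrypt(message_int, public_key):
    """Anyone holding the public key can encrypt a number smaller than n."""
    exp, mod = public_key
    return pow(message_int, exp, mod)


def decrypt(cipher_int, private_key):
    """Only the private-key holder can undo the encryption."""
    exp, mod = private_key
    return pow(cipher_int, exp, mod)


if __name__ == "__main__":
    c = encrypt(65, PUBLIC_KEY)
    print(c, decrypt(c, PRIVATE_KEY))   # 2790 65
```

This is also why publishing the key and mailing an encrypted USB stick works in principle: the key can be public without weakening anything, as long as the private half stays with the recipient.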
catclub
@Belafon (formerly anonevent): Did you mean one-time pad? I agree!
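For the record, a one-time pad is simple to sketch: XOR the message with a truly random key of the same length, and XOR again to decrypt. The function names are illustrative. The catch, and the reason nobody uses it for document drops, is that the pad must be as long as the message, shared securely in advance, and never reused.

```python
import secrets


def otp_encrypt(plaintext: bytes):
    """Return (key, ciphertext); the random key is exactly as long as the message."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    key, ct = otp_encrypt(b"meet at noon")
    print(otp_decrypt(key, ct))
```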
NonyNony
@Chet:
Wait – seriously? You seriously “don’t have more sympathy” for someone who is ignorant of an aspect of how a piece of software works than you do for someone who goes out of his way to make a pointedly boneheaded decision to out himself?
That’s so elitist it makes my brain hurt. Essentially what you’re saying is that only people who know everything about a piece of software they’re using should be comfortable being whistleblowers when they find bad things happening. That’s a huge chilling effect.
(You’d think that Strongbox could automatically anonymize the metadata in standard file formats though – in fact I’d be shocked if it didn’t.)
liberal
@Belafon (formerly anonevent):
Actually, for stupid Windoze work I use this thing called “metapad”; for Linux I use emacs.
liberal
@negative 1:
I’m not a security expert, but my recollection is that with Tor there’s still the risk of using traffic analysis to make headway.
scav
@NonyNony: I’m not entirely sure Chet and brethren have “positions” so much as they have “stances”, meaning poses they adopt in dramatically lit corners of the dance floor.
Odie Hugh Manatee
OT:
I just saw Rubio on MSNBC with upChuck Toad and I think I lost a few brain cells from the trauma.
Rube: ‘TEA PARTY/IRS!! BENGHAZI!! AP!! Obama is running a 24/7/365 arm-twisting political organization!’
Toad: ‘But aren’t you engaging in fundraising off of these very issues?’
Rube: ‘No!… yes, but we are collecting signatures on a petition to…’
Rube came across as a kid ranting and raging about being pwned at his favorite game by Obama. I’ve seen more coherent rants from kids in Steam’s TF2 gaming forum area.
Todd
I still like my documents to reside on my computer, with paid, contracted offsite backups which are under the control of someone I know personally.
I’m paranoid that way.
Soonergrunt
MS Office Document Inspector:
http://office.microsoft.com/en-us/help/remove-hidden-data-and-personal-information-from-office-documents-HA010037593.aspx
Batch Purifier LITE (Metadata removal tool):
http://www.digitalconfidence.com/downloads.html
Just Some Fuckhead
@mistermix: Raoooooow.
liberal
@Soonergrunt:
Hmm…but then you’re relying on someone else. Yuck, esp when that someone else might be Microsoft.
IMHO, easiest thing is to just “save as” *.txt and check w/ your own eyeballs. Or you could scan it, then use OCR.
liberal
@Todd:
Wait?! Someone who doesn’t think TEH CLOUD is the end-all be-all of IT infrastructure? What?
liberal
@Odie Hugh Manatee:
I’m not sure about who’s pwning who. IRS acting head is out. As usual, Dems are pussies.
RSA
The NSA has a nice overview of security risks for PDFs: Hidden Data and Metadata in Adobe PDF Files: Publication Risks and Countermeasures (link to PDF).
Forum Transmitted Disease
Well, a bit of outing myself. Just a little. I work in IT security, and my main emphasis is in digital forensics.
People (and media outlets guaranteeing security to their sources) really need to understand this: given a dedicated examiner with no limit on the budget (pretty much what you find in most police departments and state/federal three-letter agencies), there is no such thing as anonymity. Period. If you made it, it can be traced back to you. Metadata is helpful but not required.
I applaud the New Yorker’s effort and the idea but it is not “safe”. Safer, yes, but not safe.
Soonergrunt
@liberal: Sometimes you need better formatting than left-justified block paragraphs.
liberal
@Soonergrunt:
Just dump it to *.txt and then dump back into Word.
liberal
@Forum Transmitted Disease:
You mean, through textual analysis?
ETA: and obviously “who has access?”
Roger Moore
@catclub:
Just be careful about how you send your letter, so you don’t leave any incriminating postmarks. And buy your USB key with cash, or use one you were given as a promotion. And hope the information you’re leaking hasn’t been canary trapped.
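A canary trap works by giving each recipient a subtly different copy, so a leaked copy points back to whoever received it. Here is a minimal sketch, assuming a made-up sentence with three interchangeable wording choices; each recipient’s index is encoded in binary, one bit per choice, which gives 2³ = 8 distinguishable copies. The template, slots, and function names are hypothetical.

```python
# Hypothetical wording choices: each slot offers two interchangeable phrasings.
SLOTS = [
    ("committee", "panel"),
    ("quietly", "formally"),
    ("approved", "signed off on"),
]
TEMPLATE = "The {} {} {} the budget."


def variant_for(recipient_index):
    """Encode recipient_index in binary: bit i selects the phrasing for slot i."""
    choices = [slot[(recipient_index >> i) & 1] for i, slot in enumerate(SLOTS)]
    return TEMPLATE.format(*choices)


def identify_leaker(leaked_text, recipients):
    """Return the recipient whose copy matches the leaked text, or None."""
    for i, name in enumerate(recipients):
        if variant_for(i) == leaked_text:
            return name
    return None


if __name__ == "__main__":
    staff = ["Alice", "Bob", "Carol", "Dave"]
    for i, name in enumerate(staff):
        print(name, "->", variant_for(i))
    print("Leaker:", identify_leaker(variant_for(2), staff))
```

Scrubbing metadata does nothing against this, because the identifying mark is in the text itself.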
Mark B
@Soonergrunt: Exporting to HTML might work, but the HTML generated by Word is just junk, full of extraneous tags, and it still contains the metadata. As you said, some kind of anonymizer is the way to go, but you have to trust a third party to construct it correctly. Word docs can also be saved as WordML documents, which contain all of the formatting and are capable of being examined by a text editor. That’s the format I would go with.
Even the binary format can be examined completely; there are tools out there to unpack it and look at every element, but it’s a little arcane. I got into this for another project a few years ago when we had to batch transform some other proprietary documents that used the same storage engine.
As people above mentioned, scrubbing the documents isn’t completely foolproof; there are technical methods for finding the source through other means. And the biggest hole is social engineering: there will be some data in the document itself which provides information about the authorship.
Disclaimer: I’m just a regular IT guy, not a security specialist or forensic analyst.
Roger Moore
@liberal:
There is a Windows version of Emacs available. I use it for coding.
Mark B
@Mark B: WordML is just XML with a Microsoft Word schema, so any XML editor should be able to handle it. (I wish the edit function worked)
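Because the markup is plain XML, a few lines of standard-library code can pull out just the text, which is roughly the "save as *.txt" approach done programmatically. This sketch assumes the newer .docx package (word/document.xml inside a ZIP) rather than the older single-file WordML format, which uses a different namespace; the function names and demo content are illustrative. It only reads w:t text runs, so formatting is dropped and parts such as comments are not examined.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML body in a .docx package.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx):
    """Return the plain text of each paragraph (w:p) in word/document.xml."""
    with zipfile.ZipFile(docx) as pkg:
        root = ET.fromstring(pkg.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in para.iter(W + "t"))
        for para in root.iter(W + "p")
    ]


def make_demo_docx():
    """Build a minimal in-memory package containing only word/document.xml."""
    doc = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>Quarterly </w:t></w:r><w:r><w:t>numbers</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Do not distribute</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("word/document.xml", doc)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for line in docx_paragraphs(make_demo_docx()):
        print(line)
```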
Todd
@liberal:
Hell, I won’t even use QuickBooks online for fear of their system somehow screwing up.
I eagerly await the day when somebody in command of a big chunk of cloud-resident data goes tits up in an ugly way that involves flipping a switch. Or finds a court receiver or bankruptcy trustee at the door with a turnover order for all hardware.
The day that one of those things happens, you’ll see a lot less enthusiasm for cloud-based storage.
liberal
@Roger Moore:
You like Windows, or is it required for whatever work you’re doing?
Also, what type of port? Is it a truly native Windows app? Or is it running on top of something else?
liberal
@Todd:
Not quite the same thing as the cloud issue, but a few years ago a friend of mine with lots of credit card debt (trying to start small businesses) was using some online 3rd party thing that paid his credit card for him. One month it f*cked up. The credit card bank put its thumb on him, and the 3rd party software folks said “not our problem.”
belieber
Wait…did mistermux…the guy who took 3 months to change a crappy theme on wordpress to another crappy one…just start giving internet security opinion?….haha.
liberal
@Mark B:
Good ole octal dump (just joking).
liberal
@belieber:
LOL!
MikeJ
@Soonergrunt:
It’s pretty rare though. I will admit sometimes tables are needed.
Roger Moore
@Todd:
Cloud storage seems like it makes sense as a backup or for distribution, but anyone who trusts a cloud provider as their only storage is crazy. Even if your provider doesn’t go out of business, you’re putting yourself at risk of a network interruption cutting you off from your data.
Roger Moore
@liberal:
I use Windows at work because that’s what our IT department provides and supports decently. Also, most of the other software I use (like instrument control software) is Windows only, so it’s a lot easier to stay on a single platform for small-scale stuff like what I do. The Windows port is by FSF, and AFAIK it’s a native version rather than running on top of Cygwin, X for Windows, or the like.
? Martin
@liberal:
That might address the metadata in the file, but not the metadata in the directory structure. In order for the more advanced search tools to work in modern operating systems, that data is often moved to the system level – EXIF data for photos, ID3 tags for music, etc., so it can be indexed and quickly searched against. That data often gets put back into the file when you perform file operations.
For example, if you download a file from the internet on the Mac, it stores what browser you downloaded it with, what address you downloaded the file from, when, and whether or not you’ve opened it. It does this as part of its malware prevention, so the system can check the file for executable code, confirm if it comes from a known bad source, and make sure that you really actually want to run it. None of that data is in the file. Plain ASCII files still have all of that information associated with them. How much of that gets shipped off with an email depends in part on your mail client.
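The point that some metadata lives in the file system rather than in the file’s bytes can be seen with a portable stand-in. macOS’s download information is kept in extended attributes, which Python’s os module cannot read on every platform, so this sketch uses the file’s modification time instead, under that assumption. It copies one file two ways: shutil.copyfile copies only the contents, while shutil.copy2 also carries over timestamps (and, on Linux, extended attributes where possible). Whether metadata travels depends on how the file is copied or sent, not on what is inside it.

```python
import os
import shutil
import tempfile
from pathlib import Path

OLD_MTIME = 1356998400  # 2013-01-01 00:00:00 UTC, an arbitrary backdated timestamp


def compare_copies():
    """Copy one file two ways and report which copy kept its timestamp."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "notes.txt")
        src.write_text("plain ASCII, nothing embedded\n")
        os.utime(src, (OLD_MTIME, OLD_MTIME))  # set access and modification times

        bytes_only = Path(tmp, "copyfile.txt")
        shutil.copyfile(src, bytes_only)   # contents only; gets a fresh mtime
        with_meta = Path(tmp, "copy2.txt")
        shutil.copy2(src, with_meta)       # contents plus timestamps (and, on Linux, xattrs where possible)

        return {
            "same_contents": src.read_bytes() == bytes_only.read_bytes() == with_meta.read_bytes(),
            "copyfile_kept_mtime": int(bytes_only.stat().st_mtime) == OLD_MTIME,
            "copy2_kept_mtime": int(with_meta.stat().st_mtime) == OLD_MTIME,
        }


if __name__ == "__main__":
    print(compare_copies())
```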
nickgb
@? Martin:
Is that true? I was a programmer a long time ago, and it would have all been handled by system indexes, not file-level entries, but maybe things have changed. It doesn’t seem especially useful to the OS, which would then have to read the files again in order to refresh where they had been, etc., none of which a user would care about.
Okay, now I’m definitely off the fence: I’m unaware of a single mail client out there that will see an ASCII file attachment, ask the operating system for a bunch of metadata, and include that in another attachment. Is there anything really like that out there?
cvstoner
If you really want to be anonymous, print out a copy and mail it. That’s the safest way.
liberal
@Roger Moore:
OK, thanks for the info.
liberal
@cvstoner:
I thought that printers purposefully leave tiny forensic clues so that stuff can be tracked.
RSA
@liberal:
I’m thinking a $35 printer can always be dropped off at Goodwill…
liberal
@nickgb:
I don’t get it, either.
If nothing else, you could copy/paste the *.txt into the email body.
Roger Moore
@liberal:
Not really a practical approach for a large-scale document dump. Nor are most of the more obvious suggestions here, like printing and mailing paper copies. What’s needed is an automated tool for stripping out potentially incriminating metadata in bulk. A good tool that lets somebody do that on a whole document archive will also work for a small-scale leaker, but the opposite is not necessarily true.
nickgb
@Roger Moore:
I’ve worked at law firms before where all outgoing attachments were automatically scrubbed of all metadata, so the technology is definitely out there, and I would hope something like Strongbox would be able to scrub all incoming attachments as well. From a journalistic standpoint, maybe that’s problematic in terms of verifying a source, though; that’s simply something outside my expertise.
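A bulk scrubber of the kind discussed here could start as small as the sketch below, which is not any law-firm product or part of Strongbox. It assumes .docx input, rewrites every package in a folder with empty docProps/core.xml and docProps/app.xml parts (where author, company, and similar fields usually sit), and resets each ZIP entry’s timestamp, since those entry dates are metadata too. It is an assumption, not verified, that Word accepts the empty property parts, and the sketch deliberately leaves comments, tracked changes, custom properties, embedded image metadata, and the text itself untouched, so it is a starting point rather than a vetted tool. All names are hypothetical.

```python
import zipfile
from pathlib import Path

# Assumption: replacing the two property parts with empty root elements is
# enough for Word to open the file; this has not been checked against Word.
BLANK_PARTS = {
    "docProps/core.xml": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"/>'
    ),
    "docProps/app.xml": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"/>'
    ),
}
FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest date the ZIP format can store


def scrub_docx(src, dst):
    """Copy a .docx, blanking the property parts and normalizing ZIP timestamps."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            info = zipfile.ZipInfo(item.filename, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            blank = BLANK_PARTS.get(item.filename)
            zout.writestr(info, blank if blank is not None else zin.read(item))


def scrub_folder(folder, out_folder):
    """Scrub every .docx in `folder` into `out_folder`; return the file names."""
    out = Path(out_folder)
    out.mkdir(parents=True, exist_ok=True)
    cleaned = []
    for path in sorted(Path(folder).glob("*.docx")):
        scrub_docx(path, out / path.name)
        cleaned.append(path.name)
    return cleaned
```

The output should still be inspected by hand (or with something like Office’s Document Inspector) before anyone relies on it, which matches the point that such a tool should be vetted first.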
liberal
@Roger Moore:
Agreed, though I wouldn’t use a tool for this kind of thing unless it’d been vetted by a security expert.
BeezusQ
Check this out – good visual:
http://laughingsquid.com/strongbox-an-anonymous-system-for-submitting-documents-to-the-new-yorker-created-by-aaron-swartz-kevin-poulsen/
Hobbes
@Roger Moore: LaTeX can do that and will provide better formatting than WYSIWYG editors at the same time.
lojasmo
@liberal:
Bush appointee holdover gets sacked=dems are “pussies”.
Okay.