This is a sophisticated audience, so I’ve no doubt folks here grasp how intrusive (i.e. revealing) metadata can be. But even those fully up on network analysis and related crafts may find this from Kieren Healy amusing — and useful in explaining why this stuff does matter to your friends and family who may be in the “if they’re not listening in, I don’t care” crowd:
London, 1772.
I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. This is in connection with the discussion of the role of “metadata” in certain recent events and the assurances of various respectable parties that the government was merely “sifting through this so-called metadata” and that the “information acquired does not include the content of any communications”. I will show how we can use this “metadata” to find key persons involved in terrorist groups operating within the Colonies at the present time. I shall also endeavour to show how these methods work in what might be called a relational manner.
The analysis in this report is based on information gathered by our field agent Mr David Hackett Fischer and published in an Appendix to his lengthy report to the government. As you may be aware, Mr Fischer is an expert and respected field Agent with a broad and deep knowledge of the colonies. I, on the other hand, have made my way from Ireland with just a little quantitative training—I placed several hundred rungs below the Senior Wrangler during my time at Cambridge—and I am presently employed as a junior analytical scribe at ye olde National Security Administration. Sorry, I mean the Royal Security Administration. And I should emphasize again that I know nothing of current affairs in the colonies. However, our current Eighteenth Century beta of PRISM has been used to collect and analyze information on more than two hundred and sixty persons (of varying degrees of suspicion) belonging variously to seven different organizations in the Boston area.
Rest assured that we only collected metadata on these people, and no actual conversations were recorded or meetings transcribed. All I know is whether someone was a member of an organization or not. Surely this is but a small encroachment on the freedom of the Crown’s subjects. I have been asked, on the basis of this poor information, to present some names for our field agents in the Colonies to work with. It seems an unlikely task.
So what did our humble toiler in the fields find?
…Mr Revere—along with Messrs Urann, Proctor, and Barber—appears towards the top of our list.
So, there you have it. From a table of membership in different groups we have gotten a picture of a kind of social network between individuals, a sense of the degree of connection between organizations, and some strong hints of who the key players are in this world. And all this—all of it!—from the merest sliver of metadata about a single modality of relationship between people…
I admit that, in addition to the possibilities for finding something interesting, there may also be the prospect of discovering suggestive but ultimately incorrect or misleading patterns. But I feel this problem would surely be greatly ameliorated by more and better metadata. At the present time, alas, the technology required to automatically collect the required information is beyond our capacity. But I say again, if a mere scribe such as I—one who knows nearly nothing—can use the very simplest of these methods to pick the name of a traitor like Paul Revere from those of two hundred and fifty four other men, using nothing but a list of memberships and a portable calculating engine, then just think what weapons we might wield in the defense of liberty one or two centuries from now.
Much more good stuff at the link, showing the steps of a simple network analysis (and offering further links to the underlying data, if anyone wants to play with the idea a bit themselves). Also, Healy pointed to this paper by Shin-Kap Han (PDF), which performs a similar analysis on the roles of Paul Revere and Joseph Warren in much greater depth.
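For anyone who wants to see the mechanics before clicking through, here is a minimal Python sketch of the idea: start from nothing but a table of who belongs to which group, then derive person-to-person ties, group-to-group ties, and a crude ranking of who is best connected. The names and clubs are invented placeholders, not Fischer's data, and the ranking is a simple count of distinct contacts rather than whatever centrality measures Healy actually uses.

```python
from collections import Counter
from itertools import combinations

# Toy membership table (invented for illustration; not Fischer's data).
memberships = {
    "Person A": {"Club 1", "Club 2", "Club 3"},
    "Person B": {"Club 1"},
    "Person C": {"Club 2"},
    "Person D": {"Club 3"},
    "Person E": {"Club 1", "Club 2"},
}


def person_ties(memberships):
    """Count the groups shared by every pair of people."""
    ties = Counter()
    for p, q in combinations(sorted(memberships), 2):
        shared = len(memberships[p] & memberships[q])
        if shared:
            ties[(p, q)] = shared
    return ties


def group_ties(memberships):
    """Count the members shared by every pair of groups."""
    ties = Counter()
    for groups in memberships.values():
        for g, h in combinations(sorted(groups), 2):
            ties[(g, h)] += 1
    return ties


def rank_by_connections(memberships):
    """Rank people by how many distinct others they share a group with."""
    degree = Counter({p: 0 for p in memberships})
    for p, q in person_ties(memberships):
        degree[p] += 1
        degree[q] += 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_by_connections(memberships)
for person, contacts in ranking:
    print(person, contacts)
```

In this toy table the person who sits in all three clubs comes out on top, even though the code never sees a word anyone said to anyone else.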
Image: Grant Wood, The Midnight Ride of Paul Revere, 1931
Corner Stone
Never knew you were such a talented comedian, Tom.
Corner Stone
And Malcolm Gladwell got there first, btw.
Tom Levenson
@Corner Stone: But not here.
Schlemizel
Here is the problem for me: either you believe the government and the companies involved or you don’t. If you believe, then this program only focuses on foreign nationals and captures Americans coincidentally. If you don’t believe, then tell me how we will ever stop them. If everyone involved is lying, how do you know they would stop if they said they had? If they are liars, do you believe a President Ayn Paul would/could make them stop? Why & how would you verify it?
Yes, we need to put an end to the post 9/11 madness but explain to me what will make you believe we have?
srv
Left your cell phone on at a protest?
I believe if Rand were to ask this question “Mr. Director, do you have cellular meta data for April 15, 2009 stored anywhere?”
The answer of course is yes, and that means a simple query can list every Freedom Loving American protesting against that Kenyan Usurper.
Call your Congressman.
Suffern ACE
I guess I’d feel better about this if I could call on the NSA as kind of a taxpayer funded tech support from time to time. Maybe they could tell me when I’m low on toner. Send me a text when I’m at the grocery store to remind me that I could use a few more yoghurts since the ones that got pushed to the back of the fridge are about to expire. In exchange for that, they can read the subject lines of my sent items once a week. A new social compact.
Baud
Shit. I just realized that my metadata connects me to all of you!
I’m fucked.
Corner Stone
@Tom Levenson: Just think if DNI Clapper could’ve targeted Revere.
That sounds like the basis for some hot slash/fic if you ask me.
? Martin
I’ve seen a few of these now and there’s just one problem with them all. They all make their case by having identifying data. There is no identifying data in the metadata even as reported by Snowden. And without the identifying data, a lot of the conclusions being drawn could not be drawn. Instead you would have:
“Individual #1 —along with Individual #2, Individual #3, and Individual #4—appears towards the top of our list.” From that, they wouldn’t be able to tell a group of traitors from a sewing circle; it’s just a group of connected people. Only once you had probable cause (from some other intel mechanism) could you get the identifying information for one of them, and then draw the group conclusion, and then request the identifying information for the others.
Todd
Now that Tebow is a Patriot, how will Boston react?
scav
Make no claims to sophistication, but I can boast of a “Don’t duck the meta-data” rubber duckie in my bathroom. Which also absolves me of sophistication, if it comes to that. Thanks Tom. Gotta love networks, relationships, and the study thereof. One of the weirder ends of analytical geography.
John Cole
Not bad, Tom. Two minutes. Getting better.
Tom Levenson
@? Martin: If you have numbers from which calls originate and to which they go, you’re some distance down the way to identification.
Tom Levenson
@John Cole: Sorry. When I last checked before posting there was nothing but Matoko-Chan. Had to pull the plug and switch coffee shops, hit publish, and there it was.
Happy to pull this down for an hour….
Cassidy
@John Cole: Should we ask AL her opinion on that subject?
becca
@Schlemizel: Bingo.
Data is a hot commodity, too. There’s piles of money to be made. Lots and lots of money.
joes527
@srv:
If that is the case, look for significant updates to the RNC mailing lists shortly after the next time a Republican President takes office.
? Martin
@Tom Levenson:
You have the keys to the identification, but not the identification. And to the court, that matters. And to each of us that matters. It matters because very little other intel they collect uses the same keys. They can’t correlate my calling pattern with my postal mail pattern, or my internet usage, or my credit history, or anything from my car, and so on. The master key to actually spying on me is my identity, and they don’t have that, and they can’t get that without a specific subpoena for my individual phone number. Then, once they have my name, they can tie all of that stuff up.
In the calling data, the phone numbers on each end are just random numbers. We know that number 1234567 called 7654321. So what? What does that reveal about any one of us? You need to slap an identity on at least one of those endpoints for it to even begin revealing anything. And you need to slap an identity on both endpoints before it really becomes invasive. That’s two more subpoenas, and not blanket ones.
scav
@Cassidy: Given this is an obsessive thread and her last one up was deliberately a non obsessive one (as is John’s), seems to have worked out. Bolt holes and sanitary cordons are currently a boon.
max
and useful in explaining why this stuff does matter to your friends and family who may be in the “if they’re not listening in, I don’t care” crowd:
Ah, but the metadata is how they decide to whom they should listen. (And anybody who buys that crapola about them not listening in is welcome to email me for leads on a large bridge situated in some Florida swampland. Cheap at a thousand times the price!)
max
[‘Denial is not a river in Egypt. But it IS a Sunday morning public affairs show!’]
Tissue Thin Pseudonym (JMN)
@Tom Levenson: As I understand it, though, the information they get doesn’t include the phone numbers. It contains a unique identifier for the phone but it’s a different number that is not directly connected to the owner the way a phone number is. To get the actual number they need a specific subpoena.
joes527
@? Martin: But if the NSA just had access to some super secret way to get from phone numbers to names, then we’d be screwed.
Baud
@joes527:
That’s what I thought also, but see # 21 just above you.
Corner Stone
@? Martin:
I don’t think that’s true. If I know Item X has made certain calls at certain times from certain locations, I already have a big chunk of you to work with. And I don’t even have to be looking for Item X, specifically. The things I’m looking for have filtered Item X onto the list.
Not to get too deep into it, but if you start plotting this data out on a simple whiteboard some things start to happen. Beyond what data is revealed, the gaps of data can be just as revealing.
? Martin
@max:
It’s part of it, but not it. It can’t be it, there’s not enough information there. How it works is say you’ve got an AQ phone number in Pakistan (obtained however, via subpoena, foreign groundwork, whatever). You see someone in the US call that number. You don’t have their identity. You’re probably not going to get that identity off of one phone call, either. That subpoena wouldn’t be issued. Could be a wrong number, whatever. You see that same individual call another individual, and that individual call a 3rd individual. That 3rd individual now calls the same AQ phone number. Ah, now it’s not likely a wrong number. Now you have a connection between two people, through an intermediary, to a known terrorist number. Now you can get a subpoena.
Without knowing the phone in Pakistan is an AQ phone, no subpoenas would have been issued. Could have been two family members calling their grandmother. The metadata isn’t enough on its own to do anything. You need hard intel at one end of the chain for it to make any sense at all.
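The chain described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the call records, the endpoint labels, and the two-hop limit are all invented, and nothing here claims to reflect how any agency actually queries its data. Given one number already flagged by other intelligence, the sketch finds pairs of callers to that number who are also linked to each other through at most one intermediary.

```python
from collections import defaultdict, deque

# Invented call records as (caller, callee) pairs of anonymous endpoint labels.
calls = [
    ("X", "AQ"),  # X calls the flagged number
    ("X", "Y"),
    ("Y", "Z"),
    ("Z", "AQ"),  # Z, two hops from X, calls the same flagged number
    ("W", "AQ"),  # W also calls it but has no link to the others
    ("P", "Q"),   # unrelated traffic
]
flagged = "AQ"


def linked_callers(calls, flagged, max_hops=2):
    """Return pairs of callers to `flagged` that are within `max_hops` of each other."""
    graph = defaultdict(set)
    callers = set()
    for src, dst in calls:
        if dst == flagged:
            callers.add(src)
        elif src != flagged:
            # Treat ordinary calls as undirected links between endpoints.
            graph[src].add(dst)
            graph[dst].add(src)

    linked = set()
    for start in callers:
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
                    if nxt in callers:
                        linked.add(frozenset((start, nxt)))
    return linked


print(linked_callers(calls, flagged))
```

Note that the search only produces anything because `flagged` is supplied from outside the call records, which is the point being made above: without a known endpoint, the same pattern could be two relatives calling a grandmother.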
scav
@? Martin: The first part of the number is geographic (increasingly less so for cells), so there is that handiness as a filter. Same goes for ZIP codes, which people hand out easily. There are a lot of these partial keys to identity (email, physical addresses, Facebook blah blah blah). I’d hope that phone numbers and physical addresses are recognized as being way too close to the individual and thus suppressed, but ZIPs certainly are judged anonymous enough to get through.
Still, people are often sloppy about these partial keys to identity and they’re ubiquitous. Find a source where two or more partial keys occur together (like a Rosetta stone) and one can start zippering together sources and gain in resolution. Finding these Zipper sources, ways to relate and integrate disparate raw sources is a large part of the battle too.
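A rough Python sketch of the "zipper" idea follows. Everything in it is hypothetical: the two sources, the field names, and the values are invented, and the matching rule (keep a match only when exactly one candidate shares all the chosen keys) is just one simple way to do it. It joins an anonymous source to a named one on whatever partial keys they share, and shows resolution improving as more keys are added.

```python
from collections import defaultdict

# Hypothetical sources; every field name and value here is invented.
calls = [
    {"account": "acct-17", "zip": "02139", "area_code": "617"},
    {"account": "acct-42", "zip": "60614", "area_code": "773"},
]
mailing_list = [
    {"name": "Resident One", "zip": "02139", "area_code": "617"},
    {"name": "Resident Two", "zip": "60614", "area_code": "312"},
    {"name": "Resident Three", "zip": "60614", "area_code": "773"},
    {"name": "Resident Four", "zip": "60614", "area_code": "847"},
]


def zipper(left, right, keys):
    """Join two sources on shared partial keys, keeping only unambiguous matches."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)

    matches = {}
    for row in left:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches[row["account"]] = candidates[0]["name"]
    return matches


by_zip = zipper(calls, mailing_list, ["zip"])
by_both = zipper(calls, mailing_list, ["zip", "area_code"])
print(by_zip)
print(by_both)
```

With ZIP alone, only one account resolves to a name; adding a second partial key resolves the other as well.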
different-church-lady
And that’s pretty much why Newman and Pulling used encryption.
? Martin
@joes527: But it’s not phone numbers. It’s account numbers.
? Martin
@Corner Stone: Not really. Not multiplied by 150 million users. It just all degrades into noise. The gaps would only make sense if you knew something about the people at each end. You might not need their identity, but you’d need some other identifying information – their occupation, their age, their nationality, something. You have none of that. All you have is call from A to B. That’s it.
Truly, they get a lot more information by scanning your mail, which they can legally do. At least that has your name on it, and the name and address of your recipient. And they’ve been doing that for half a century.
Superking
There are a few clear fallacies in the argument, though.
First, it is true that analyzing data can give you more interesting data. So what? In the article, the author jumps from interesting data to calling Paul Revere a terrorist and claims she does that without reference to the content of his communications. The whole article begs the question by beginning with groups of people that were involved in revolutionary activity. OF COURSE analyzing this data will drill down to Paul Revere or Samuel Adams or Benjamin Franklin, because we already know who is in the data set and what they did.
Second, the British knew in 1774 that Paul Revere favored revolution. It wasn’t a big secret in any way. In fact, on the night of his ride, the British troops around Boston had been ordered to arrest him. Big fucking deal.
Tom Levenson
@? Martin: Bowing to greater knowledge I’ll retreat to the original thought behind this post: there is much more information in metadata than the reassurances indicate. The power of making a small number of simple associations is surprising to many. I’ve seen examples similar to the one Healy described for other historical situations (with some intent to apply that knowledge to present day problems, btw).
Most broadly: our current popular understanding of, and more important, legal framework for surveillance has not kept up with the growth of network/social data, the sophistication of analyses now available, and the enormous advances in computing.
MikeJake
Maybe I’m just being naive, but if the shocking import of these NSA revelations requires me to assume that the government is, in actuality, using Prism to suck up all of our emails, voice chats, search histories, etc., without judicial oversight and regardless of whether American citizens are the targets, then I’m going to need more than the word of this Snowden guy and Glenn Greenwald. For one thing, Greenwald tends to write more like an advocate than a journalist. Which makes me feel kinda awkward, because I’ve generally been in agreement that journalists shouldn’t make a fetish of achieving the appearance of neutrality, yet I can’t help but feel that some neutrality is called for here. He’s really only given Snowden’s side of this. That suggests to me that Greenwald has presumed that the government and the telecom companies involved will conspire to lie about what they’re doing, so no point in getting their side. That’s a muckraking approach.
? Martin
@scav: But as quickly as we’re gaining the ability to pull this data together, we’re making it harder to actually do it. We have overlapping area codes. We have roaming. We have VOIP numbers. We use Dropbox and other peer-to-peer ways to transmit information. If you can tie all of those mechanisms back to me (here’s the identity piece showing up again), then sure, it’s easy. If you can’t – and the courts make it difficult to do that – then it’s just more and more noise.
We think it’s easier than it is because looking out, we know that number is us, that zip code is us, that IP address or account name or BJ handle is us because we naturally associate all of those things with our identity. But looking in, it’s really hard to put those together as one person. It takes a lot of subpoenas to a lot of different groups to do that. And there are a lot of people in this country to do that to. And there aren’t that many judges to do it on the scale the agencies might prefer, so they’ve got to be relatively judicious with what they go after.
Omnes Omnibus
@Tom Levenson:
That’s really the big thing here. The law on this seems to be stuck in 1979. There has to be a way that modern technology and privacy protection can coexist. Unfortunately, we have a Supreme Court with a conservative majority and a Congress that refuses to govern (this last may be a blessing in disguise since anything that would come out of the House probably would be worse than the current mess).
Baud
@Tom Levenson: @Omnes Omnibus:
This and this.
Tissue Thin Pseudonym (JMN)
Though as I read more I may be wrong about the phone number not being included in the metadata. I’m seeing conflicting answers on that.
the Conster
So would Ben Franklin tweet or post on FB his adage about security and freedom?
scav
@? Martin: a) I mentioned the problem with phone numbers being increasingly less geographically specific, even at the area code level (the central office codes were the first to go). Still, for most account numbers, we can reverse engineer a probable geographic area because generally, most calls are local. (Finding phones that don’t match that criterion might actually be information in itself.) The point was that the metadata provides filters, sieves toward identity, and if one can zipper together enough sieves (both geographically specific and otherwise), the data becomes less and less anonymized. Yes, it takes work, but it’s an arms race and unclear which side is ahead.
Baud
@Tissue Thin Pseudonym (JMN):
I trusted you.
? Martin
@Tom Levenson:
No question about that.
But regarding the former, consider how someone would connect your phone account number to, say, your Facebook ID (two metadata keys). Other than your name (which might be part of the latter), what identifying information do those two things have in common? There’s probably absolutely nothing. And even with your name, you’d be guessing a fair bit as I’m sure you aren’t the only Tom Levenson with both a phone and on Facebook. You might then need to look at your Facebook location (which you generously provided) and correlate that with your area code/prefix. That’s a lot of work. And it’s at least one individual subpoena, so they had to establish probable cause just to get that far.
pacem appellant
At least two bloggers here leave the metadata still inside their photos for all the world to see. Based on that, I now know where they live. With this information, I will unleash my minions of aerial spies to see what the pets do while their masters are away. Or not. I’ll just smugly sit on the data and smile knowingly to myself.
The Other Chuck
@MikeJake:
And given some of Snowden’s other claims, he’s looking like something of an unreliable source, to put it mildly. Now it’s true that he might have broken the story (the intelligence community’s own hysterical reactions didn’t help them) but I’d treat with deep deep skepticism any claims he has to privileged information other than a likely misclassified powerpoint slide deck that fell into his hands.
Omnes Omnibus
@the Conster: Wasn’t he given a timeout?
Mnemosyne
Whenever I hear about the government collecting massive amounts of data from every source they can get, I’m reminded that the US had intercepted communications from the Japanese planning the attack on Pearl Harbor prior to the attack, but didn’t get around to translating them until a year later because they were drowning in so much data.
Having as much data as possible gives a comforting illusion of safety, but it can actually be worse than not enough data.
Yatsuno
@Omnes Omnibus: We had one emergence from the mists of time today. I’d choose to not have that be repeated.
Violet
@Mnemosyne: “Bin Laden Determined to Strike in U.S.” was the title of a memo to the President. Data was there. Data was analyzed correctly. Data wasn’t used to best effect.
Even when you have all the data, analyze it correctly, someone still has to DO something with it or else it’s not really very helpful.
Sadie
I see that you didn’t note an explicit exception for the commenters (and there’s more than a couple) who scream at “firebaggers” and “emo progs” for being the culprits for everything from the rise of right-wing populism and Obama’s supposedly lagging poll numbers to the spread of the bubonic plague in thirteenth-century Europe, but I’ll give you the benefit of the doubt and assume that it’s implied.
Narcissus
Have we found out yet if PRISM actually does what the PowerPoint slides say it does?
mclaren
@? Martin:
As usual, you’re ignorant and dead wrong. You must be a manager: only a supervisor or manager or CEO could manage to be so badly wrong and so grossly ignorant.
You identify people and track them by correlating the metadata with all the other data you’re getting. You build graphs of events on the individual cell towers and correlate those with other graphs of things like call duration. You’re too stupid to recognize this, but by sieving through a large database of cellphone calls, you can easily tell who’s talking to whom by picking out the ones with identical call lengths.
Look, people like Martin are so dumb they don’t realize that large stores like Wal-Mart can tell which shoppers are pregnant by looking at the metadata of their shopping habits.
“How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did,” Forbes, 16 February 2012.
Cellphone metadata gives you even more information to work with than the record of what you bought while shopping.
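The call-length matching described in this comment can be illustrated with a short Python sketch. The records below are invented, and the extra requirement that the two legs start within a few seconds of each other is an added assumption (the `max_skew` parameter), since duration alone would collide constantly across a large database. The sketch pairs an outgoing leg with an incoming leg when the durations are identical and exactly one candidate falls inside the time window.

```python
from collections import defaultdict

# Invented records: (endpoint id, start time in seconds, duration in seconds).
outgoing = [("id-1", 1000, 187), ("id-2", 1003, 42), ("id-3", 5000, 600)]
incoming = [("id-9", 1001, 187), ("id-8", 1004, 42), ("id-7", 9000, 600)]


def pair_by_duration(outgoing, incoming, max_skew=5):
    """Pair call legs with identical durations that start within `max_skew` seconds."""
    by_duration = defaultdict(list)
    for endpoint, start, duration in incoming:
        by_duration[duration].append((endpoint, start))

    pairs = []
    for endpoint, start, duration in outgoing:
        hits = [
            other
            for other, other_start in by_duration[duration]
            if abs(other_start - start) <= max_skew
        ]
        # Keep only unambiguous matches; several hits would be a guess.
        if len(hits) == 1:
            pairs.append((endpoint, hits[0]))
    return pairs


pairs = pair_by_duration(outgoing, incoming)
print(pairs)
```

The third call is left unpaired: its duration matches an incoming leg, but the start times are far apart, so the sketch declines to link them.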
Corner Stone
@The Other Chuck: The WaPo said they’ve published only 10% or so of what Snowden presented due to security concerns.
But I’m sure they probably misclassified all of that material as well.
Corner Stone
@Mnemosyne: Yeah, I generally conflate technological abilities from the early 1940’s with where we are now as well.
It’s an easy mistake to make.
Bill Arnold
@Tissue Thin Pseudonym (JMN):
I keep seeing these assertions that phone call metadata is limited to endpoint ids, maybe call time/duration. Maybe for certain intelligence programs it is. However, metadata is a broad and squishy term, and can include a lot more. In the way I use it in my day job, it can include very detailed descriptive information about the data being described, and can include derived descriptive information. (Derived by humans or automation or both.) Mapped to phone calls, this usage could include e.g. a machine generated textual summary of the phone call, like “routine conversation between spouses 95%”, or “unclassified conversation between two males speaking Arabic”.
Not saying that’s what’s going on, just that “metadata” is a very broad term.
Corner Stone
@mclaren: I’m not really sure why Martin is fighting so hard against the certainty that I could find out every thing I wanted to about him from metadata.
Oh, wait, yeah I do kind of understand why he’s doing that.
Nevermind.
Cacti
Inserting modern anachronisms into historical events that happened 3 centuries ago? Sounds like fun.
Imagine if Louis XVI could have called in airstrikes. I bet the Bourbons would still be running things in France.
SatanicPanic
@mclaren: Are you clear on the difference between metadata and plain old data? Because it doesn’t really sound like you are
Omnes Omnibus
@Cacti: Not if Robespierre had a Death Star.
Charlieford
James Bamford was just on the Newshour insisting we need to have an open debate! open debate! open debate!
I’ve been mulling that for awhile and something was congealing when Bamford himself ended one of his perorations by comparing us to East Germany and it was like “Eureka! That’s it!”
We can’t have a national debate on this because frankly the American people are too idiotic and conspiracy obsessed and low-information and short-attention-span and fantasy-headed and all the rest.
I mean, think about it. Anyone recall the ACA debate? Health insurance! What could be more wonkish and boring? And what did we get? Hitler mustaches and screams of communism and take-overs and “death panels” and . . .
Look. No. We can’t have a national debate on anything relating to intelligence. It’s like asking my dog to bone up on theology.
different-church-lady
@Corner Stone: Sounds like you’re threatening to cyber stalk him.
I’ll remember this months from now the next time you bring up that “threatened someone at work” incident.
Omnes Omnibus
@Charlieford: Dog is god spelled backwards.
different-church-lady
@Charlieford: I like you. Would you like a drink?
moops
If metadata includes approximate position of the phones then just give me IP packet traffic information and I could probably de-anonymize an ID tag for a large number of Americans. Then with that I could probably backfill the rest of the set.
It only takes combining two or three sets of anonymous meta-data to reconstruct identities with pretty high precision, provided the coverage is wide enough and long enough.
The NSA is the largest CPU cycle user on earth. Since most good encryption schemes are not amenable to brute force attacks, what do you think those machines are doing ?
Omnes Omnibus
@moops: Playing a nice game of chess?
moops
http://strata.oreilly.com/2011/05/anonymize-data-limits.html
go have a read, and give up on the fantasy of anonymized data sets.
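One way to make the point about "anonymized" data concrete is to measure how many records become one of a kind as fields are combined. The Python sketch below does that on a handful of invented records; the field names and values are placeholders chosen for illustration, not drawn from the linked article or from any real data set.

```python
from collections import Counter

# Invented records with no names attached.
records = [
    {"zip": "02139", "birth_year": 1970, "sex": "F"},
    {"zip": "02139", "birth_year": 1970, "sex": "M"},
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "60614", "birth_year": 1970, "sex": "F"},
    {"zip": "60614", "birth_year": 1970, "sex": "F"},
]


def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
    print(fields, unique_fraction(records, fields))
```

Each added field shrinks the crowd a record can hide in. A record that is unique on a few innocuous fields can be matched against any other source carrying the same fields plus a name.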
Corner Stone
@different-church-lady: you do that sweetie
Comrade Jake
LOL
http://www.guardian.co.uk/commentisfree/2013/jun/10/glenn-greenwald-readers-tell-us-nsa-files
Roy G.
@The Other Chuck: Ok, then Snowden may be unreliable, but what does that say about DNI James Clapper, who was caught lying to Congress under testimony about this very thing? (A crime, I might add). Also, too, by your own logic, he’s unreliable, so who’s to say how much worse the situation really is?
Comrade Jake
@Narcissus: as one of my LANL colleagues is fond of saying “We apologize if these ppt slides insult your intelligence – they were prepared for management.”
fuckwit
Does it ever occur to you that the reason people are losing their shit over this is that they really don’t want anyone knowing what kind of porn they like?
I’ve been a sysadmin. I know what kind of porn EVERYONE likes. I think there have been estimates that some huge percentage, like 25% or more, of internet traffic is porn. Wouldn’t surprise me in the least.
Hint: humans, after the age of puberty, are intensely, almost obsessively sexual creatures. We all love us some porn.
Only christians and maybe muslims seem to be unclear on this fact.
Maybe if this culture wasn’t so fucking puritanical we wouldn’t have people going apeshit over this stuff.
Corner Stone
@fuckwit:
I’m pretty sure this is not about what kind of pr0n people secretly like.
different-church-lady
@Corner Stone: That’s easy for a cyber-stalker to say.
Corner Stone
@different-church-lady: Your pr0n stash isn’t my cup of tea, actually. But, if you’re a dog lover then I guess you’re a dog lover!
different-church-lady
@Corner Stone:
I suppose I should shave it then.
Narcissus
@fuckwit: I’m into ALL KINDS of porn
Corner Stone
@different-church-lady:
Oh, yeah.
Shortstop
Kieran Healy. Seriously, it’s not that taxing to spell people’s names correctly once in a while. You just have to give a shit.
Charlieford
Appreciate the site, but I’ve seen ’em with better commenters. Just making an observation.
Keith G
@Charlieford: Got a short list?
Paul in KY
Cute writeup, but I think those who would be inclined to favor this will not have their opinion changed.
From the point of view of the British Empire, Mr. Revere was a criminal & traitor. The writeup seems to confirm that this data collection process can work as intended.
Agoraphobic Kleptomaniac
@Narcissus: We haven’t even found that the slides say what the media says they say.