How the U.S. Uses Technology to Mine More Data More Quickly


WASHINGTON — When American analysts hunting terrorists sought new ways to comb through the troves of phone records, e-mails and other data piling up as digital communications exploded over the past decade, they turned to Silicon Valley computer experts who had developed complex equations to thwart Russian mobsters intent on credit card fraud.

The partnership between the intelligence community and Palantir Technologies, a Palo Alto, Calif., company founded by a group of inventors from PayPal, is just one of many that the National Security Agency and other agencies have forged as they have rushed to unlock the secrets of “Big Data.”

Today, a revolution in software technology that allows for the highly automated and instantaneous analysis of enormous volumes of digital information has transformed the N.S.A., turning it into the virtual landlord of the digital assets of Americans and foreigners alike. The new technology has, for the first time, given America’s spies the ability to track the activities and movements of people almost anywhere in the world without actually watching them or listening to their conversations.

New disclosures that the N.S.A. has secretly acquired the phone records of millions of Americans and access to e-mails, videos and other data of foreigners from nine United States Internet companies have provided a rare glimpse into the growing reach of the nation’s largest spy agency. They have also alarmed the government: on Saturday night, Shawn Turner, a spokesman for the director of national intelligence, said that “a crimes report has been filed by the N.S.A.”

With little public debate, the N.S.A. has been undergoing rapid expansion in order to exploit the mountains of new data being created each day. The government has poured billions of dollars into the agency over the last decade, building a one-million-square-foot fortress in the mountains of Utah, apparently to store huge volumes of personal data indefinitely. It created intercept stations across the country, according to former industry and intelligence officials, and helped build one of the world’s fastest computers to crack the codes that protect information.

While once the flow of data across the Internet appeared too overwhelming for N.S.A. to keep up with, the recent revelations suggest that the agency’s capabilities are now far greater than most outsiders believed. “Five years ago, I would have said they don’t have the capability to monitor a significant amount of Internet traffic,” said Herbert S. Lin, an expert in computer science and telecommunications at the National Research Council. Now, he said, it appears “that they are getting close to that goal.”

On Saturday, it became clear how close: Another N.S.A. document, again cited by The Guardian, showed a “global heat map” that appeared to represent how much data the N.S.A. sweeps up around the world. It showed that in March 2013 there were 97 billion pieces of data collected from networks worldwide; about 14 percent of it was in Iran, much was from Pakistan and about 3 percent came from inside the United States, though some of that might have been foreign data traffic routed through American-based servers.

A Shift in Focus

The agency’s ability to efficiently mine metadata, data about who is calling or e-mailing, has made wiretapping and eavesdropping on communications far less vital, according to data experts. That access to data from companies that Americans depend on daily raises troubling questions about privacy and civil liberties that officials in Washington, insistent on near-total secrecy, have yet to address.

“American laws and American policy view the content of communications as the most private and the most valuable, but that is backwards today,” said Marc Rotenberg, the executive director of the Electronic Privacy Information Center, a Washington group. “The information associated with communications today is often more significant than the communications itself, and the people who do the data mining know that.”

In the 1960s, when the N.S.A. successfully intercepted the primitive car phones used by Soviet leaders driving around Moscow in their Zil limousines, there was no chance the agency would accidentally pick up Americans. Today, if it is scanning for a foreign politician’s Gmail account or hunting for the cellphone number of someone suspected of being a terrorist, the possibilities for what N.S.A. calls “incidental” collection of Americans are far greater.

United States laws restrict wiretapping and eavesdropping on the actual content of the communications of American citizens but offer very little protection to the digital data thrown off by the telephone when a call is made. And they offer virtually no protection to other forms of non-telephone-related data like credit card transactions.

Because of smartphones, tablets, social media sites, e-mail and other forms of digital communications, the world creates 2.5 quintillion bytes of new data daily, according to I.B.M.

The company estimates that 90 percent of the data that now exists in the world has been created in just the last two years. From now until 2020, the digital universe is expected to double every two years, according to a study by the International Data Corporation.

Accompanying that explosive growth has been rapid progress in the ability to sift through the information.

When separate streams of data are integrated into large databases — matching, for example, time and location data from cellphones with credit card purchases or E-ZPass use — intelligence analysts are given a mosaic of a person’s life that would never be available from simply listening to their conversations. Just four data points about the location and time of a mobile phone call, a study published in Nature found, make it possible to identify the caller 95 percent of the time.
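The re-identification effect described above can be illustrated with a toy sketch. It is not the method of the Nature study; the traces, tower names and the `matching_users` helper are all invented for illustration. It only shows how each additional known (place, time) point shrinks the set of people a record could belong to.

```python
from typing import Dict, List, Set, Tuple

# A point is (cell tower id, hour of day). Both fields are hypothetical.
Point = Tuple[str, int]


def matching_users(traces: Dict[str, Set[Point]], known: Set[Point]) -> List[str]:
    """Return every user whose trace contains all of the known points."""
    return sorted(user for user, trace in traces.items() if known <= trace)


# Made-up location traces for three imaginary phone users.
traces: Dict[str, Set[Point]] = {
    "user_a": {("tower_1", 8), ("tower_2", 12), ("tower_3", 18), ("tower_1", 22)},
    "user_b": {("tower_1", 8), ("tower_2", 12), ("tower_4", 18), ("tower_5", 22)},
    "user_c": {("tower_1", 8), ("tower_6", 12), ("tower_3", 18), ("tower_7", 22)},
}

one_point = {("tower_1", 8)}
two_points = one_point | {("tower_2", 12)}
three_points = two_points | {("tower_3", 18)}

# Each extra point narrows the list of candidates.
for known in (one_point, two_points, three_points):
    print(len(known), "known point(s):", matching_users(traces, known))
```

In this toy data one point matches all three users, two points match two of them, and three points single out one person. Real traces are far larger and noisier, but the narrowing works the same way.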

“We can find all sorts of correlations and patterns,” said one government computer scientist who spoke on condition of anonymity because he was not authorized to comment publicly. “There have been tremendous advances.”

Secret Programs

When President George W. Bush secretly began the N.S.A.’s warrantless wiretapping program in October 2001, to listen in on the international telephone calls and e-mails of American citizens without court approval, the program was accompanied by large-scale data mining operations.

Those secret programs prompted a showdown in March 2004 between Bush White House officials and a group of top Justice Department and F.B.I. officials in the hospital room of John Ashcroft, then the attorney general. Justice Department lawyers who were willing to go along with warrantless wiretapping argued that the data mining raised greater constitutional concerns.

In 2003, after a Pentagon plan to create a data-mining operation known as the Total Information Awareness program was disclosed, a firestorm of protest forced the Bush administration to back off.

But since then, the intelligence community’s data-mining operations have grown enormously, according to industry and intelligence experts.

The confrontation in Mr. Ashcroft’s hospital room took place just one month after a Harvard undergraduate, Mark Zuckerberg, created Facebook; Twitter would not be founded for two more years. Apple’s iPhone and iPad did not yet exist.

“More and more services like Google and Facebook have become huge central repositories for information,” observed Dan Auerbach, a technology analyst with the Electronic Frontier Foundation. “That’s created a pile of data that is an incredibly attractive target for law enforcement and intelligence agencies.”

The spy agencies have long been among the most demanding customers for advanced computing and data-mining software — and even more so in recent years, according to industry analysts. “They tell you that somewhere there is an American who is going to be blown up,” said a former technology executive, and “the only thing that stands between that and him living is you.”

In 2006, the Bush administration established a program known as the Intelligence Advanced Research Projects Activity, to accelerate the development of intelligence-related technology intended “to provide the United States with an overwhelming intelligence advantage over future adversaries.”

I.B.M.’s Watson, the supercomputing technology that defeated human Jeopardy! champions in 2011, is a prime example of the power of data-intensive artificial intelligence.

Watson-style computing, analysts said, is precisely the technology that would make the ambitious data-collection program of the N.S.A. seem practical. Computers could instantly sift through the mass of Internet communications data, see patterns of suspicious online behavior and thus narrow the hunt for terrorists.

Both the N.S.A. and the Central Intelligence Agency have been testing Watson in the last two years, said a consultant who has advised the government and asked not to be identified because he was not authorized to speak.


Industry experts say that intelligence and law enforcement agencies also use a new technology, known as trilateration, that allows tracking of an individual’s location, moment to moment. The data, obtained from cellphone towers, can track the altitude of a person, down to the specific floor in a building. There is even software that exploits the cellphone data seeking to predict a person’s most likely route. “It is extreme Big Brother,” said Alex Fielding, an expert in networking and data centers.
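The geometry behind trilateration can be sketched in a few lines. This is a simplified, flat two-dimensional illustration, not a description of any agency’s or carrier’s system: it assumes three towers at known positions and exact distances to the phone, while real systems work with noisy timing or signal-strength measurements and need a fourth reference point to estimate altitude. The function name `trilaterate` and the coordinates are hypothetical.

```python
import math
from typing import Tuple

Tower = Tuple[float, float, float]  # (x, y, distance to the phone)


def trilaterate(t1: Tower, t2: Tower, t3: Tower) -> Tuple[float, float]:
    """Locate a point in a plane from three towers and exact distances.

    Subtracting the first circle equation from the other two removes the
    squared unknowns and leaves two linear equations in x and y.
    """
    x1, y1, r1 = t1
    x2, y2, r2 = t2
    x3, y3, r3 = t3

    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("towers are collinear; position is not unique")

    # Cramer's rule for the 2x2 linear system.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


# Hypothetical towers; distances are computed from a made-up true position.
true_x, true_y = 3.0, 4.0
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
measured = [(tx, ty, math.hypot(true_x - tx, true_y - ty)) for tx, ty in towers]

print(trilaterate(*measured))
```

With exact distances the sketch recovers the true position. With measurement error the three circles no longer meet at one point, and a practical system would fit the best estimate, for example by least squares.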

In addition to opening the Utah data center, reportedly scheduled for this year, N.S.A. has secretly enlarged its footprint inside the United States, according to accounts from whistle-blowers in recent years.

In Virginia, a telecommunications consultant reported, Verizon had set up a dedicated fiber-optic line running from New Jersey to Quantico, Va., home to a large military base, allowing government officials to gain access to all communications flowing through the carrier’s operations center.

In Georgia, an N.S.A. official said in interviews, the agency had combed through huge volumes of routine e-mails to and from Americans.

And in San Francisco, a technician at AT&T reported on the existence of a secret room there reserved for the N.S.A. that allowed the spy agency to copy and store millions of domestic and international phone calls routed through that station.

Nothing revealed in recent days suggests that N.S.A. eavesdroppers have violated the law by targeting ordinary Americans. On Friday, President Obama defended the agency’s collection of phone records and other metadata, saying it did not involve listening to conversations or reading the content of e-mails. “Some of the hype we’ve been hearing over the past day or so — nobody has listened to the content of people’s phone calls,” he said.

Mr. Rotenberg, referring to the constitutional limits on search and seizure, said, “It is a bit of a fantasy to think that the government can seize so much information without implicating the Fourth Amendment interests of American citizens.”


Reporting was contributed by David E. Sanger and Scott Shane from Washington, Steve Lohr and James Glanz from New York, and Quentin Hardy from Berkeley, Calif.

4 Responses to How the U.S. Uses Technology to Mine More Data More Quickly

  1. KSC 10/06/2013 at 07:03 #

    “Something the Stasi could only have dreamed of”

    The American surveillance scandal Prism affects every Swede who uses Google and Facebook. But the question is whether the outrage will be as great this time as it was during the FRA debate. Perhaps we simply expect no more than this from either social media or democratic states.

    The revelation by The Washington Post and The Guardian that the American National Security Agency, the NSA, has direct access to the servers of the leading IT companies has received enormous media attention.

    Here in the newsroom in Stockholm, it could be noted, initially on Friday morning, that reader traffic to the articles on the subject was not as heavy as one might have expected, given that the subject, the Internet and privacy, is often very widely read on

    During the day, as more voices have weighed in on the revelation, readers’ interest has also awakened.

    Too much should not be read into swings in readership statistics. But reactions on social media have not been quite as forceful as one might have expected either. A bridge day before the weekend and summer sunshine may admittedly have a dampening effect on tweeting and article sharing.

    But the question can still be asked: What does the public’s attitude really look like? Perhaps we simply do not have higher expectations than this of online services like Google and social media like Facebook. Or of the American state.

    An alternative interpretation could be that we are putting on blinders: we would rather not be reminded of how easy it is to monitor us online.

    Many have also already poked fun on Twitter and blogs at the “revelation” that the American state monitors us online; it is seen as being as self-evident as the fact that water is wet.

    Right now the EU is drafting comprehensive new data protection legislation meant to better protect precisely our privacy online, amid loud protests from exactly the American IT giants now singled out in the Prism scandal. But that discussion does not quite catch on among citizens. After the Data Retention Directive, FRA, ACTA, SOPA and other acronyms, are we too tired for yet another privacy debate?

    The American Prism revelation otherwise has obvious parallels to the Swedish FRA debate five years ago. That change in the law, which among other things gave Försvarets radioanstalt (the National Defence Radio Establishment) the right to conduct signals intelligence on Internet traffic crossing Sweden’s borders, led to large protests and a “blog quake.”

    The National Security Agency, the NSA, is the United States’ counterpart to FRA. Just as FRA may not monitor domestic traffic in Sweden, the American government has now come out and said that the disclosed data collection targets people outside the United States, not American citizens. Reassuring for American citizens, perhaps, less so for the rest of us.

    There are differences between the powers of the NSA and FRA, however. The American NSA appears to have fewer privacy regulations to take into account, and according to the revelation it has access to all stored data, while the Swedish FRA may only monitor traffic in real time.

    The United States generally has laxer legislation on personal data than Sweden and the EU, and does not share the same view of privacy either. The attention Prism has received in American media shows, however, that this time the line has been crossed by a wide margin.

    In recent years many, the undersigned included, have raised a warning finger that we far too casually, and entirely for free, hand over masses of data about ourselves to the American IT giants. Data that they earn large sums on by offering other companies more precisely targeted advertising. If you use a service that is free, you are the product, it has often been said.

    And we have no idea to whom Facebook, Apple and Google will sell our data in, say, ten years: perhaps to insurance companies or recruitment firms? Or what will happen to our personal data at Facebook when it is outcompeted by another player.

    Do I practice what I preach, then? Absolutely not. The puzzle that can be pieced together about my life just through access to Facebook’s and Google’s servers (my interests, opinions, taste in music, favorite films, friends, family, trips, places I move through at precisely recorded times, and so on) is something the Stasi could only have dreamed of.

    That the named companies have so far flatly denied the press reports does not exactly raise their credibility.

    But a relevant question is how hard they have fought the NSA behind the scenes. That the Prism project has been running for six years, and that Google was brought in only after two years and Apple as late as last year, suggests resistance from the companies. As several have noted, Twitter is not on the leaked list at all.

    That the American state has direct insight into the databases of Facebook, Google and the others is not in itself surprising. What would be truly interesting is to learn what kind of pressure the American state has applied to gain that access.

    But the scandal that has erupted shows that it is really beside the point to try to single out which is worse: that the IT companies map our private lives or that the state does. The fact is that it is being done. And that data leaks between companies and authorities. Turning a blind eye to that can have fateful personal consequences, today or in ten years.
