Visualization of the Week: Mapping Mexico’s drug war

Diego Valle-Jones has created a powerful interactive map of the ongoing drug war in Mexico.

The interactive map lets you compare homicides and drug-related homicides, with the option to examine marijuana, opium, and drug-lab-related homicides. If you click on a bubble, you can see the number of murders over time, dating back to 2004. Important events are highlighted on that timeline. You can also draw a shape on the map to look at a particular region.

Click to see the full interactive version of “Map of the Drug War in Mexico.”


Valle-Jones writes:

“To unclutter the map and following the lead of the paper Trafficking Networks and the Mexican Drug War by Melissa Dell, I decided to only show the optimal highways (according to my own data and Google Directions) to reach the US border ports from the municipalities with the highest drug plant eradication between 1994 and 2003 and the highest 2d density estimate of drug labs based on newspaper reports of seizures. The map is a work in progress and is still missing the cocaine routes, but hopefully I’ll be able to add them shortly.”

The data can be exported to CSV, and the source code is available on GitHub.

Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We’re always looking for leads, so please drop a line if there’s a visualization you think we should know about.

Strata 2012: The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work. Save 20% on registration with the code RADAR20.

Strata Week: The Megaupload seizure and user data

Here are a few of the data stories that caught my attention this week.

Megaupload’s seizure and questions about controlling user data

When the file-storage and sharing site Megaupload had its domain name seized, assets frozen and website shut down in mid-January, the U.S. Justice Department contended that the owners were operating a site dedicated to copyright infringement. But that posed a huge problem for those who were using Megaupload for the legitimate and legal storage of their files. As the EFF noted, these users weren’t given any notice of the seizure, nor were they given an opportunity to retrieve their data.

Moreover, it seemed this week that those users would have all their data deleted, as Megaupload would no longer be able to pay its server fees.

While it appears that users have won a two-week reprieve before any deletion actually occurs, the incident does raise a number of questions about users’ data rights and control in the cloud. Specifically: What happens to user data when a file hosting / cloud provider goes under? And how much time and notice should users have to reclaim their data?

Megaupload seizure notice: what visitors to the seized site now see.

Bloomberg opens its market data distribution technology

The financial news and information company Bloomberg opened its market data distribution interface this week. The BLPAPI is available under a free-use license. According to the press release, some 100,000 people already use the BLPAPI, but with this week’s announcement, the interface will be more broadly available.

The company introduced its Bloomberg Open Symbology back in 2009, a move to provide an alternative to some of the proprietary systems for identifying securities (particularly those services offered by Bloomberg’s competitor Thomson Reuters). This week’s opening of the BLPAPI is a similar gesture, one that the company says is part of its “Open Market Data Initiative, an ongoing effort to embrace and promote open solutions for the financial services industry.”

The BLPAPI works with a range of programming languages, including Java, C, C++, .NET, COM and Perl. But while the interface itself is free to use, the content is not.

Pentaho moves Kettle to the Apache 2.0 license

Pentaho’s extract-transform-load technology Pentaho Kettle is being moved to the Apache License, Version 2.0. Kettle was previously available under the GNU Lesser General Public License (LGPL).

By moving to the Apache license, Pentaho says it will be more in line with the licensing of Hadoop, HBase, and a number of NoSQL projects.

Kettle downloads and documentation are available at the Pentaho Big Data Community Home.

Oscar screeners and movie piracy data

Andy Baio took a look at some of the data surrounding piracy and the Oscar screening process. There has long been concern that the review copies of movies distributed to members of the Academy of Motion Picture Arts and Sciences were making their way online. Baio observed that while a record number of films have been nominated for Oscars this year (37), just eight of the “screeners” have been leaked online, “a record low that continues the downward trend from last year.”

However, while the number of screeners available online has diminished, almost all of the nominated films (34) had already been leaked online. “If the goal of blocking leaks is to keep the films off the Internet, then the MPAA [Motion Picture Association of America] still has a long way to go,” Baio wrote.

Baio has a number of additional observations about these leaks (and he also made the full data dump available for others to examine). But as the MPAA and others are making arguments (and helping pen related legislation) to crack down on Internet piracy, a good look at piracy trends seems particularly important.
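For a sense of scale, the counts quoted above can be turned into shares of the nominated films. The short Python sketch below does only that arithmetic. The constant and function names are made up for illustration, and the three numbers are the ones cited in this post, not values read from Baio's data dump.

```python
# Counts quoted in the post (not loaded from Baio's published data).
NOMINATED_FILMS = 37      # films nominated this year
SCREENERS_LEAKED = 8      # nominated films whose screener leaked
FILMS_LEAKED_ANY = 34     # nominated films leaked online in any form


def share(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    print("Screener leaks:", share(SCREENERS_LEAKED, NOMINATED_FILMS), "%")
    print("Leaked in any form:", share(FILMS_LEAKED_ANY, NOMINATED_FILMS), "%")
```

By these figures, roughly a fifth of the nominated films leaked as screeners, while about nine in ten were online in some form.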

Got data news?

Feel free to email me.


Why Hadoop caught on

Doug Cutting (@cutting) is a founder of the Apache Hadoop project and an architect at Hadoop provider Cloudera. When Cutting expresses surprise at Hadoop’s growth — as he does below — that carries a lot of weight.

In the following interview, Cutting explains why he’s surprised at Hadoop’s ascendance, and he looks at the factors that helped Hadoop catch on. He’ll expand on some of these points during his Hadoop session at the upcoming Strata Conference.

Why do you think Hadoop has caught on?

Doug Cutting: Hadoop is a technology whose time had come. As computer use has spread, institutions are generating vastly more data. While commodity hardware offers affordable raw storage and compute horsepower, before Hadoop, there was no commodity software to harness it. Without tools, useful data was simply discarded.

Open source is a methodology for commoditizing software. Google published its technological solutions, and the Hadoop community at Apache brought these to the rest of the world. Commodity hardware combined with the latent demand for data analysis formed the fuel that Hadoop ignited.

Are you surprised at its growth?

Doug Cutting: Yes. I didn’t expect Hadoop to become such a central component of data processing. I recognized that Google’s techniques would be useful to other search engines and that open source was the best way to spread these techniques. But I did not realize how many other folks had big data problems nor how many of these Hadoop applied to.

What role do you see Hadoop playing in the near-term future of data science and big data?

Doug Cutting: Hadoop is a central technology of big data and data science. HDFS is where folks store most of their data, and MapReduce is how they execute most of their analysis. There are some storage alternatives — for example, Cassandra and CouchDB, and useful computing alternatives, like S4, Giraph, etc. — but I don’t see any of these replacing HDFS or MapReduce soon as the primary tools for big data.

Long term, we’ll see. The ecosystem at Apache is a loosely-coupled set of separate projects. New components are regularly added to augment or replace incumbents. Such an ecosystem can survive the obsolescence of even its most central components.

In your Strata session description, you note that “Apache Hadoop forms the kernel of an operating system for big data.” What else is in that operating system? How is that OS being put to use?

Doug Cutting: Operating systems permit folks to share resources, managing permissions and allocations. The two primary resources are storage and computation. Hadoop provides scalable storage through HDFS and scalable computation through MapReduce. It supports authorization, authentication, permissions, quotas and other operating system features. So, narrowly speaking, Hadoop alone is an operating system.

But no one uses Hadoop alone. Rather, folks also use HBase, Hive, Pig, Flume, Sqoop and many other ecosystem components. So, just as folks refer to more than the Linux kernel when they say “Linux,” folks often refer to the entire Hadoop ecosystem when they say “Hadoop.” Apache BigTop combines many of these ecosystem projects together into a distribution, much like RHL and Ubuntu do for Linux.
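For readers new to the MapReduce model Cutting refers to, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to a word count. It is plain Python and does not use Hadoop or any Hadoop API. The function names are hypothetical, and a real job would distribute these steps across a cluster and read its input from HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big tools", "data work"]))
```

The point of the split is that each map call and each reduce call depends only on its own input, which is what lets a framework run them in parallel on many machines.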
