Who’s A Mad Scientist?

Thanks to the New York Times, folks in the film industry are once again worrying about the invasion of the data crunchers.

In the recent article “Solving Equation of a Hit Film Script, With Data,” reporter Brooks Barnes practically compares us data crunchers to a zombie invasion. And yes, I said “us.” I am a data cruncher (though oddly enough, we don’t much use that term in the business). So I do have a wee bit of a stake in this topic.

Which is why I feel the need to set the record straight on a few points.  So let’s immediately dispense with the idea that we are somehow the “mad scientists” of the film industry.  I have heard this term used by various people in film.  I suspect it gives them a brief sense of bogus empowerment to reduce us to the weird level of a character played by John Carradine.  In reality, we are closer in type to Harper Reed.  Well, minus the glasses and the strange resemblance to Conan O’Brien.

The only “mad” thing about what we do is that we attempt to use logic, reason, and rational thinking as a means of charting patterns and projections through the otherwise confusing elements inherent in the film production process. Like, what are we supposed to use, goat entrails and Ouija boards? Trust me, that’s been tried and it doesn’t work.

Hollywood and the world of the data crunchers are on a convergence course. The reason is pretty simple: it works. In fact, it works in many other fields, as we are reminded by the blog post “Is Big Data the Next Frontier for Innovation in the Arts?” Contrary to long-standing industry folklore, there are many elements of the process that can be quantified and studied. Of course, there are the usual responses:

  1. Arthur De Vany proved that it couldn’t be done. In his book Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry, De Vany argued that there were too many unpredictable elements in the process to predict a successful outcome. Actually, De Vany presents a notion of film investment that should, however inadvertently, dissuade any sane person from ever going near film investments. However, his focus was on whether anyone could predict a major success for a movie. This is a very narrow (and I would even argue misguided) focus. We are more concerned with preventing the film from falling flat on its face. Besides, De Vany was dealing with an old business model that is no longer relevant to the industry. Personally, I think it is time to put De Vany to bed.
  2. William Goldman said it all when he stated about movie production: “Nobody knows anything.” It’s a great line. Very snappy. The same is true of my neighbor’s dog. What Goldman does remind us is that movies, like everything else in life, are full of uncertainties. That is true even in certain areas of mathematics. So? Seriously. Goldman’s comment is not a rebuttal. It is primarily an expression of exasperation. Nothing more, nothing less.

No one involved in this pursuit is promising miracles. What we do promise is a manageable concept with definable results. Granted, a lot of people in the film industry are not that familiar with the basic concepts, and I would strongly recommend a bit of study into the general areas of statistical analysis. Sometimes, the best presentations for the general reader (or viewer) are found among the more satirical ones (and it should be noted that critical presentations are one of the ways we learn stuff in this field, which is why some of the most scathing critiques of statistical analysis are produced by people practicing the trade).

Darrell Huff’s classic book “How to Lie With Statistics” is still a must-read, especially for the general audience. Likewise, Sebastian Wernicke’s video presentation called Lies, Damned Lies and Statistics (About TEDTalks) is a hoot of a backhanded guide to the process (which, by the way, is his field). You can even start acquiring a rudimentary education at eHow.com. It’s all pretty basic stuff, but the basics are how you learn.

Granted, you will discover some strange things. In the NYT article, Vinny Bruzzese of Worldwide Motion Picture Group points out that “bowling scenes” have a negative effect on a film’s box office outcome.  This has led to a lot of giggles among the naysayers. Actually, Bruzzese is right. Nobody can actually explain the why, but it is one of about three really weird, yet overwhelmingly negative recurring patterns that can be charted through such analysis. I can tell you right now that a PG-rated horror screenplay about zombie bowlers who read Ayn Rand is a sure bet for box office failure.

The bowling thing is extremely straightforward. Then, Bruzzese goes on about how movies with Guardian Superheroes always perform better than films with Cursed Superheroes. Again, he is right, but there is a catch. The vast majority of these movies would fit the type of Guardian Superheroes. Only a few (roughly 3 to 5 titles, depending upon the way he is defining the types) would fit as Cursed Superheroes. In the case of at least 3 of these films, other issues would be sufficient to explain their failures. On top of that, the imbalance in the data is so great that I don’t feel a reasonable analysis can be achieved. At the very least, I would have to question the approach on this particular set of figures, and questioning it is OK.
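
To see why a handful of titles makes me nervous, here is a rough sketch in Python. The counts are invented purely for illustration (they are not Bruzzese’s data, and what counts as a “hit” is left hypothetical). The Wilson score interval used here is just one common way to put error bars on a proportion; I am not claiming it is what anyone in the NYT article uses.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (roughly 95% when z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts, made up for this example only:
# 40 "guardian" titles with 28 hits, 4 "cursed" titles with 1 hit.
guardian = wilson_interval(successes=28, n=40)
cursed = wilson_interval(successes=1, n=4)

print("guardian hit-rate range: %.2f to %.2f" % guardian)
print("cursed hit-rate range:   %.2f to %.2f" % cursed)
```

With counts like these, the range for the small group is so wide that it overlaps the range for the big group. The pattern may well be real, but four titles cannot establish it on their own.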

What a lot of people fail to understand is that this is a process. The results produced through this type of process are not meant to be words written in stone. Instead, they create a framework from which a more thorough analysis is made possible.


It isn’t magic. It’s science. Heck, you are even allowed (and encouraged) to ask questions.