A NodeJS Perspective on What’s New in MongoDB 2.6, Part I: Text Search

MongoDB shipped the newest stable version of its server, 2.6.0, this week. This new release is massive: there were about 4000 commits between 2.4 and 2.6. Unsurprisingly, the release notes are a pretty dense read and don’t quite convey how cool some of these new features are. To remedy that, I’ll dedicate a couple of posts to putting on my NodeJS web developer hat and exploring interesting use cases for new features in 2.6. The first feature I’ll dig into is text search, or, in layman’s terms, Google for your MongoDB documents.

Text search was technically available in 2.4, but it was an experimental feature and not part of the query framework. Now, in 2.6, $text is a full-fledged query operator, enabling you to search for documents by text in 15 different languages.

Getting Started With Text Search

Let’s dive right in and use text search on the USDA SR-25 data set described in this post. You can download a mongorestore-friendly version of the data set here. The data set contains 8194 food items with associated nutrition data, and each food item has a human-readable description, e.g. “Kale, raw” or “Bison, ground, grass-fed, cooked”. Ideally, as a client of this data set, we shouldn’t have to remember whether we need to enter “Bison, grass-fed, ground, cooked” or “Bison, ground, grass-fed, cooked” to get the data we’re looking for. We should just be able to put in “grass-fed bison” and get reasonable results.

Thankfully, text search makes this simple. To do text search, we first need to create a text index on our copy of the USDA nutrition collection. Let’s create one on the food item’s description:


db.nutrition.ensureIndex({ description : "text" });

Now, we can search the data set for our “raw kale” and “grass-fed bison”, and see what we get:


db.nutrition.find(
  { $text : { $search : "grass-fed bison" } },
  { description : 1 }).
    limit(3);

db.nutrition.find(
  { $text : { $search : "raw kale" } },
  { description : 1 }).
    limit(3);


Unfortunately, the results we got aren’t that useful, because they’re not in order of relevance. Unless we explicitly tell MongoDB to sort by the text score, we probably won’t get the most relevant documents first. Thankfully, with the help of the new $meta keyword (which is currently only useful for getting the text score), we can tell MongoDB to sort by text score as described here:

db.nutrition.find(
  { $text : { $search : "raw kale" } },
  { description : 1, textScore : { $meta : "textScore" } }).
    sort({ textScore : { $meta : "textScore" } }).
    limit(3);

Using Text Search in NodeJS

First, an important note on the compatibility of text search with NodeJS community projects: the MongoDB NodeJS driver is compatible with text search going back to at least 1.2.0. However, only the latest version of mquery, 0.6.0, is compatible with text search. By extension, the popular ODM Mongoose, which relies on mquery, unfortunately doesn’t have a text-search-compatible release at the time of this blog post. I pushed a commit to fix this, and the next version of Mongoose, 3.8.9, should allow you to sort by text score. In summary, here are the version restrictions for using MongoDB text search:

MongoDB NodeJS driver: >= 1.4.0 is recommended, but it seems to work going back to at least 1.2.0 in my personal experiments.

mquery: >= 0.6.0.

Mongoose: >= 3.8.9 (unfortunately not released yet as of 4/9/14)

Now that you know which versions are supported, let’s demonstrate how to actually do text search with the NodeJS driver. I created a simple food journal app (i.e. an app that counts calories for you when you enter how much of a certain food you’ve eaten) that is meant to tie in to the SR-25 data set. This app is available on GitHub here, so feel free to play with it.

The LeanMEAN app exposes an API endpoint, GET /api/food/search/:search, that runs text search on a local copy of the SR-25 data set. The implementation of this endpoint is here; for convenience, it is reproduced below. The foodItem variable is a wrapper around the Node driver’s connection to the SR-25 collection.

/* Because MongooseJS doesn't quite support sorting by text search score
 * just yet, use the NodeJS driver directly */
exports.searchFood = function(foodItem) {
  return function(req, res) {
    var search = req.params.search;
    foodItem.connection().
      find(
        { $text : { $search : search } },
        { score : { $meta : "textScore" } }
      ).
      sort({ score : { $meta : "textScore" } }).
      limit(10).
      toArray(function(error, foodItems) {
        if (error) {
          res.json(500, { error : error });
        } else {
          res.json(foodItems);
        }
      });
  };
};

Unsurprisingly, this code looks a lot like the shell version, so it should feel familiar to you NodeJS pros :)

Looking Forward

And that’s all on text search for now. In the next post (scheduled for 4/25), we’ll tackle some of the awesome new features in the aggregation framework, including text search in aggregation.


Plugging USDA Nutrition Data into MongoDB

As much as I love geeking out about basketball stats, I want to put a MongoDB data set out there that’s a bit more app-friendly: the USDA SR25 nutrient database. You can download this data set from my S3 bucket here, and plug it into your MongoDB instance using mongorestore. I’m very meticulous about nutrition and have, at times, kept a food journal, but sites like FitDay and DailyBurn have far too much spam and are far too poorly designed to be a viable option. With this data set, I plan on putting together an open source web-based food journal in the near future. However, I encourage you to use this data set to build your own apps.

Data Set Structure

The data set contains one collection, ‘nutrition’. The documents in this collection contain merged data from the SR25 database’s very relational FOOD_DES, NUTR_DEF, NUT_DATA, and WEIGHT files. In more comprehensible terms, the documents contain a description of a food item, a list of nutrients with measurements per 100g, and a list of common serving sizes for that food. Here’s what the top level document for grass-fed ground bison looks like in RoboMongo, a simple MongoDB GUI:

The top level document is fairly simple: the description is a human-readable description of the food, the manufacturer is the company that manufactures the product, and survey is whether or not the data set has values for the 65 nutrients used for some government survey. However, the real magic happens in the nutrients and weights subdocuments. Let’s see what happens when we open up nutrients:

You’ll see that there are an incredible number of nutrients. The nutrients data is in an array, where each subdocument in the array has a tagname, which is a common scientific abbreviation for the nutrient, a human-readable description, and an amountPer100G with corresponding units. In the above example, you’ll see that 100 grams of cooked grass-fed ground bison contains about 25.45 g of protein.

(Note: the original data set includes some more detailed data, including standard deviations and sample sizes for the nutrient measurements, but that’s outside the scope of what I want to do with this data set. If you want that data, feel free to read through the government data set’s documentation and fork my converter on github.)

Finally, the weights subdocument is another array which contains sub-documents that describe common serving sizes for the food item and their mass in grams. In the grass-fed ground bison example, the weights list contains a single serving size, 3 oz, which is approximately 85 grams:
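Putting the pieces together, a single food item document is shaped roughly like this. The values are from the grass-fed bison example, and any field name not mentioned above (like grams in the weights subdocuments, or the survey value) is an assumption:

```javascript
// Illustrative sketch of a 'nutrition' document; only the fields
// described above are shown, and some field names are assumptions.
var foodItem = {
  description : "Bison, ground, grass-fed, cooked",
  survey : "Y", // exact representation assumed
  nutrients : [
    {
      tagname : "PROCNT",
      description : "Protein",
      amountPer100G : 25.45,
      units : "g"
    }
    // ... one entry per nutrient tracked for this food
  ],
  weights : [
    { description : "3 oz", grams : 85 } // 'grams' field name assumed
  ]
};
```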

Exploring the Data Set

First things first: since the nutrients for each food are in an array, it’s not immediately obvious what nutrients this data set has. Thankfully, MongoDB’s distinct command makes this very easy:
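The query is a one-liner in the shell. Below is a sketch with the db call commented out (it needs a live MongoDB instance), along with the equivalent logic in plain JavaScript over a tiny in-memory sample:

```javascript
// In the mongo shell (field name assumed from the structure above):
// db.nutrition.distinct('nutrients.description')
//
// distinct() walks the nutrients array in every document and returns an
// array of unique nutrient descriptions. The same idea in plain JavaScript:
var docs = [
  { nutrients : [ { description : 'Protein' }, { description : 'Total lipid (fat)' } ] },
  { nutrients : [ { description : 'Protein' } ] }
];
var seen = {};
docs.forEach(function(doc) {
  doc.nutrients.forEach(function(n) { seen[n.description] = true; });
});
var distinctNutrients = Object.keys(seen); // ['Protein', 'Total lipid (fat)']
```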

There are a lot of different nutrients in this data set. In fact, there are 145 of them.

So how are we going to find nutrient data for a food that we’re interested in? Suppose we’re looking to find how many carbs are in raw kale. This is pretty easy because MongoDB’s shell supports JavaScript regular expressions, so let’s just find documents whose description includes ‘kale’:
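A sketch of that query; the find() call is commented out since it needs a live instance, and the projection is an assumption:

```javascript
// Case-insensitive regex on the description field:
var query = { description : /kale/i };
// db.nutrition.find(query, { description : 1 });

// The regex matches descriptions like 'Kale, raw':
var matches = query.description.test('Kale, raw'); // true
```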

Of course, this doesn’t include the carbohydrate content, so let’s add an $elemMatch to the projection to limit output to the carbohydrates in raw kale:
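A sketch of the query with the $elemMatch projection; the regexes here are assumptions, and the find() call is commented out as before:

```javascript
// $elemMatch in the projection returns only the first element of the
// nutrients array that matches the condition:
var query = { description : /kale, raw/i };
var projection = {
  description : 1,
  nutrients : { $elemMatch : { description : /carbohydrate/i } }
};
// db.nutrition.find(query, projection);
```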

Running Aggregations to Test Nutritional Claims

My favorite burger joint in Chelsea, brgr, claims that grass-fed beef has as much omega-3 as salmon. Let’s see if this advertising claim holds up to scrutiny.

Right now, this is a bit tricky. Since I imported the data from the USDA as-is, total omega-3 fatty acids is not tracked as a single nutrient. The amounts for individual omega-3 fatty acids, such as EPA and DHA, are recorded separately. However, the different types of omega-3 fatty acids all have n-3 in the description, so it should be pretty easy to identify which nutrients we need to sum up to get total omega-3 fatty acids. Of course, when you need to significantly transform your data, it’s time to bust out the MongoDB aggregation framework.

The first aggregation we’re going to do is find the salmon item that has the least amount of total omega-3 fatty acids per 100 grams. To do that, we first need to transform the documents to include the total amount of omega-3’s, rather than the individual omega-3 fats like EPA and DHA. With the $group pipeline stage and the $sum operator, this is pretty simple. Keep in mind that the nutrient measurements for omega-3 fatty acids are always in grams in this data set, so we don’t have to worry about unit conversions.
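Here’s a hedged sketch of what that pipeline might look like; the regexes and exact field paths are assumptions based on the document structure described earlier, and the aggregate() call is commented out since it needs a live instance:

```javascript
var pipeline = [
  // Limit ourselves to salmon items
  { $match : { description : /salmon/i } },
  // One document per (food item, nutrient) pair
  { $unwind : '$nutrients' },
  // Keep only the individual omega-3 fatty acids, e.g. EPA and DHA
  { $match : { 'nutrients.description' : /n-3/ } },
  // Sum the individual omega-3 amounts for each food item
  { $group : {
    _id : '$description',
    totalOmega3 : { $sum : '$nutrients.amountPer100G' }
  } },
  // Salmon item with the least total omega-3 per 100 grams first
  { $sort : { totalOmega3 : 1 } },
  { $limit : 1 }
];
// db.nutrition.aggregate(pipeline);
```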


You can get a text version of the above aggregation on Github. To verify brgr’s claim, let’s run the same aggregation for grass-fed ground beef, but reversing the sort order.


Looks like brgr’s claim doesn’t quite hold up to a cursory glance. I’d be curious to see what the basis for their claim is, specifically if they assume a smaller serving size for salmon than for grass-fed beef.

Conclusion

Phew, that was a lot of information to cram into one post. The data set, as provided by the USDA, is a bit complex and could really benefit from some simplification. Thankfully, MongoDB 2.6 is coming out soon, and, with it, the $out aggregation operator. The $out operator will enable you to pipe output from the aggregation framework to a separate collection, so I’ll hopefully be able to add total omega-3 fatty acids as a nutrient, among other things. Once again, feel free to download the data set here (or check out the converter repo on Github) and use it to build some awesome nutritional apps.


Why Math is Necessary for CS Majors

While math and computer science have been lumped together for about as long as the latter has existed, there’s been a lot of backlash recently against the idea that a solid math background is integral to being a good developer. The relationship between the two was something that I struggled to grasp as an undergraduate in Computer Science. The relationship between math and CS isn’t as direct as, say, math and physics, or even philosophy and CS. However, taking a rigorous pure math course as an undergraduate will help you significantly, whether you choose to be an ivory tower academic, a developer for the latest hip startup out of Silicon Valley, or an engineer for a big NYC bank.

The reason why has nothing to do with learning what most people would call “practical skills.” Even as an undergraduate specializing in theory and theoretical computer vision, I realized that the limit of my practical use of mathematics was a high school-level understanding of linear algebra, some basic graph theory, and whatever I needed for big-O notation. While some areas of advanced mathematics, like Galois Theory, have CS-related applications, you probably won’t use them in CS outside of the most closeted of ivory towers. I can honestly say that, in the 8 years since I got my first software engineering internship back in high school, I’ve never had to use anything I learned in undergrad Real Analysis or Galois Theory (thankfully, because I honestly deserved to fail Galois Theory). So clearly, when I completed the required courses to graduate as a math major, I was wasting my time, right? Wrong!

The fallacy in the above reasoning is treating CS education like a video game’s tech tree. Just because understanding a proof of Stokes’ Theorem isn’t a strict prerequisite for being an effective developer doesn’t mean that it doesn’t help. As the Irish poet W.B. Yeats once wisely said, “education is not the filling of a pail, but the lighting of a fire.” Similarly, learning to be a developer isn’t about crossing off a checklist of practical skills and making your resume look like a buzzword bingo board. Learning to be a developer is about practice (which is why Allen Iverson wasn’t a software developer), and what a pure math class gives you is a slightly different environment in which to practice your skills. When you have spent a little time looking at both, you’ll realize that going through a pure math textbook and remembering the correct theorems and lemmas to use in a homework proof is pretty damn similar to figuring out which modules you need to effectively add some new functionality to your codebase.

Many engineers bemoan the lack of unit testing instruction in undergrad CS curricula, but they forget that unit tests are only useful if you have the rigor to write them in the first place. Not only is the process of figuring out which theorems to use an exercise in dependency management, but the process of proving a theorem is similar to writing unit tests to prove the correctness of your module. A lot of developers nowadays have copped a bad attitude when it comes to writing proper unit tests, saying that their code is trivially correct. I bet these people haven’t sat down to prove that there are no rational numbers satisfying the equation x^2 = 2 either, and I think that a solid grounding in pure math can nip this weakness in the bud. Seen this way, Rudin’s Principles of Mathematical Analysis, the bane of every freshman math major’s existence, is essentially a large codebase for you to practice on.

Similarly, graph theory is absolutely integral to the day-to-day of being a software developer, even if you’re not working with graphs directly. Speaking of modules, to an experienced developer, a well-organized codebase looks a lot like a graph. Pieces of code are bits of logic with dependencies which are references to other bits of logic, which, of course, intuitively maps to nodes and edges. All refactoring work comes down to just thinking about graphs and how to make your code graph comprehensible. Beyond this simple example of refactoring and managing dependencies in code, the applications of reasoning about graphs in software development are endless, from breaking a bulky task down into components, to thinking about points of failure in a sophisticated network topology. While I’ve never had to think about Hadwiger’s conjecture in a professional context, my undergrad Graph Theory course gave me a lot of valuable practice reasoning about graphs in a rigorous way. This practice continues to serve me well to this day, whether I’m trying to organize my dependencies in AngularJS, thinking about the topology of my MongoDB cluster, or just figuring out what tasks I need to get done today.

Bottom line, kids out there, if you really want to be a successful software developer, taking proof-based math (and proof-based graph theory in particular) is an excellent step in the right direction. It won’t be easy, but becoming good at something never is.


The Optimal Setup for Listening to Talks at 2x Playback Speed

If you’re an avid podcast listener and online courseware consumer like I am, odds are you’ve gotten frustrated with how long it takes to listen to a single lecture. An hour-long podcast on Bulletproof Executive? 20 minutes listening to a TEDTalk from a HackDesign lesson? No offense to these awesome content creators, but ain’t nobody got time for that.


Thankfully, you can listen to Youtube videos and mp3’s at 2x speed pretty easily. While processing speech at twice the speed may seem intimidating, with a little preparation and a simple biohack, you can absorb information from 2x playback speed as well as you do at 1x.

Technical Details

So how do you actually take all these talks and listen to them at 2x playback speed? Currently, I rely on either finding the talk on Youtube or getting a downloadable mp3 version. Most podcasts I’ve seen link to downloadable mp3 versions, and most TED talks are pretty easy to find on Youtube, so I haven’t found this limitation to be significant.

To listen to Youtube videos at 2x, opt in to using Youtube’s html5 player here. Obviously, you need an html5-enabled browser, but if you’re using a recent version of Chrome, Firefox, or Opera like a civilized human being, you should be fine. Once you’ve opted in, you should see a cog icon on Youtube videos that lets you select the playback speed. As a first experiment, try watching Andrew Stanton’s excellent TEDTalk about storytelling at 2x.


Listening to mp3’s at 2x is also extremely simple. My preferred approach uses VLC media player, but, if you’re willing to take the questionable risk of allowing Apple products on your computer, Quicktime player works just as well. In VLC’s top menu bar, Playback -> Speed -> Faster bumps the current playback speed up by 0.5x. Make sure you do this twice to get to our desired 2x playback speed.


Get Focused

One of the obvious difficulties inherent to listening to 2x playback speed audio is that you miss more when you lose focus. You can get away with distractions when listening to talks with a lot of fluff, like the storytelling TEDTalk above, but if you lose focus for 30 seconds when listening to a Bulletproof Exec podcast because of a gchat notification, you’re going to be lost. When listening to 2x audio, you should channel the guy from My New Haircut and not let anything or anyone interrupt you while you’re in the zone. Here are a couple of tips to cut down on your distractions:

1) Exit out of all email tabs, IM clients, Facebook, and any other notification-generating apps. This includes putting your phone on silent.

2) Don’t actually watch the corresponding video. Unless somebody’s drawing a diagram, the visuals of the talk don’t contribute much to the actual content, and can be a source of distraction. Instead, point your browser to a very static and very boring page, like my personal favorite, this-page-intentionally-left-blank.org.

3) Binaural beats are a simple and powerful biohack that really help get your mind in the proper state for absorbing information. At a high level, binaural beats consist of two tones played at slightly different frequencies through your headphones. For example, one ear hears a 310 Hz tone, the other a 300 Hz tone, which helps entrain 10 Hz brain waves, i.e. alpha and mu waves. The theory is simple enough, so I recommend you head over to this Youtube channel and try it!

Personally, I usually start a 12 Hz binaural beat shortly before listening to a talk or podcast at 2x and keep it playing throughout. Not only do binaural beats help optimize your mental state, but they also provide a consistent baseline of sound to block out extraneous noise from your home, office, or crowded commuter train. Conventional wisdom around binaural beats usually says that a 8-10 Hz beat is optimal for learning new information, but 12 Hz works better in my own highly unquantified N=1 experiment.

Conclusion

I hope this information helps you get started in optimizing your information consumption. As a developer, I’m all about efficiency. And after starting this routine, I’ve been able to regularly digest my favorite online audio content in half the time, which has been a huge win.

Crunching 30 Years of NBA Data with MongoDB Aggregation

When you are looking to run analytics on large and complex data sets, you might instinctively reach for Hadoop. However, if your data is in MongoDB and fits on your laptop, the Hadoop connector is overkill. Luckily, MongoDB’s built-in aggregation framework offers a quick solution for running sophisticated analytics right from your MongoDB instance, without needing any extra setup.

As a lifelong basketball fan, I often daydreamed about being able to run sophisticated analyses on NBA stats. So, when the MongoDB Driver Days Hackathon came around and Ruby driver lead Gary Murakami suggested putting together an interesting data set, we sat down and spent an afternoon building and running a scraper for basketball-reference.com. The resulting data set contains the final score and box scores for every NBA regular season game since the 1985-1986 season.

In the aggregation framework documentation, we often use a zip code data set to illustrate the uses of the aggregation framework. However, crunching numbers about the population of the United States doesn’t exactly captivate my imagination, and there are certain uses of the aggregation framework that the zip codes data set doesn’t highlight as well as it could. Hopefully this data set will let you take a new look at the aggregation framework while you have some fun digging through NBA stats. You can download the data set here and put it into your MongoDB instance using mongorestore.

Digging into the Data

First off, let’s take a look at the structure of the data. There have been 31,686 NBA regular season games since 1985-86. Each individual document represents a game. Here is the high level metadata for the 1985-86 season opener between the Washington Bullets and the Atlanta Hawks, as represented in RoboMongo, a common MongoDB GUI:


The document contains a rich box score subdocument, a date field, and information on the teams that played. We can see that the Bullets won 100-91 as the road team. The box score data is similarly broken down by team in an array, with the winning team first. Note that the won flag is a member of the top level box score object, along with team and players.


The box score for each team is further broken down by team statistics and player statistics. The team stats above show the cumulative statistics for the Atlanta Hawks, showing that they shot 41-92 from the field and an atrocious 9-18 from the line. The players array shows the same statistics, but broken down for an individual player. For example, below you’ll see that the Hawks’ star Dominique Wilkins scored 32 points on 15-29 shooting and recorded 3 steals.
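To make the structure concrete, here’s an abbreviated sketch of a game document. The date, teams, and box field paths match the aggregations below, the stat lines are the ones quoted above, and anything else (like the score field or the exact date) is an assumption:

```javascript
// Abbreviated sketch of one game document; real documents contain full
// box scores for both teams.
var game = {
  date : new Date('1985-10-25'), // an ISODate in the shell; exact date assumed
  teams : [
    { name : 'Washington Bullets', won : 1, score : 100 },
    { name : 'Atlanta Hawks', won : 0, score : 91 }
  ],
  box : [ // winning team first
    {
      won : 1,
      team : { /* cumulative stats: fg, fga, ft, fta, drb, trb, ... */ },
      players : [ /* per-player stat lines */ ]
    },
    {
      won : 0,
      team : { fg : 41, fga : 92, ft : 9, fta : 18 },
      players : [
        { name : 'Dominique Wilkins', pts : 32, fg : 15, fga : 29, stl : 3 }
      ]
    }
  ]
};
```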

Running Some Aggregations

At a high level, the MongoDB aggregation framework is exposed as a shell function called aggregate, which takes in a list of aggregation pipeline stages. Each stage of the pipeline operates on the results of the preceding stage, and each stage can filter and transform the individual documents.

Before we start doing some serious number crunching, let’s start out with a simple sanity check and compute which 5 teams had the most wins in the 1999-2000 season. This can be achieved using a 6-stage pipeline:

1) Use the $match stage to limit ourselves to games that took place between August 1, 1999, and August 1, 2000, two dates that are sufficiently far removed from any NBA games to safely bound the season.

2) Use the $unwind stage to generate one document for each team in the game.

3) Use $match again to limit ourselves to teams that won.

4) Use the $group stage to count how many times a given team appears in the output of step 3.

5) Use the $sort stage to sort by number of wins, descending.

6) Use the $limit stage to limit ourselves to the 5 winningest teams.

The actual shell command is below. This command executes in essentially real-time on my laptop, even without any indices on the data, because there are only 31,686 documents in the collection.

db.games.aggregate([
  {
    $match : {
      date : {
        $gt : ISODate("1999-08-01T00:00:00Z"),
        $lt : ISODate("2000-08-01T00:00:00Z")
      }
    }
  },
  {
    $unwind : '$teams'
  },
  {
    $match : {
      'teams.won' : 1
    }
  },
  {
    $group : {
      _id : '$teams.name',
      wins : { $sum : 1 }
    }
  },
  {
    $sort : { wins : -1 }
  },
  {
    $limit : 5
  }
]);

We can expand on this simple example to answer the question of which team won the most games between the 2000-2001 season and the 2009-2010 season, by changing the $match step to limit ourselves to games that took place between August 1, 2000 and August 1, 2010. Turns out, the San Antonio Spurs won 579 games in that time period, narrowly beating the Dallas Mavericks’ 568.

db.games.aggregate([
  {
    $match : {
      date : {
        $gt : ISODate("2000-08-01T00:00:00Z"),
        $lt : ISODate("2010-08-01T00:00:00Z")
      }
    }
  },
  {
    $unwind : '$teams'
  },
  {
    $match : {
      'teams.won' : 1
    }
  },
  {
    $group : {
      _id : '$teams.name',
      wins : { $sum : 1 }
    }
  },
  {
    $sort : { wins : -1 }
  },
  {
    $limit : 5
  }
]);

Correlating Stats With Wins

Let’s do something a bit more interesting using a couple of aggregation operators that you don’t often see when analyzing the zip codes data set: the $gte operator and the $cond operator in the $project stage. Let’s use these to compute how often a team wins when they record more defensive rebounds than their opponent across the entire data set.

The tricky bit here is computing the difference between the winning team’s defensive rebounding total and the losing team’s defensive rebounding total. The aggregation framework doesn’t make subtracting one value from another across documents easy, but using $cond, we can transform the document so that the defensive rebounding total is negative if the team lost. We can then use $group to compute the defensive rebounding difference for each game. Let’s walk through this step by step:

1) Use $unwind to get a document containing the box score for each team in the game.

2) Use $project with $cond to transform each document so the team’s defensive rebounding total is negative if the team lost, as defined by the won flag.

3) Use $group and $sum to add up the rebounding totals for each game. Since the previous stage made the losing team’s rebounding total negative, each document now has the difference between the winning team’s defensive rebounds and the losing team’s defensive rebounds.

4) Use $project and $gte to create a document which has a winningTeamHigher flag that is true if the winning team had more defensive rebounds than the losing team.

5) Use $group and $sum to compute for how many games winningTeamHigher was true.

db.games.aggregate([
  {
    $unwind : '$box'
  },
  {
    $project : {
      _id : '$_id',
      stat : {
        $cond : [
          { $gt : ['$box.won', 0] },
          '$box.team.drb',
          { $multiply : ['$box.team.drb', -1] }
        ]
      }
    }
  },
  {
    $group : {
      _id : '$_id',
      stat : { $sum : '$stat' }
    }
  },
  {
    $project : {
      _id : '$_id',
      winningTeamHigher : { $gte : ['$stat', 0] }
    }
  },
  {
    $group : {
      _id : '$winningTeamHigher',
      count : { $sum : 1 }
    }
  }
]);

The result turns out to be pretty interesting: the team that recorded more defensive rebounds won about 75% of the time. To put this in perspective, the team that recorded more field goals than the other team only won 78.8% of the time! Try rewriting the above aggregation for other statistics, such as field goals, 3 pointers, turnovers, etc. You’ll find some rather interesting results. Offensive rebounds turn out to be a very bad predictor of which team won, as the team that recorded more offensive rebounds only won 51% of the time. 3 pointers turn out to be a very good predictor of which team won: the team that recorded more 3 pointers won about 64% of the time.
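Only the $project stage needs to change to test a different statistic. For example, here’s a sketch for offensive rebounds, assuming the field is named orb by analogy with drb:

```javascript
// Swap in the offensive rebounding total; every other stage in the
// pipeline stays the same. 'orb' is an assumed field name.
var projectStage = {
  $project : {
    _id : '$_id',
    stat : {
      $cond : [
        { $gt : ['$box.won', 0] },
        '$box.team.orb',
        { $multiply : ['$box.team.orb', -1] }
      ]
    }
  }
};
```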

Defensive Rebounds and Total Rebounds Versus Win Percentage

Let’s compute some related data that will be fun to graph. We’re going to compute what percentage of the time a team wins as a function of the number of defensive rebounds they recorded. This aggregation is pretty simple: all we need to do is $unwind the box score and use $group to compute the average value of the won flag across each different defensive rebounding total.

db.games.aggregate([
  {
    $unwind : '$box'
  },
  {
    $group : {
      _id : '$box.team.drb',
      winPercentage : { $avg : '$box.won' }
    }
  },
  {
    $sort : { _id : 1 }
  }
]);

And when we graph the results of this aggregation, we get a nice chart showing a pretty solid correlation between defensive rebounds and win percentage. An interesting factoid: the team that recorded the fewest defensive rebounds in a win was the 1995-96 Toronto Raptors, who beat the Milwaukee Bucks 93-87 on 12/26/1995 despite recording only 14 defensive rebounds.

We can pretty easily modify the above aggregation to compute the same breakdown for total rebounds (TRB) versus defensive rebounds, and see if we get a different result.

db.games.aggregate([
  {
    $unwind : '$box'
  },
  {
    $group : {
      _id : '$box.team.trb',
      winPercentage : { $avg : '$box.won' }
    }
  },
  {
    $sort : { _id : 1 }
  }
]);

And in fact we do! After about 53 total rebounds, the positive correlation between total rebounds and win percentage vanishes completely! The correlation is definitely not as strong here as it was for defensive rebounds. As an aside, the Cleveland Cavaliers beat the New York Knicks 101-97 on April 11, 1996, despite recording only 21 total rebounds. Conversely, the San Antonio Spurs lost to the Houston Rockets, 112-110, on January 4, 1992, despite recording 75 total rebounds.

Conclusion

I hope this blog post has gotten you as excited about the aggregation framework as I am. Once again, you can download the data set here, and you’re very much encouraged to play with it yourself. I look forward to seeing what unique NBA analyses y’all will come up with.

Legal note: the attached data set is property of Sports Reference, LLC, and may only be used for education and evaluation under clause #1 of their terms of use. If you have not read or do not agree to Sports Reference, LLC’s terms of use, please do not download the data set.

What You Need To Know About AngularJS Data Binding

You hear a lot about data binding in AngularJS, and with good reason: it’s at the heart of everything you do with Angular. I’ve mentioned data binding more than a few times in my guides to directives and filters, but I haven’t quite explained the internals of how data binding works. To novices, it seems like straight sorcery, but, in reality, data binding is fundamentally very simple.

Scoping out the situation

Fundamentally, data binding consists of a set of functions associated with a scope. A scope is an execution context for the expressions you write in your HTML. AngularJS scopes behave like scopes in JavaScript: a scope contains a set of named variables and is organized in a tree structure, so expressions in a given scope can access variables from an ancestor scope in the tree. However, data binding adds three powerful functions to a scope that let you assign an event handler to fire when a variable in scope changes as easily as you assign an event handler to fire when a button is clicked.

$watch()

This function takes an expression and a callback: the callback will be called when the value of the expression changes. For example, let’s say our scope has a variable name, and we want to update the firstName and lastName variables every time name changes. With $watch, this is trivial:

$scope.$watch('name', function(value) {
  var firstSpace = (value || "").indexOf(' ');
  if (firstSpace == -1) {
    $scope.firstName = value;
    $scope.lastName = "";
  } else {
    $scope.firstName = value.substr(0, firstSpace);
    $scope.lastName = value.substr(firstSpace + 1);
  }
});

Under the hood, each scope keeps a list of watchers, internally called $scope.$$watchers, each of which holds the watched expression and its change callback. The $watch function simply adds a new watcher to the $$watchers array, which AngularJS loops over whenever it thinks something may have changed the state of the scope.

$apply()

When called without arguments, $apply lets AngularJS know that something happened that may have changed the state of the scope, so AngularJS knows to run through its watchers. You usually don’t have to call $apply() yourself, because directives like ngClick do it for you. However, if you’re writing your own event handler, like the swipeLeft and swipeRight directives from my guide to directives, you need to plug $apply() into your event handler. Try removing the $apply() calls from the swipeLeft and swipeRight directives in this JSFiddle and watch as the UI stops responding to swipes.
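A toy model of that cycle makes the shape of $apply() clear (hypothetical names and simplified watchers, not Angular’s source; real watchers take expression strings, not getter functions): the event handler’s work runs first, then every watcher gets checked.

```javascript
// Toy illustration of why hand-rolled event handlers need $apply():
// the handler mutates the scope, then $digest runs the watchers so
// the rest of the app sees the change.
var scope = {
  count: 0,
  $$watchers: [],
  $watch: function(getter, listener) {
    this.$$watchers.push({ getter: getter, listener: listener, last: undefined });
  },
  $digest: function() {
    var self = this;
    this.$$watchers.forEach(function(w) {
      var value = w.getter(self);
      if (value !== w.last) {
        w.listener(value, w.last); // fire the change callback
        w.last = value;
      }
    });
  },
  $apply: function(fn) {
    fn(this);       // run the handler's work first...
    this.$digest(); // ...then check every watcher
  }
};

scope.$watch(function(s) { return s.count; }, function(value) {
  console.log('count is now ' + value);
});

// A custom DOM event handler would call $apply, just like ngClick does:
scope.$apply(function(s) { s.count++; }); // logs "count is now 1"
```

If the handler had just run `scope.count++` on its own, no watcher would ever have noticed, which is exactly what happens when you strip the $apply() calls out of the swipe directives.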

Some Important Information to $digest

$digest() is the third scope function related to data binding, and it’s the most important one. With high probability, you will never actually call $digest() directly, since $apply() does that for you. However, this function is at the core of all data binding magic, and its internals warrant some careful inspection if you’re going to be an AngularJS pro.

At a high level, $digest() runs through every watcher in the scope, evaluates the expression, and checks if the value of the expression has changed. If the value has changed, AngularJS calls the change callback with the new value and the old value. Simple, right? Well, not quite; there are a few subtleties.

1) The first subtlety is with the change callback: the change callback itself can change the scope, like we did with the name example in the $watch() section. If $digest() made only a single pass and we had a watcher on firstName, that watcher would never fire! This is why $digest() is executed in a loop: $digest() will repeatedly execute all watchers until it goes through all watchers once without any of the watched expressions changing. In AngularJS internals, a dirty flag is set on each iteration of the $digest() loop when a change callback needs to be fired. If the dirty flag is not set after an iteration, $digest() terminates.

Of course, the $digest() loop described above could run forever, which is very bad. Internally, AngularJS uses a questionably named field called TTL (presumably “times to loop”) to determine the maximum number of times a $digest() loop will run before giving up. By default, TTL is 10, and you will usually not run into this limit unless you have an infinite loop. If you for some reason need to tweak the TTL, the AngularJS root scope has a poorly-documented digestTtl() function which you can use to change the TTL on a per-page basis. You can read more about this function here.
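That dirty-flag-plus-TTL loop can be sketched in a few lines of plain Javascript (a hypothetical mini-implementation with simplified watcher objects, not Angular’s actual source):

```javascript
var TTL = 10; // like Angular's default, give up after 10 dirty passes

function digest(scope, watchers) {
  var dirty, iterations = 0;
  do {
    if (iterations++ >= TTL) {
      throw new Error(TTL + ' digest iterations reached. Aborting!');
    }
    dirty = false;
    watchers.forEach(function(w) {
      var value = w.get(scope);
      if (value !== w.last) {
        w.fn(value, w.last); // the callback may change the scope again...
        w.last = value;
        dirty = true;        // ...so schedule another pass
      }
    });
  } while (dirty);
}

var scope = { name: 'Jane Doe' };
var watchers = [
  // Watches firstName, which only gets set by the *other* watcher's
  // callback, so this one only sees the value on the loop's second pass
  { get: function(s) { return s.firstName; }, last: undefined,
    fn: function(v) { console.log('firstName is now ' + v); } },
  // Watches name; its callback writes firstName, like the $watch() example
  { get: function(s) { return s.name; }, last: undefined,
    fn: function(v) { scope.firstName = v.split(' ')[0]; } }
];

digest(scope, watchers); // logs "firstName is now Jane"
```

The firstName watcher is deliberately listed first: it only picks up the change on a later pass, which is exactly the cascade the dirty flag exists to catch. The TTL constant here plays the role of the cap that digestTtl() lets you raise.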

2) The second subtlety is another interesting corner case with the change callback: what if the change callback calls $apply() or $digest()? Internally, AngularJS uses a system of phases to make sure this doesn’t happen: an error gets thrown if you try to enter the $digest phase while you’re already in the $digest phase.

3) Remember when we said that AngularJS scopes are organized in a tree structure and a scope can access its ancestor’s variables? Well, this means that $digest() needs to happen on every child scope in every iteration! Internally, this code is a bit messy in AngularJS, but each iteration of the $digest() loop does a depth-first search and performs the watcher check on every child scope. If any child scope is dirty, the loop has to run again!
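Here’s a rough sketch of that depth-first walk (again a toy, not Angular’s real code, which carries a lot more bookkeeping): one pass visits this scope’s watchers and then recurses into every child, and a dirty result anywhere forces the caller to run another full pass over the whole tree.

```javascript
// Sketch of why child scopes make $digest() more expensive: every pass
// walks the entire scope tree depth-first, and a dirty watcher anywhere
// means the whole tree gets walked again.
function makeScope() {
  return { $$watchers: [], $$children: [] };
}

function digestOnce(scope) {
  var dirty = false;
  scope.$$watchers.forEach(function(w) {
    var value = w.get(scope);
    if (value !== w.last) {
      w.fn(value, w.last);
      w.last = value;
      dirty = true;
    }
  });
  // Depth-first: check every descendant scope in the same pass
  scope.$$children.forEach(function(child) {
    if (digestOnce(child)) {
      dirty = true;
    }
  });
  return dirty;
}

var root = makeScope();
var child = makeScope();
root.$$children.push(child);

root.greeting = 'hi';
child.clicks = 0;

root.$$watchers.push({ get: function(s) { return s.greeting; },
                       last: undefined, fn: function() {} });
child.$$watchers.push({ get: function(s) { return s.clicks; },
                        last: undefined, fn: function() {} });

// Loop until one full walk of the tree comes back clean
while (digestOnce(root)) {}
```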

4) Since Javascript is a single-threaded, event-driven language, the $digest() loop cannot be interrupted. That means the UI is blocked while $digest() is running, so two very bad things happen when your $digest() is slow: your UI does not get updated until $digest() is done running, and your UI will not respond to user input, e.g. typing in an input field, until $digest() is done. To avoid a bad case of client side in-$digest-ion, make sure your event handlers are lightweight and fast.

5) The final and often most overlooked subtlety is the question of what we mean when we say “checks if the value of the expression has changed”. Thankfully, AngularJS is a bit smarter than just using Javascript’s === operator: if AngularJS just used ===, data binding against an array would be very frustrating. Two arrays with the same elements could be considered different! Internally, AngularJS uses its own equals function. This function considers two objects to be equal if === says they’re equal, if angular.equals returns true for all their properties, if isNaN is true for both objects, or if the objects are both regular expressions and their string representations are equal. Long story short, this does the right thing in most situations:

angular.equals({ a : 1 }, { a : 1 }); //true
angular.equals({ a : 1 }, { a : 1, b : 2 }); // false

angular.equals([1], [1]); // true
angular.equals([1], [1, 2]); // false

angular.equals(parseInt("ABC", 10), parseInt("ABC", 10)); // true

Conclusion

Hopefully now you don’t need to go buy the recently released ng-book, which promises to teach you the answer to the question “seriously, what do $apply and $digest mean?” Data binding seems like magic, but really it’s just some clever-yet-simple software engineering. You can even build your own version of AngularJS data binding if you follow the steps laid out in this awesome article. Ideally, now data binding should be a lot less confusing. Good luck and code on!


My Top 5 Paleo Lifestyle Hacks for New Yorkers

It’s official: paleo was the most searched-for health term on Google in 2013, and, thus, paleo is no longer weird. Well, maybe it’s still a little weird, but at least people don’t look at me like I’m crazy when I order a bunless burger anymore. As a matter of fact, I meet a lot of people who want to try going paleo, but they’re held back by aspects of the paleo lifestyle that seem beyond the pale to the average New York office worker.

First of all, here’s a short definition of what I mean by paleo lifestyle and nutrition:

1) Nutrition: no wheat, rice, corn, or quinoa. No soy. Limit sugar. Limit carbohydrate intake to at most 75-100g / day. No oils other than avocado, coconut, and olive. Limit dairy as much as possible, except for butter and ghee.

2) Lifestyle: get plenty of uninterrupted sleep, go barefoot or wear minimalist footwear

In theory, all these principles are manageable. But when you work in an office in New York, there are more than a few difficulties. Even as somebody who’s been Paleo since before moving to New York, I often struggle to avoid sugary cocktails, crappy delivery food, and dollar pizza spots. In that vein, here are my answers for the 5 most common excuses I’ve heard for why people can’t go paleo in New York.

“I want to wear minimalist shoes, but as great as Vibram FiveFingers are for the gym, I look ridiculous when I wear them with slacks and a shirt.”

While they’re not going to be mistaken for Prada or Crockett & Jones in terms of high fashion, VivoBarefoot has a few pairs of shoes that are pretty business casual friendly. Black RA Leathers and Gobis are somewhat oddly shaped but can pass for an inexpensive pair of Derbies. The RA Leathers are my go-to pair of shoes for an average day at the office. The Jay also looks pretty indistinguishable from a standard loafer. And for days when it’s well below freezing and the ground is covered in snow, like today, the Synth Hiker does a pretty good job keeping your feet dry and not being too gaudy.

“Wait a minute, what am I supposed to order off of Seamless if I can’t order a sandwich or cheap Chinese food?”

I’m not gonna lie: if you’re looking for high quality paleo-friendly food in New York, the pickings are pretty slim on Seamless. The standard approach is “order a salad,” and some places do have salads that are pretty boss. The key is to get one with a bunch of meat or fish; from the right places, you can basically consider it an order of meat with some veggies on the side instead of “just a salad.”

Alternatives depend on your area: you’ve got some good options in Midtown, but if you work around Soho or Chinatown you’re pretty much SOL. Here are a few ideas that I’ve utilized effectively:

1) Sashimi lunch specials

2) Know the 3 roll lunch specials that every sushi place has? Order that, but ask for hand rolls with no rice. Not every place will accommodate you, so you need to experiment and figure out which sushi places work. Trust me, the simple deliciousness of salmon and avocado wrapped in nori without the extra rice is well worth the effort.

3) Bareburger. The simultaneously best and most underrated burger in New York: they serve grass-fed beef, in addition to amazing lamb, elk, and wild boar burgers. When you’re craving a truly exceptional burger, skip the Shake Shack (not that any self-respecting New Yorker would get caught in that tourist trap) and get Bareburger delivered; you’ll be much healthier and much happier for it.

4) Dig Inn is another Midtown favorite of mine. While their beef is not grass-finished, at least their beef is partially grass fed and their salmon is wild, which is more than I can say for 99% of the places I’ve seen on Seamless.

“If I can’t drink beer, mimosas, bellinis, gin and tonics, or rum and cokes, what can I actually drink at a bar?”

Short answer: prefer straight liquor or something very close, but a nice dry red wine works reasonably well too. If you’re a guy, this’ll give you an excuse to learn to appreciate good single malt scotch like a real man. But if you really want your alcohol watered down, plain soda water with lemon or lime is a decent substitute for Sprite or whatever your sparkling sugar bomb of choice is.

If you’re at a cocktail bar, a Martini, Manhattan, Rob Roy, Vesper, or other combination of liquor, vermouth, and/or bitters works pretty well as a substitute for sugar-laden cocktails. For example, PJ Clarke’s is a favorite cocktail bar among tourists and Met Opera attendees, and their Perfect Manhattan is a reasonably good low-sugar cocktail.

If you must go for boozy brunch, skip the orange juice and go for straight bubbly. Sparkling wine is not exactly the healthiest thing in the world, but the real offender in a mimosa is the orange juice. A glass of Korbel Brut only has about 10g carbohydrate, but a glass of orange juice has about 30g of pure sugar and contributes nothing to the social relaxation effects of alcohol. Cider’s also a good alternative: a 12oz bottle of Magner’s Hard Cider has no gluten, has only 10g of carbohydrate, and tastes damn good to boot.

“If I can’t drink cappuccinos, lattes, macchiatos, or any of Starbucks’ other sugar bombs, how do I get my caffeine fix in the morning?”

A surprising number of people don’t like their coffee black for some reason. Well, with bulletproof coffee, there’s no excuse. It’s sweet, it’s fatty, it’s extremely good for you, and it’s freakin’ delicious. You even end up saving money on every cup: even if you go with the bulletproof coffee k-cups (which I do), they’re still a solid 50 cents cheaper than a tall medium roast from Starbucks. Add to that not having to stand in line with a bunch of people who are very cross from sleep deprivation and you have a net win.

“What do you actually do at the gym if not spend hour after hour on a treadmill?”

The short answer, courtesy of Mark’s Daily Apple, is “sprint and lift heavy things”. Last year I worked primarily on improving my bench press 1 rep max and my 1 set max for pullups, which I’ll write a blog post about some other time. This year, I’m working on finally achieving my childhood dream of being able to dunk. My usual trip to the gym nowadays involves a light warmup with some stretching, jogging, and pushups, and then 2-4 exercises, usually involving bench press, pullups, squats, or kettlebell swings. Once a week I throw a sprint in there, either on my apartment building’s dog walk or at the gym on a stationary bike.

Hopefully now you have a better idea of how to rock a paleo lifestyle in NYC. I think you’ll find that it’s like riding a bike: after a little bit of practice it becomes second nature.
