Behind the most innovative hyperlocal technology companies is likely a cluster running Hadoop, an open-source distributed data framework that allows companies to collect and analyze data at scale. Hadoop grew out of MapReduce, a distributed computing framework that Google developed in the early 2000s, and has since become a standard in data management. Cloudera launched in 2008 to provide commercialized Hadoop-based database management software and services and has since raised over $70 million in venture funding from some of the biggest hitters in the venture capital world.
Street Fight recently caught up with Cloudera’s VP of technology solutions Omer Trajman and solutions architect Patrick Angeles to discuss how “big data” is continuing to change the game in hyperlocal.
What’s the primary use case for distributed computing in hyperlocal today?
Ultimately, it’s a recommendation game. If you’re a local vendor, the reason you go through one of these hyperlocal offer companies is because they presumably know more about potential customers than you do. What [hyperlocal companies] really bring to the table, and what differentiates them from, say, an Amazon or anything that’s generic e-tail, is this additional data [around location]. If I’ve got stock of N number of things, which I need to provide, and M number of customers, matching those two is a pretty hard problem. If I have a lot of detailed data about each of them, it becomes a much easier problem to solve.
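The matching problem described in this answer (N items in stock, M customers) can be illustrated with a small sketch. This is not how Cloudera or any hyperlocal company does it; it is a minimal greedy matcher under stated assumptions. The `Offer` and `Customer` classes, the planar coordinates, the `affinity` scores, and the scoring formula are all hypothetical and chosen only to show how detailed data about both sides makes the pairing tractable.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Offer:
    name: str
    category: str
    x: float  # hypothetical planar coordinates, in km
    y: float
    stock: int


@dataclass
class Customer:
    name: str
    x: float
    y: float
    affinity: dict  # category -> interest in [0, 1], assumed known from past behavior


def score(customer, offer):
    """Higher is better: interest in the category, discounted by distance."""
    distance_km = hypot(customer.x - offer.x, customer.y - offer.y)
    return customer.affinity.get(offer.category, 0.0) / (1.0 + distance_km)


def match(customers, offers):
    """Greedily hand out the best-scoring (customer, offer) pairs while stock lasts."""
    remaining = {o.name: o.stock for o in offers}
    pairs = sorted(
        ((score(c, o), c.name, o.name) for c in customers for o in offers),
        reverse=True,
    )
    assigned = {}
    for s, customer_name, offer_name in pairs:
        # Skip useless pairs, customers who already have an offer, and sold-out offers.
        if s <= 0 or customer_name in assigned or remaining[offer_name] == 0:
            continue
        assigned[customer_name] = offer_name
        remaining[offer_name] -= 1
    return assigned
```

Each customer receives at most one offer here, which mirrors the limited screen time on mobile that comes up later in the interview. A greedy pass is only one possible strategy and does not guarantee the best overall assignment.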
From a data perspective, is location problematic?
Location actually increases your sophistication, and it’s the reason the drive to hyperlocal means you can make better recommendations. The challenge is that you’ve only got so much screen time [on mobile]. I can’t send you 30 coupons today and have you browse through and pick the one you want, because chances are you won’t pick any. If I send you one or two, I’m wasting valuable screen time plus my reputation as a recommender.
One of the big challenges you’re seeing today is data quality. For example, if someone checks in somewhere, that’s a pretty good indication that they’re there. But if they just happen to have their phone on, and maybe the GPS is turned off, and the way the routing works it looks like they’re downtown when they’re actually in Brooklyn, and you send them an offer that’s going to be a 40-minute subway ride away, how much did you lose in that opportunity? The data quality ends up being the biggest challenge once you introduce a massive volume of data.
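The data-quality concern in this answer can be made concrete with a small sketch: before sending an offer, account for how much the location reading can be trusted. The error radii per source, the `should_send_offer` function, and the 1 km threshold below are illustrative assumptions, not figures from the interview. The distance calculation is the standard haversine formula.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Assumed worst-case error per location source, in km (illustrative values only).
ASSUMED_ERROR_KM = {"checkin": 0.05, "gps": 0.1, "network": 5.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def should_send_offer(source, user_latlon, merchant_latlon, max_km=1.0):
    """Send only if the merchant is close even in the worst case for this source."""
    error_km = ASSUMED_ERROR_KM.get(source)
    if error_km is None:
        return False  # unknown source: do not trust the reading
    distance_km = haversine_km(*user_latlon, *merchant_latlon)
    return distance_km + error_km <= max_km
```

With these assumed radii, a check-in a block away from the merchant passes, while a coarse network-derived position at the same coordinates does not, because the person could really be several kilometers away.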
How much of it is about the quality of the data in the first place?
I think we’ve seen pretty clearly over the past couple of years that better data beats better algorithms. So it’s not necessarily a race to be more sophisticated, it’s a race to get more data. There’s obviously the inventory race, in the sense of getting more users, but in the end it’s about how much data you have about them. The more data you can bring into a system, the more data you can run your algorithms through, and the more accurate your results will be.
Geo-location has been one of mobile’s most powerful data outputs. What’s the next step?
Lat/Longs are just the beginning in terms of gathering data from mobile devices. As people move to a world that’s less web and more app-based and interactive, the closed system is becoming increasingly important. If you listened to Yelp’s earnings report, one of the things they pointed out is that they’re getting a direct relationship with customers through mobile, which is cutting down on their dependence on search.
People don’t need to find a restaurant by going to Google and typing it in, they just load the Yelp app. Yelp knows where they are, what they’ve eaten in the past, and what they’ve rated, and now Yelp can have a much more intimate experience with its consumers. It’s not just about location but about all of the experiences that they are now having on their mobile device in a particular location, at a particular time, with certain people. That’s fascinating.
How important is it to have a closed data environment?
It’s critical. It’s the reason Android exists. This is why rumors of a Facebook phone surface every month. If you can have access to everything a consumer does, you can know them and have a real relationship with them. That’s what earns loyalty, that’s what drives purchases, that’s what these platforms are built to capture. But they’re not all going to have that data.
What data sources do you see as having a big impact on the hyperlocal space over the next two years?
One of the things I’m keeping a close eye on, which hasn’t really taken off yet, is augmented reality. I think we’re moving away from having to do explicit interactions to having our devices just know where we are, how we turn, what we’re looking at, and what we’re interested in.
People make fun of the Google glasses, but from a data perspective, consider how much information they can capture. The level of interaction that you have on the web, when you browse over an item on Amazon and add it to the cart, can be brought to the physical world. You can have the detail of web metrics, live and in person.
A lot of startups talk about building a platform, but Foursquare has seen by far the most success. Outside of the number of users, what makes their platform click?
The big difference between Foursquare and other hyperlocal companies is that they’re not thinking, “I’ll give you access to my data to build apps,” but rather something more along the lines of what Facebook does, which is, “Run your apps over here and I’ll enrich them. I’ll give you a little bit of data to run your app, and in exchange I know everything about your application and everything about the consumers using your application.”
So what’s the next big trend in big data computing and how will it affect the hyperlocal space?
The big trend we’re seeing is real-time. HBase, which sits on top of Hadoop, enables millisecond access times, so you can now start to think about incorporating data into your decision-making process. You can run a recommendation engine that takes into account where you are right now and brings that into the computation of which offer you’re going to serve to a person at a specific time. With this ability to apply these algorithms to massive amounts of data and do it in milliseconds, I think you’re going to see a massive shift in the quality as well as the variety and sophistication of the types of features you’re seeing in location. It’s no longer going to be stuff that’s pre-computed from a day or two ago; it’s going to be the stuff that happened five seconds ago.
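The shift described here, from offers pre-computed a day ago to decisions that use what happened five seconds ago, can be sketched as follows. This is an illustration only: plain Python dictionaries stand in for a low-latency store, no HBase client code is shown, and the names `batch_scores`, `latest_location`, `record_location`, `pick_offer`, and the 60-second freshness window are all hypothetical.

```python
import time

# Plain dicts stand in for a low-latency key-value store; this is not HBase client code.
batch_scores = {}     # (user, offer) -> score precomputed by a batch job
latest_location = {}  # user -> (neighborhood, unix_timestamp)


def record_location(user, neighborhood, now=None):
    """Store the most recent location event for a user."""
    latest_location[user] = (neighborhood, time.time() if now is None else now)


def pick_offer(user, offer_neighborhoods, max_age_s=60.0, now=None):
    """Return the best offer for `user`, preferring offers near a fresh location.

    `offer_neighborhoods` maps offer name -> neighborhood where it is redeemable.
    """
    now = time.time() if now is None else now
    candidates = list(offer_neighborhoods)
    seen = latest_location.get(user)
    if seen is not None and now - seen[1] <= max_age_s:
        # A fresh location narrows the candidates to offers in that neighborhood.
        nearby = [o for o in candidates if offer_neighborhoods[o] == seen[0]]
        candidates = nearby or candidates
    if not candidates:
        return None
    # Among the remaining candidates, fall back to the precomputed batch score.
    return max(candidates, key=lambda o: batch_scores.get((user, o), 0.0))
```

When no recent location is available, or the last one is older than the freshness window, the function behaves like a purely pre-computed recommender; a fresh location event changes the answer immediately.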
Steven Jacobs is an associate editor at Street Fight.