Managing the Data Infrastructure Behind Hyperlocal
Behind every location-based app is some wonky, sophisticated stream of data linking information about businesses, consumers, geography and many other categories relevant to each app. Factual is a kind of clearinghouse for all the data sets that these companies need, aggregating API information from a vast sea of feeds and creating verified databases that can be used to populate everything from Foursquare to Yelp. And it recently rolled out Crosswalk, which will give small businesses a way to market through location data.
Street Fight recently spoke with Gil Elbaz, the company’s founder and CEO, about the importance of data to our hyperlocal future.
There are a lot of data providers out there. What makes yours different?
One thing that’s quite unique about Factual is that we’ve really embraced the notion of open data. We provide data that comes directly from our customers — a lot of them are contractually obligated, or in a lot of cases are really excited about contributing data back to us. We have many, many of these partners who contribute data back to this open data ecosystem, and we’re the managers of these ecosystems. We take in all these feeds, and then in order to leverage these data feeds, we have to clean the data, normalize it, and de-spam it.
How do you verify that data?
There’s a lot of noise that comes in, especially when businesses themselves provide data through these online funnels. They aren’t always incentivized to give the accurate answer, and our job is to try to extract truth out of all these different data feeds that we’re getting our hands on.
We’ve put a great amount of work into the statistics and analytics behind this problem. If we only had one data point on a business, it would be hard to guess if it’s true. You could look at the source, perhaps, and figure out whether to trust it. But what helps is that for many of these businesses where we are aggregating data, we have not just five but in many cases hundreds of reference points. So we also have a system that can validate and augment the data we have by using crawls of the web.
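One way to picture how many reference points can beat a single one is a trust-weighted vote over the values reported for one attribute. The sketch below is only an illustration under assumed inputs: the function name `consensus_value`, the trust weights, and the sample phone numbers are hypothetical, and nothing here is described in the interview as Factual's actual method.

```python
from collections import defaultdict


def consensus_value(observations):
    """Pick the most supported value for one attribute of one business.

    observations: list of (value, trust_weight) pairs, one per source.
    Returns (best_value, confidence), where confidence is the share of
    total trust weight that backs the winning value.
    """
    totals = defaultdict(float)
    for value, weight in observations:
        # Light normalization so trivially different spellings agree.
        totals[value.strip().lower()] += weight
    if not totals:
        return None, 0.0
    best = max(totals, key=totals.get)
    confidence = totals[best] / sum(totals.values())
    return best, confidence


# Hypothetical reports of a phone number from three sources.
reports = [("555-0100", 1.0), ("555-0100 ", 0.8), ("555-0199", 0.3)]
best, confidence = consensus_value(reports)
```

With only one report the confidence is always 1.0, which says nothing about whether the value is true. That mirrors the point above: agreement across many sources is what makes a value believable.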
How does Factual make money?
Our model has been to lower the barriers for developers. So, for small developers we are very aggressive at saying, “Let’s just make this free, just start using it.” We set fairly high quotas, something like 10,000 queries a day. Past a certain point a pricing model kicks in: a CPM model, that is, a price per thousand calls.
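The arithmetic of a free quota followed by CPM pricing can be sketched as follows. The 10,000 figure comes from the answer above, but the rate of 2.00 per thousand calls is invented for illustration, and charging only for calls beyond the quota is an assumption, since the interview does not say how the threshold is applied.

```python
FREE_DAILY_QUOTA = 10_000  # "something like 10,000 queries a day"


def daily_cost(queries, cpm_rate, free_quota=FREE_DAILY_QUOTA):
    """Cost for one day of API usage under a free-quota-then-CPM model.

    cpm_rate is the price per 1,000 calls (hypothetical value).
    Assumption: only calls above the free quota are billed.
    """
    billable = max(0, queries - free_quota)
    return billable / 1000 * cpm_rate


# A developer making 25,000 calls in a day at an assumed rate of 2.00
# per thousand pays for 15,000 billable calls.
cost = daily_cost(25_000, cpm_rate=2.0)
```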
For many companies that talk to us, small or large, if they’re willing to aggressively share a lot of their data back to us (and be obligated to do so) we’ll further discount the fees beyond that, even sometimes to $0.
It seems like place-based commerce — deals powered by location data — is exploding. How do you see the use of this kind of data evolving?
It’s kind of on fire. There are so many opportunities to change the game and disrupt the way that consumers make local purchases. Everything from smartphone search to deals and push notifications — all of these new social layers.
And with all these startups and projects at larger companies, there are some baseline things they need [to build out their products]. One of the major things they need is a database of all the businesses out there, and they need information like the geocode where these businesses are. They need to set up a proximity-based search service so that they can figure out what’s within a quarter mile of a given latitude/longitude. Then they need to start connecting dots. And a lot of them want to offer the ability to link from that data to another website, to a Yelp. They might need premium data, or additional attributes about these businesses. “Category” is a very popular one. Things beyond the basic phone number.
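The quarter-mile lookup described above can be sketched with the haversine great-circle distance. This is a minimal brute-force version, assuming places are plain dictionaries with hypothetical `name`, `lat` and `lon` keys; a production service would typically put a spatial index in front of it rather than scan every record.

```python
import math

EARTH_RADIUS_M = 6_371_000   # mean Earth radius in meters
QUARTER_MILE_M = 402.336     # 0.25 mile in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def places_within(places, lat, lon, radius_m=QUARTER_MILE_M):
    """Return the places within radius_m of (lat, lon), nearest first."""
    scored = [(haversine_m(lat, lon, p["lat"], p["lon"]), p) for p in places]
    scored.sort(key=lambda pair: pair[0])
    return [p for dist, p in scored if dist <= radius_m]


# Hypothetical records: one about 111 m away, one about 1.1 km away.
places = [
    {"name": "far cafe", "lat": 34.010, "lon": -118.0},
    {"name": "near cafe", "lat": 34.001, "lon": -118.0},
]
nearby = places_within(places, 34.0, -118.0)
```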
Are there ways for small businesses to market through this data or use your place in the data chain to influence more people to come in?
The answer is yes, and we’re formalizing that in the first product that helps people get traffic themselves. So, take a small company that wants to compete with OpenTable and wants people to know you can book a table on their site. They want distribution; they want people to know that anytime people are publishing information about restaurant X, that they’d love to see a link to their site as a place to book. What we’ll be offering is a mapping between all of our businesses, and all the other external IDs and services that are offered by other people. With this, a developer can very easily connect the dots. The product is going to be called Crosswalk.
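A mapping between one place record and the IDs other services use for the same business can be pictured as a two-way lookup table. The sketch below is not the Crosswalk API: the class `IdCrosswalk`, its methods, the namespace labels and the sample IDs are all hypothetical and exist only to show the idea of connecting the dots between services.

```python
class IdCrosswalk:
    """Toy two-way mapping between a place ID and external service IDs."""

    def __init__(self):
        self._by_place = {}      # place_id -> {namespace: external_id}
        self._by_external = {}   # (namespace, external_id) -> place_id

    def link(self, place_id, namespace, external_id):
        """Record that external_id in namespace refers to place_id."""
        self._by_place.setdefault(place_id, {})[namespace] = external_id
        self._by_external[(namespace, external_id)] = place_id

    def external_ids(self, place_id):
        """All known external IDs for a place, keyed by namespace."""
        return dict(self._by_place.get(place_id, {}))

    def translate(self, namespace, external_id, target_namespace):
        """Map an ID in one service to the matching ID in another."""
        place_id = self._by_external.get((namespace, external_id))
        if place_id is None:
            return None
        return self._by_place[place_id].get(target_namespace)


# Hypothetical data: one restaurant known to two services.
xwalk = IdCrosswalk()
xwalk.link("place-1", "yelp", "y-123")
xwalk.link("place-1", "booking-site", "b-9")
booking_id = xwalk.translate("yelp", "y-123", "booking-site")
```

In this picture, a publisher showing a listing keyed by one service's ID could look up the booking site's ID for the same restaurant and link to it.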
Can business owners take control of their own data ID?
Many of the services available, take Citysearch or Yelp, do have tools that let a business take control of certain information about their ID.
We believe we’re facilitating that by making it easier for all of these IDs to get surfaced. In one shot, a business could quickly find all of the hundreds of different sites that allow people to interact with their information. Factual today has a more generic way of looking at this. Right now we don’t offer a way for a business to claim a business. But a business can come in and share information about itself, and we will take that into account when we’re trying to extract truth.
Part of the reason we do that is we take a very high-level view of the world and we know that there’s some things that a business does a great job of providing, so we let the business provide that information. We want it to be more fluid and let many more people contribute information than just the business owner.
This interview has been edited for length and clarity.