Ask anyone to name the first search engine that comes to mind and there’s a high likelihood they’ll say Google. It hasn’t always been this way. Over the past three decades, the answer to that question has also been Yahoo, Ask Jeeves, WebCrawler, AOL Search, or Netscape. British search engine company Mojeek is hoping to be the next name that web surfers turn to, and is building its own crawling technology to challenge Google’s hegemony.
“The monopoly Google has over search, it’s not healthy,” says Colin Hayhurst, CEO of Sussex-based Mojeek. “Can you imagine if we got all our news from the New York Times, Washington Post and that was it – do you want two US sources?”
Mojeek plans to take on Google’s search dominance by building its own catalogue of internet content in a way that requires as little data collection as possible.
But with Google’s search engine market share standing at around 90%, and Microsoft’s Bing the next most popular choice, Mojeek faces a mountain to climb.
Search engines are responsible for scanning the web for pages (crawling) and then storing what they find (indexing). Crawling is the “relatively easy part”, Hayhurst says, but indexing required Mojeek to rewrite its whole software architecture after it became too slow at around one billion pages. To date, Mojeek has crawled and indexed six billion pages.
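To make the crawl-then-index split concrete, here is a minimal sketch of an inverted index, the textbook structure for mapping words to the pages that contain them. It is an illustration only, not Mojeek's architecture: the function names, the sample URLs and the all-terms-must-match rule are assumptions, and a real index at billions of pages would need sharding, compression and ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> set of URLs containing that term.

    `pages` stands in for crawler output (URL -> page text); it is a
    hypothetical input shape, not a real crawler format.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical crawled pages, for illustration only.
pages = {
    "https://example.org/a": "Independent search engines crawl the web",
    "https://example.org/b": "Privacy matters for web search",
}
index = build_index(pages)
```

With this toy data, `search(index, "privacy")` matches only the second page, while `search(index, "web search")` matches both. The lookup itself is cheap; the hard part Hayhurst describes is keeping such a structure fast as the number of pages grows.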
Mojeek was founded in 2004 by Marc Smith, initially as a personal project, after he became frustrated at the direction Google was headed. It was the same year that Google launched Gmail, “a move, presumably, to collect more information,” jokes Hayhurst.
Mojeek has raised just over £3m in angel investment – pocket change to Google. Its main revenue stream is through licensing its search API to businesses, such as publishing companies. It also offers a site search API and ads.
Its core mission is to give consumers another choice when it comes to searching the web – and one that doesn’t track users.
“Information diversity is important,” says Hayhurst, who has a history of leadership roles at web infrastructure startups. “We’ve got one or two places where we are going to discover information on the web, to navigate across the web, to transact on the web, decide what businesses we want to deal with, and we’ve actually got two US companies deciding who you are going to see on the first page. It’s really unhealthy.”
Mojeek: ‘We are anti-personalisation’
Where Mojeek wants to differentiate itself from the other players in the space is privacy. Hayhurst says that privacy-led search engines such as DuckDuckGo and Ecosia “aren’t search engines” and that “most of them are actually proxies for Bing”.
“Most of them take your search query and send that search query off to the Bing API to Microsoft,” adds Hayhurst. “Bing then sends back the search results and adds relevant ads.”
Yahoo used to have its own independent engine, but that too uses Bing now, Hayhurst says. Mojeek’s ethos is that a true privacy-led search engine should crawl and index pages itself. But doing so is a mammoth task. Hayhurst cannot say precisely what percentage of the web Mojeek has indexed, because the total number of web pages is unknown.
The privacy focus is the reason why Mojeek builds its own servers, which are located in a secure room at a data centre in Kent. Mojeek has 313 servers that house 666TB of storage, with a further 86 servers to be switched on before the end of the year.
Mojeek is currently working on a contextual ads product, maps and a news service.
In its earlier days, Mojeek received a lot of medical-related queries. According to Hayhurst, Smith thinks this was because users did not want to search for sensitive topics on Google.
Mojeek says it collects only the information it needs during search queries: country, language and IP address, with the IP address converted into a country code. It doesn’t have a sign-in option, so searches cannot be linked together.
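The data-minimisation idea described here, keeping a country code rather than the IP address, can be sketched in a few lines. This is an assumed illustration, not Mojeek's code: the `GEO_TABLE` lookup, the `SearchLogEntry` fields and the `"ZZ"` placeholder for an unknown country are all hypothetical, and the address ranges are reserved documentation ranges rather than real geolocation data.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical IP-range-to-country table. The ranges are reserved
# documentation networks, used here purely as stand-ins.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "GB"),
    (ipaddress.ip_network("198.51.100.0/24"), "US"),
]


@dataclass(frozen=True)
class SearchLogEntry:
    """What gets stored per search: no IP address and no user identifier."""
    query: str
    language: str
    country: str


def country_for_ip(ip):
    """Map an IP address to a country code, or "ZZ" if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    for network, code in GEO_TABLE:
        if addr in network:
            return code
    return "ZZ"  # assumed placeholder for "unknown country"


def log_entry(query, language, ip):
    """Build the stored record; the raw IP is used once and then discarded."""
    return SearchLogEntry(query=query, language=language,
                          country=country_for_ip(ip))
```

For example, `log_entry("hay fever", "en", "192.0.2.17")` produces a record with country `"GB"` and no trace of the original address. Because the record carries no account or session identifier, two entries cannot be tied to the same person.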
This is an example of the trade-offs between convenience and privacy, which Hayhurst acknowledges. But he adds that once search engines enter this “desolation”, they end up collecting more and more data for user customisation in an endless cycle.
“If two people are doing the same search query in the same location, at the same time, with the same language, then in our opinion they should get the same results,” says Hayhurst. “We are anti-personalisation, as I think that leads to all sorts of toxic side effects.”