• 3 Posts
  • 158 Comments
Joined 2 years ago
Cake day: June 2nd, 2023


  • I think you’d be right that the direct cost of running the crawler and index would not be the issue. But fighting SEO to keep your results decent is probably a cost that dwarfs the basic technical cost of running the crawler and index.

    And you’d need a technical security team on top of things, as link farms aren’t your only risk. I’m sure there are countless ways to manipulate the algorithm to put your site on top, and Google probably has multiple teams working full time on fighting them.

    Many of these things would likely not be a problem for a startup, though. No one is paying SEO firms big money to get into a search index no one has heard of and hardly anyone uses, so these costs probably grow steeply over time as you become better known.


  • I’m not disputing that you might be right, but the Internet Archive runs a very different service. The main difference is that Google needs to continuously prune its 400 billion page index because of link rot, while the Internet Archive has the opposite aim: preserving sites that no longer exist.

    I’m also not sure they even crawl. Do sites get added on user request? When you look at a moderately popular page, you see it only has a couple of scrapes a year.

    None of them. At least, none that I’m aware of. I just don’t think that direct expenses are the reason that there are only two major web search tools. I also don’t think Google and Bing are good examples to point at when estimating the cost of running a complete search engine.

    I would suggest direct expenses are the barrier, but perhaps crawling is not the main expense. I would be interested to hear what you think the barrier is, if not expenses.


  • That website claims they add 3-5 billion pages a month. Google is doing that in a day or three, as recency of information is very important in search. Plus, that site claims 100 billion pages to Google’s 400 billion. It’s still an impressive project.

    Size isn’t everything, so the real question is: what search site uses only the Common Crawl index and has results on par with Bing or Google?
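
To put the figures quoted above side by side, here is a rough back-of-the-envelope sketch. It only uses the numbers claimed in the comment (3-5 billion pages a month, "a day or three", 100 billion versus 400 billion pages); the 30-day month and the helper name `crawl_speed_ratio` are assumptions made purely for illustration.

```python
# Back-of-the-envelope comparison using only the figures quoted in the comment.
# These are claims from the discussion, not measurements.

DAYS_PER_MONTH = 30  # assumption: a round 30-day month


def crawl_speed_ratio(days_for_same_volume, days_per_month=DAYS_PER_MONTH):
    """How many times faster a crawler is if it covers a month's volume in N days."""
    return days_per_month / days_for_same_volume


# "Google is doing that in a day or three"
slow_estimate = crawl_speed_ratio(3)
fast_estimate = crawl_speed_ratio(1)

# "100 billion pages to Google's 400 billion"
index_size_ratio = 400e9 / 100e9

print(f"Crawl rate: roughly {slow_estimate:.0f}x to {fast_estimate:.0f}x faster")
print(f"Index size: roughly {index_size_ratio:.0f}x larger")
```

On those claims, the gap in crawl rate (about 10x to 30x) is much wider than the gap in index size (about 4x), which fits the point that recency matters more than raw size.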



  • How far away is the Home Assistant Voice Preview from what you’re looking for?

    It doesn’t plug directly into the wall but instead uses a USB-C cable (that you provide). Other than this, mine can answer questions, search the internet, turn things on and off, and play music via Spotify, Jellyfin, etc. It can also tell me about the state of things in Home Assistant (temperatures in rooms, how the solar is doing, what’s on my shopping list) and add items to that list.

    It requires that you already have Home Assistant set up, but it is a pretty good experience so long as you’re willing to do some amount of tinkering to make it your own.

    As other comments say, it’s not ready for the general public, but it’s pretty close and costs $69.