This is my idea for Node Knockout 2018.
I mulled over it for about two weeks and am highly satisfied with what I
could achieve during the hackathon.
So much so that I bought the domain lobsang.network and
continue developing it on GitHub.
But what is it about?
Analyse websites for Search Engine Optimisation by leveraging the power of
the web itself.
Distribute the effort over the web, make use of specialised services and
coordinate via distributed protocols.
So what does that mean?
Looking at it from different angles:
- What does Search Engine Optimisation mean here?
- What effort is needed to do it?
- What specialised services exist?
- How do they coordinate their work?
Let's dive in!
What does Search Engine Optimisation mean here?
To make sure we are on the same page, here is what I understand by these terms.
What is Search Engine Optimisation?
Here, Search Engine Optimisation (SEO, for short) describes all measures
taken (the optimisation) to improve the ranking of a website or page in the
Search Engine Result Page (SERP).
In other words: What can you do to make it onto page one of Google?
Categories of Search Engine Optimisation
Broadly speaking, there are two categories in which you can optimise:
On-Page and Off-Page.
On-Page describes all measures on your website itself: optimising images,
changing wording, adjusting the URL path structure and so on.
Off-Page describes all measures taken elsewhere. Think getting links from
other sites (so-called backlink optimisation), improving your reputation on
Social Media (generating buzz) etc.
I am focussing on On-Page optimisation for now. For Off-Page you would need
more time to crawl the web, process it and discover trends.
What effort is needed to do it?
My idea is to bring ScreamingFrog SEO Spider to the web.
If you look at their homepage or give their free edition a try, you can see
that it basically crawls a site and looks at different information (links
within your site and external ones, HTTP status codes, meta descriptions
and many more). No rocket science. They put some charts on top and offer
an export function.
Thinking through the process, we can break it down like this:
- Crawl the web
- Share the link with other processes
- Pick up the link
- Do some analysis with it
- Share the results with other processes
- Pick up the results and aggregate them
- Present findings to the user
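The steps above can be put into a toy sketch in Python. This is only an
illustration of the flow: an in-process queue stands in for the distributed
transport, and the "analysis" is a placeholder, not a real service.

```python
# Toy sketch of the pipeline; an in-process queue stands in for the
# distributed message transport, and the "analysis" is a placeholder.
from queue import Queue

links = Queue()    # crawlers publish discovered URLs here
results = Queue()  # analysers publish their findings here

def crawl(url):
    # a real crawler would fetch the page and extract further links
    links.put(url)

def analyse():
    url = links.get()
    # a real analyser would inspect status codes, meta tags and so on
    results.put({"url": url, "finding": "placeholder"})

def aggregate():
    # a presenter would store and visualise what arrives here
    return results.get()

crawl("https://example.com")
analyse()
print(aggregate())
```

In the real system each of these functions would be a separate process,
possibly on a separate machine, talking via messages instead of queues.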
I can see five distinct areas here: crawling, messaging, analysing,
aggregating and presenting.
What specialised services exist?
Crawlers basically take a URL and follow it. They discover more links and
fetch their content as well. Really dumb. We should keep this part as lightweight
as possible, so IoT devices can do it, too.
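Such a dumb crawler could be as small as this sketch, which only shows the
link-extraction half using Python's built-in html.parser; the actual
fetching is left to whatever HTTP client the device has.

```python
# Sketch of the "dumb" crawler's core: collect the links from a page's
# HTML. Fetching the page itself is left to any HTTP client.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # every <a href="..."> is a new URL to follow
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<a href="/about">About</a> <a href="https://example.com">Ext</a>')
print(collector.links)  # ['/about', 'https://example.com']
```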
So if we add some metadata to what they send, we can use any of them.
Let's go with this:
- With a license (SPDX identifier)
- With a timestamp (in ISO 8601)
- With an ID to allow immutability
- With a link to the node it was derived from
- With an issuer ID
- With an identifier to describe the topic of the payload
- With some payload
This should be all it needs. The transport protocol then doesn't matter.
If an interesting message is received, it is passed along to some other service.
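Put together, one such message could look like the following sketch. The
field names are my own working guesses at mapping the list above; nothing
about them is fixed yet.

```python
# Sketch of one message; all field names are working guesses.
import json
import uuid
from datetime import datetime, timezone

message = {
    "license": "CC0-1.0",                                 # SPDX identifier
    "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
    "id": str(uuid.uuid4()),     # allows treating messages as immutable
    "derivedFrom": None,         # link to the node this was derived from
    "issuer": "crawler-001",     # issuer ID
    "topic": "link.discovered",  # what the payload is about
    "payload": {"url": "https://example.com/about"},
}

print(json.dumps(message, indent=2))
```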
These are highly trained services. Say a Natural Language Processor.
Or an image recogniser. Or a scanner for potential accessibility issues.
Something like that. By now, you can find several of them in the cloud.
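As a toy example of such an analyser, here is a sketch of an accessibility
scanner that flags img tags without alt text, a classic on-page issue. A
real service would be far more capable; this only illustrates the role.

```python
# Sketch of a tiny analyser: count <img> tags that lack alt text,
# a classic on-page accessibility (and SEO) issue.
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.issues = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; no "alt" means an issue
        if tag == "img" and "alt" not in dict(attrs):
            self.issues += 1

checker = MissingAltChecker()
checker.feed('<img src="a.png"><img src="b.png" alt="Logo">')
print(checker.issues)  # 1
```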
A presenter is actually built up of two parts: some kind of database to
store interesting data and some means to visualise it.
They are actually dumb insofar as they should not have to process the data
any further (except maybe during the write phase of the database).
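The storage half of a presenter could be sketched like this, with SQLite
standing in for the database (my assumption; any store would do) and the
visualisation simply reading from the same table.

```python
# Sketch of the storage half of a presenter; SQLite stands in for
# whatever "dumb" database is used. Visualisation would read this table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE findings (url TEXT, topic TEXT, payload TEXT)")
db.execute(
    "INSERT INTO findings VALUES (?, ?, ?)",
    ("https://example.com", "link.discovered", "{}"),
)
rows = db.execute("SELECT url, topic FROM findings").fetchall()
print(rows)  # [('https://example.com', 'link.discovered')]
```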
How do they coordinate their work?
Currently all of those steps have to be done by every search engine on its
own. Be it Google, Bing, Baidu, Yandex, DuckDuckGo or IxQuick.
Why not share the burden of crawling the web?
Why not concentrate the processing part?
I would like to see the basis for building the search engine open to
all. How you weight the signals against each other can be closed source if
you insist. But then we would have a way to compare the input with the output.
This could stir some interest in science. Or somebody comes up with a new
ranking approach altogether.
By agreeing on a protocol, we all speak the same language.
Speaking of language: By relying on JSON, we are independent of the
programming language used. I will start with Node.js and Python.
I can imagine writing some Ruby and Lua code as well.
Hopefully people will contribute Java, PHP and Perl code.
This way we can smooth rough edges (I'm sure I missed something in the concept!).
For example, I don't know whether it scales as I imagine. What about memory
consumption? What about congestion on the database? The message protocol?
On the other hand, by relying on standards, I can build upon the work of others.
Although I didn't win the hackathon (again), I am motivated enough to
follow my idea for a few years and see whether it gains traction.
Will you join me?