How We Made Daily Malware And Vulnerability Scanning Free For All Websites
As I start to write this, it is 6pm in Seattle and getting close to the end of launch day for the Gravityscan badge program. And I am really happy. In just a few hours, 541 websites have already installed the Gravityscan badge and are getting free daily monitoring. This has been a long journey for our team. Along the way we created and used breakthrough technology, and completely changed the game on security companies that overcharge for malware and vulnerability scanning.
Today we launched a program that gives any website completely free daily malware scans, blacklist checks and content monitoring. And if you want to run a deep vulnerability scan on your website, including source code scanning, that is completely free too! That is amazing when you consider that many vulnerability scanners cost thousands of dollars per year, per license.
I want to share the story with you of how we created technology that enables us to provide free malware and vulnerability scanning for thousands of websites without spending millions on server hardware and hosting. Much of this story is about how we are standing on the shoulders of giants. It all starts with a big shift in technology that happened a few years ago.
The Germ of an Idea
For the past five years our core business has been Wordfence.com. Wordfence has grown into a world-class firewall and malware scanner and is the most popular security plugin for WordPress today. It is an amazing business: today it supports a team of over 30 talented people, and it continues to grow.
I have the privilege of working with two amazing executives – Dan and Kerry. I think we make a great team and about a year and a half ago we started exploring “crazy blue sky” ideas that could be completely ground-breaking.
I think walking onto the floor of the RSA conference was an eye-opener for us. There were over 800 vendors, all selling very similar products; the main difference was the claims they made. We felt there was an opportunity to change the security industry. Things just felt a bit too uniform.
One thing we noticed is that vulnerability scanning products in particular were expensive. Consistently, when you visit a vendor’s website wanting to try a product, you hit a “Request more info” form that collects information to size you up as a customer, and then the sales process starts. Basically, that is a salesperson trying to figure out how much money they can extract from you.
That seemed weird in a world with self-service applications – and where so many open source companies have customers that have been using their products for years before there is any kind of commercial relationship.
What also seemed a bit weird about the vulnerability scanning market is that many companies, some of them quite large, are using old technology. You “schedule” a vulnerability scan, there is no interactive feedback, and you come back a few hours later hoping it’s done. With the technology we have today – and browser technology in particular – it was clear that legacy security companies were selling legacy technology they weren’t investing in. And they were raking in the profits.
Changes in Technology Enable Awesome
There were two fundamental changes in technology that enabled us to create Gravityscan and change the way vulnerability scanning is done. The first allowed us to make the scan completely interactive. The second reduced our operational costs to a fraction of what they would have been a few years ago. This enabled us to make Gravityscan free.
Back in 2007 I created Feedjit.com, which showed your website visitors arriving in real-time. This was way before Google Analytics added real-time. The trouble was that the technology to push events to a web browser in real-time didn’t exist back then. So many developers, myself included, built our own web servers and used a technique called “long polling”. It was this clumsy hack that allowed us to send messages to a browser in real-time. In Feedjit’s case, a visitor would arrive on your website and a notification would appear in your live traffic feed.
The way long polling worked was as follows: A web browser would connect to the Feedjit web server. Instead of immediately sending back a response, the web server would just hold the connection open and say nothing. Much like the silent treatment – in case you’ve ever experienced that. When a message arrives, the web server sends the message and closes the connection. The web browser opens the connection again and the process repeats.
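That flow can be modeled in a few lines of JavaScript. This is a toy sketch with fake "connections" rather than real sockets, and the names and messages are invented for illustration, not Feedjit's actual code:

```javascript
// Long polling, modeled without real sockets: the server parks each
// incoming request until a message arrives, then answers them all.
const parked = [];    // responses being held open (the "silent treatment")
const delivered = []; // what each waiting browser eventually received

function handleRequest(browserId) {
  // Instead of responding immediately, hold the connection open.
  parked.push((message) => delivered.push({ browserId, message }));
}

function eventArrives(message) {
  // Every held connection gets the message and is closed; each browser
  // would then immediately reconnect and wait again.
  while (parked.length) parked.shift()(message);
}

handleRequest('browser-1');
handleRequest('browser-2');
eventArrives('visitor arrived');
console.log(delivered.length); // → 2 (both waiting browsers got the event)
```

The reconnect-after-every-message step is the part that makes this pattern so wasteful, as the next paragraph explains.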
Long polling is horribly inefficient because every event carries request headers, response headers, three-way TCP handshakes and all this other junk that needs to get transferred and adds latency. The fact that we no longer have to do this makes me truly happy to be living in the year 2017.
Around 2008, websockets were invented, and in 2009 Google Chrome became the first browser to include them. Websockets allow a web browser to open a persistent connection to a server and listen for events. The connection just stays open and events can arrive whenever they like. The connection isn’t closed, so there is no header overhead and none of the latency of reestablishing connections.
Websockets gave developers a way to make web applications truly behave like desktop network applications. They could receive events as they happened in real-time and quickly display them to a user.
Websockets gave us a way to make Gravityscan behave like desktop software when you perform a vulnerability scan. No more scheduling scans with a click-and-pray user interface. Instead you can hit the scan button and watch the scan happen in real-time with a live progress bar and live status updates. Vulnerabilities and other scan results even appear on screen in real-time.
Websockets empowered us to create a web application that does vulnerability and malware scanning that is as responsive as a locally installed desktop application, but that has the power of a data center and powerful high-bandwidth back-end servers behind it. And it all happens in your browser.
Solving the C10K Problem
Back in 2000 I worked for eToys.com in the dot-com boom. Before eToys became one of the biggest dot-com busts in history, they were actually a very busy website. At any one time they usually had a few thousand people browsing the site shopping for toys. This required a cluster of several hundred web servers to handle that number of people.
Each person browsing the site would make connections to the web server as they requested documents. Each connection was handled by a single process. That process could not do anything else until it had finished serving the browser connected to it. Some of the requests would take some time. For example, database requests might take a few extra milliseconds. So if you wanted to have lots of people being served promptly, you had to have at least one process on each server for each visitor.
If you wanted to calculate how many servers you need for the holiday toy rush you did something like this:
Number of people divided by the number of processes each server can support equals the number of servers needed.
Back then we used mod_perl and we had around 200 processes per server at a push. So if you expect a rush of 20,000 visitors on the site, you need a cluster of 100 physical servers with enough memory for each of those processes. And server memory is expensive!! At $3,000 per server that’ll cost you $300,000 just for your front-end web servers.
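The back-of-the-envelope formula above is simple enough to write down directly. The numbers here are the era-specific figures from this post, used purely for illustration:

```javascript
// Servers needed = peak concurrent visitors / processes per server,
// rounded up because you can't buy a fraction of a server.
function serversNeeded(concurrentVisitors, processesPerServer) {
  return Math.ceil(concurrentVisitors / processesPerServer);
}

// Dot-com era: roughly 200 mod_perl processes per server at a push.
console.log(serversNeeded(20000, 200));    // → 100 servers

// Post-C10K era: one event-driven process handles 10,000+ connections.
console.log(serversNeeded(20000, 10000));  // → 2 servers
```

The same arithmetic is what makes the cost comparison later in this post so dramatic.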
Nginx leading the way
In the early 2000s, work was being done on various operating system kernels to provide a very efficient way for developers to manage over 10,000 concurrent connections with very little CPU or memory usage. This was referred to as “cracking the C10K problem”. (C10K is developer speak for 10,000 concurrent connections.)
In 2002, Igor Sysoev started work on Nginx to try to solve the same problem. By 2007 the Linux kernel developers had cracked the problem and Nginx was using new features in the kernel to provide a web server that could easily handle over 10,000 concurrent connections with a single process.
To put that in perspective using my eToys.com example above: to handle 20,000 concurrent connections you no longer needed $300,000. You needed just $3,000 for a single server. It was a monumental and historic breakthrough in technology.
In late 2007 I launched Feedjit. By early 2008, Feedjit was growing out of control; I needed a web server that could handle thousands of connections, and I didn’t have much money to spend. Nginx saved my bacon. I used it as a front-end server to replace Apache and I couldn’t believe what I was seeing: I went from a large number of memory-hungry processes that were about to cost me a fortune to a single lightweight 8-megabyte process using almost no CPU.
Thank you very much Igor and the many kernel developers and engineers whose shoulders we all stand upon.
Enter The World Of Asynchronous Programming
With the C10K problem comprehensively solved and Nginx leading the way starting in 2007, developers suddenly realized the power of managing thousands of connections with a single thread or process. It opened up worlds that were previously inaccessible. For example, if Nginx can manage over 10,000 incoming connections with a single process, surely we can initiate 10,000 outgoing connections with a single process and do things like crawl the web.
The First Gravityscan Prototype in Node.js
In early 2016 I realized that Node.js might be perfect for creating an incredibly powerful malware and vulnerability scanning platform. Node could initiate and manage thousands of concurrent scans with very little memory or CPU, which meant that we could scan thousands of websites simultaneously with a single process.
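As a toy illustration of why one process can juggle many scans at once, here is a miniature "event loop" that interleaves several fake scans. The scheduler and site names are invented for this sketch; real Node.js does the interleaving with its libuv event loop and non-blocking sockets rather than generators:

```javascript
// A toy cooperative scheduler: each "scan" yields whenever it would be
// waiting on the network, letting other scans make progress in between.
function runEventLoop(tasks) {
  const order = [];
  while (tasks.length) {
    const task = tasks.shift();           // take the next ready task
    const { done } = task.gen.next();     // let it do one step of work
    if (!done) tasks.push(task);          // still waiting: requeue it
    else order.push(task.name);           // record completion order
  }
  return order;
}

// A fake scan that makes `steps` network requests before finishing.
function* scan(site, steps) {
  for (let i = 0; i < steps; i++) yield `request ${i} to ${site}`;
}

const finished = runEventLoop([
  { name: 'a.com', gen: scan('a.com', 1) },
  { name: 'b.com', gen: scan('b.com', 3) },
  { name: 'c.com', gen: scan('c.com', 2) },
]);
console.log(finished); // → [ 'a.com', 'c.com', 'b.com' ]
```

No scan blocks any other: they finish as their work completes, not in submission order, and the whole thing runs in a single process.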
I knew it would take more than just Node, but I got to work on a prototype scanner to prove that we could use the breakthroughs in technology described above to create a very powerful scanner that uses very little operational resources.
Once I completed the prototype and proved the concept to the team, they politely ignored me and got to work building a much more powerful system.
The Gravityscan Platform
The engineering team behind Gravityscan started by creating a powerful and highly scalable malware and vulnerability scanning platform. We use leading technologies that leverage the breakthroughs I have described above. These include:
- Nginx for front-end load balancing.
- Node.js for opening and managing a massive number of network connections.
- RabbitMQ, a message passing server written in Erlang, for moving data around Gravityscan in the form of messages and doing it very quickly.
- Ember Fastboot, which runs on Node.js, for pre-rendering pages which makes the user interface very fast.
Once the architecture was complete, this is what Gravityscan looks like. I’m sharing one of our engineering diagrams, so you are getting this completely raw and unedited by a marketing team.
I’m going to talk you through the basic flow from browser to scan to help explain the above diagram.
When you initiate a scan, your browser connects to one of our websocket servers, which keeps a connection open that we can use to send you events. These are actually Node.js servers running an application that handles websocket connections and events. Whenever you want to send a message to Gravityscan, or we want to send a message back to you, it goes via one of the websocket servers.
Message passing is how data gets around in the Gravityscan architecture and everything goes via a message broker, which is the green square. We use RabbitMQ. In our message broker, we have three queues for different kinds of events. When you start a scan, for example, your message goes via a websocket, through our websocket servers, into a queue in the message broker and arrives on a “Scan worker”.
Our scan workers are where the magic happens. Each scan worker can handle a large number of concurrent scans – meaning that each worker only takes the memory and CPU of a single process, but can initiate thousands of network connections and scan many different websites simultaneously. As you can see we have several scan workers in the diagram. We actually have many scan workers distributed across different servers and we can easily scale up by simply adding more workers using cheap hardware. We can easily add new vulnerability or malware detection to workers by adding a new plugin.
Scan workers send their results as events back to your web browser via a websocket, which is how you see results appearing in real-time. They also send results to a storage worker, which stores your scan results in a database. The storage worker is actually a cluster of workers, which makes it highly scalable.
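The flow just described – a scan request in, results fanned out to both your browser and storage – can be modeled with a toy in-memory broker. The queue names and payloads here are illustrative; Gravityscan uses RabbitMQ over real network connections:

```javascript
// A toy publish/subscribe broker standing in for RabbitMQ.
class Broker {
  constructor() { this.queues = {}; }
  subscribe(queue, handler) {
    (this.queues[queue] = this.queues[queue] || []).push(handler);
  }
  publish(queue, message) {
    for (const handler of this.queues[queue] || []) handler(message);
  }
}

const broker = new Broker();
const browserEvents = [];  // what the websocket server pushes to you
const storedResults = [];  // what the storage worker writes to a database

// A "scan worker" consumes scan requests and publishes results.
broker.subscribe('scan.requests', (msg) => {
  broker.publish('scan.results', { site: msg.site, status: 'clean' });
});

// Both the websocket server and the storage worker consume results.
broker.subscribe('scan.results', (msg) => browserEvents.push(msg));
broker.subscribe('scan.results', (msg) => storedResults.push(msg));

// You hit the scan button: a request message enters the broker.
broker.publish('scan.requests', { site: 'example.com' });

console.log(browserEvents[0]); // → { site: 'example.com', status: 'clean' }
```

Because every component only talks to the broker, adding another worker is just another subscriber on a queue, which is what makes scaling out so easy.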
The application server is PHP running the Laravel framework, and it contains the business logic of Gravityscan.
Using this architecture gives Gravityscan some amazing capabilities:
- We can do scans in real-time and send the results to your web browser as they happen.
- We can simultaneously store your results for future reference.
- If a scan worker fails for some reason, we automatically resubmit your scan job and it continues to completion without a problem. This actually happened on launch day and saved us from some down-time.
- We can easily scale up to serve millions of customers and to scan millions of websites.
- Every application component can scale horizontally by adding more servers.
- We can easily use this architecture to add a lot of additional functionality. Gravityscan is amazing, but there is so much more to come.
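The failure-recovery bullet above boils down to a resubmit loop. Here is a minimal sketch; the worker functions are stand-ins, and in the real system redelivery is handled by the message broker rather than application code:

```javascript
// Resubmit a scan job until some worker completes it.
function runWithRetry(job, workers) {
  for (const worker of workers) {
    try {
      return worker(job);  // the first worker that succeeds wins
    } catch (err) {
      // worker failed mid-scan: the job is resubmitted to the next one
    }
  }
  throw new Error(`job ${job} failed on every worker`);
}

const flakyWorker = () => { throw new Error('worker crashed'); };
const healthyWorker = (job) => `scan of ${job} complete`;

console.log(runWithRetry('example.com', [flakyWorker, healthyWorker]));
// → scan of example.com complete
```

From the user's point of view the crash is invisible: the scan simply continues to completion on another worker.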
Breakthroughs We Needed To Make Gravityscan Happen
I would say the most important technical breakthroughs (the shoulders of giants we stand on) that enabled the creation of Gravityscan are:
- Solving the C10K problem and having the ability to manage tens of thousands of connections using commodity hardware.
- The creation of Nginx which can handle a huge number of concurrent connections at the front end on commodity hardware.
- The creation of Node.js which gave us an application server that can initiate and manage a large number of concurrent connections and uses a language that is highly accessible.
- RabbitMQ which is incredibly fast at message passing.
- Ember Fastboot which allowed us to create a highly dynamic website which is also very fast and loads very quickly.
If even one of these were removed, it would make our lives significantly more challenging.
Paying it Forward: Passing On Savings To Our Customers
All the technologies I have mentioned in the previous section are open source. That means they are projects created by volunteers who have freely donated their code to the community and licensed it so that anyone can use that code in their own projects.
Vulnerability and malware scanning is expensive right now. The technical innovation I have described in this post has enabled us to massively reduce our operational costs. We don’t need a cluster of thousands of machines to operate Gravityscan; we use just a handful.
We are paying it forward by making daily malware scans, blacklist checks and content monitoring completely free at Gravityscan. Manual vulnerability scanning is also completely free – just visit our home page and start a scan. The only things we charge for are faster scans (they take more server resources) and scheduled vulnerability scanning (for the same reason). We can make all of this free because these technology breakthroughs allowed us to reduce our own costs, and we can pass those savings on to you.
At the beginning of this post I mentioned that 541 websites had installed the Gravityscan badge and are benefiting from free daily scans. In the time it has taken to write this post, that number has grown to 778. Our team is incredibly excited about Gravityscan and the badge program. If you have any questions or comments, please don’t hesitate to contact us with your feedback. We’d love to hear from you.
Mark Maunder – Wordfence Founder & CEO