Behind the Scenes: Building a Dynamic Instrumentation Agent for Node.js

TL;DR

Building a dynamic instrumentation agent for Node.js is a complex challenge. At Sqreen, we provide a powerful security tool for development teams using Node.js. You will be able to spot user accounts attacking your app and improve your code by fixing vulnerabilities with the stack traces that we provide.
This protection is based on an agent (which is no more than a regular Node.js package that can be installed from the NPM repository).

There are many advantages to dynamic instrumentation:

  • Extremely fast to set up, since no source code modification is needed
  • At runtime, the program is fully loaded and can be fully observed, including all third-party libraries
  • Functions can be hooked whatever the code looks like. Changing the source code, or running different versions of it (dev, staging, prod), is transparent to the instrumentation.

To build such an agent for Node.js, the following topics need to be assessed:

  • Code instrumentation: how is it possible to monitor an application’s behavior and to prevent exploitation of security issues?
  • Data transmission: how to accurately report what happened in the application to a server without impacting performance?

This article describes how such instrumentation can be built in Node.js.

Instrumenting Node.js code

Instrumenting Node.js code is made easy thanks to the flexibility of JavaScript: a method can be wrapped, then replaced by its wrapper. Yet building a wrapper that works across Node.js versions is complex, because the available APIs differ from one version to another. For instance, the 4.x version has the class keyword, but the Reflect library is not available.

The permissive module lifecycle makes dynamic instrumentation tricky. If some code keeps a reference to a method, that reference cannot be updated afterwards. However, if the reference points to an object, the members of that object can still be replaced.
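The difference between the two kinds of references can be illustrated with a small sketch. The object `fakeModule` and its `greet` method are hypothetical stand-ins for a real imported module:

```javascript
// Hypothetical object standing in for the result of require('some-module').
const fakeModule = {
  greet(name) {
    return `hello ${name}`;
  }
};

// One consumer copied the method reference before instrumentation.
const detachedGreet = fakeModule.greet;
// Another consumer kept a reference to the module object itself.
const keptModule = fakeModule;

// Instrumentation happens later: the member is replaced on the object.
const originalGreet = fakeModule.greet;
fakeModule.greet = function (...args) {
  return `[hooked] ${originalGreet.apply(this, args)}`;
};

console.log(detachedGreet('a'));    // hello a (the copied reference is unchanged)
console.log(keptModule.greet('a')); // [hooked] hello a (looked up on the object)
```

Code that destructures a method at import time therefore escapes instrumentation applied later, which is one reason to keep track of modules as early as possible.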

The overall dynamic instrumentation process can be summarized as follows:

  • When a module is imported (using `require`), a reference to this module is kept in a private store.
  • When the instrumentation instructions are available:
    • The module to be instrumented is retrieved from the private store.
    • The methods that have to be instrumented within the module are overridden with a general purpose wrapper.
    • Instrumentation hooks are placed on the wrapper.

Let’s detail the aspects of this process:

Keeping track of imported modules

Each time a module is imported in Node.js (using the require method), the Module._load method is called. To keep track of the modules that are imported into a Node.js application, this method can be overridden.

The keepTrackOfImport method will place the imported module in a private object. It needs to know the request string (this is what is passed to require) and the parent (the module from where the import is done) in order to get the unique identity of the imported module using the method Module._resolveFilename.
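A minimal sketch of such an override could look like the following. `Module._load` and `Module._resolveFilename` are internal, undocumented Node.js APIs, so this is an illustration rather than a stable contract. The `importedModules` store and the third argument of `keepTrackOfImport` (the exported value) are assumptions made for this example:

```javascript
const Module = require('module');

// Private store (hypothetical name): resolved identity -> exported module.
const importedModules = new Map();

// Keep a reference to the original loader so that it can still be called.
const load = Module._load;

function keepTrackOfImport(request, parent, exported) {
  try {
    // Resolve the request to a unique identity: an absolute path for files,
    // or the module name for core modules.
    const identity = Module._resolveFilename(request, parent);
    importedModules.set(identity, exported);
  } catch (err) {
    // Bookkeeping must never break the application's own require() calls.
  }
}

Module._load = function (request, parent) {
  // Call the original loader exactly as it would have been called.
  const exported = load.apply(this, arguments);
  keepTrackOfImport(request, parent, exported);
  return exported;
};

// Any later require() is now recorded in the private store.
require('path');
console.log(importedModules.has('path'));
```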

Overriding Module._load this way has little impact on the application:

  • The original method is called as it would have been without the override thanks to `load.apply(this, arguments)`
  • Since what happens during a `require` is synchronous, the performance loss is small compared to the original operation.

Building a general purpose wrapper

The hooking logic has been applied to keep a pointer on each imported module. This will allow overriding methods inside the module at runtime if needed. The point here is to only override methods whose behavior must be controlled.

General overriding is a complex problem in JavaScript, because a function can take several forms:

  • A normal function (declared using the keyword `function`, the `Function` constructor or any other historic way to do so).
  • An arrow function
  • A class

Historically, function wrapping could be achieved using a simple piece of code:
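A minimal sketch of this classic pattern might look like the following. The names `wrap` and `calculator` are illustrative only:

```javascript
// Replaces holder[methodName] with a wrapper that delegates to the original.
function wrap(holder, methodName) {
  const original = holder[methodName];
  holder[methodName] = function () {
    // Hook logic would run here, before and/or after the original call.
    return original.apply(this, arguments);
  };
  return original;
}

const calculator = {
  base: 10,
  add(n) {
    return this.base + n;
  }
};

wrap(calculator, 'add');
console.log(calculator.add(5)); // 15, `this` and the arguments are forwarded
```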

This method no longer works now that classes have been introduced to Node.js. Calling a function defined with the class keyword through apply throws an error:
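The failure can be reproduced with a short example, where `X` is an empty class used only for the demonstration and `wrapped` follows the apply-based pattern:

```javascript
class X {}

// Apply-based wrapper around a class constructor.
function wrapped() {
  return X.apply(this, arguments);
}

let caught = null;
try {
  wrapped();
} catch (err) {
  caught = err;
  console.log(`${err.name}: ${err.message}`);
}
```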

Results in:

TypeError: Class constructor X cannot be invoked without 'new'

And at the same time, an arrow function cannot be called with new.

A modern generic function wrapper must be aware of how it has been called to call the wrapped method in the same way. Thankfully, ES2015 provides reflection tools to build such a wrapper.
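As an illustration, here is one possible sketch of such a wrapper, built on `new.target`, `Reflect.construct` and `Reflect.apply`. The name `wrapFunction` is hypothetical, and the sketch assumes a Node.js version where these tools are available:

```javascript
function wrapFunction(original) {
  const wrapper = function (...args) {
    if (new.target) {
      // Called with `new`: construct the original, forwarding new.target so
      // that subclasses of the wrapper keep their own prototype.
      return Reflect.construct(original, args, new.target);
    }
    // Plain call: forward `this` and the arguments unchanged.
    return Reflect.apply(original, this, args);
  };
  // Share the prototype so that objects built through the wrapper are still
  // instances of the original class.
  wrapper.prototype = original.prototype;
  return wrapper;
}

class Point {
  constructor(x) {
    this.x = x;
  }
}
const double = (n) => n * 2;

const WrappedPoint = wrapFunction(Point);
const wrappedDouble = wrapFunction(double);

console.log(new WrappedPoint(1) instanceof Point); // true
console.log(wrappedDouble(21));                    // 42
```

Calling `new wrappedDouble()` still throws a TypeError, which mirrors the behavior of the arrow function itself.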

Such an example is available in the Node.js util library:

The sad part here is that tools like Reflect or new.target are not available in Node.js 4.x whereas class and arrow functions are.

Since native modules (written in C or C++) generally provide a JavaScript API, they can be instrumented like any other module by wrapping their public JavaScript methods.

 

 

Placing execution hooks

Once a method is overridden, the wrapper can place methods (named callback functions) to be executed before or after the original one. This allows modifying the behavior of a method dynamically at runtime.

Such a wrapper would look like this:
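The sketch below is one way to write it, not necessarily the exact code used by the agent. The names `wrapWithHooks`, `preHooks`, `failHooks` and `postHooks` are assumptions, and the apply-based call only covers regular functions:

```javascript
function wrapWithHooks(original, { preHooks = [], failHooks = [], postHooks = [] }) {
  return function (...args) {
    // 1. Hooks executed before the original method.
    preHooks.forEach((hook) => hook(args));

    let result;
    try {
      result = original.apply(this, args);
    } catch (err) {
      // 2. Hooks executed if the original method throws.
      failHooks.forEach((hook) => hook(args, err));
      throw err; // the caller must still see the original error
    }

    // 3. Hooks executed after the original method.
    postHooks.forEach((hook) => hook(args, result));
    return result;
  };
}

// Usage: record which hooks run around JSON.parse.
const events = [];
const parse = wrapWithHooks(JSON.parse, {
  preHooks: [() => events.push('pre')],
  failHooks: [() => events.push('fail')],
  postHooks: [() => events.push('post')]
});

parse('{"a":1}');
try {
  parse('not json');
} catch (err) {
  // The SyntaxError is rethrown after the fail hooks have run.
}
console.log(events); // [ 'pre', 'post', 'pre', 'fail' ]
```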

Three hooks have been placed in this wrapper:

  • One before the execution of the original method.
  • One if the original method throws an error.
  • One after the execution of the original method.

This wrapper can be made more elaborate so that the results of the hooks trigger actions. For instance, the preHooks could prevent the execution of the original method.

Robustness

The execution of the hooks must happen in a fail-safe environment, i.e. within a try-catch statement. Also, all promises used within a hook must have a catch handler. This prevents the hooks from throwing uncaught exceptions that could crash the process.
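A minimal sketch of such a fail-safe runner is shown below. The helper `runHookSafely` and the `hookErrors` array are hypothetical; a real agent would more likely report the failure than store it in memory:

```javascript
// Hypothetical sink for hook failures.
const hookErrors = [];

function runHookSafely(hook, args) {
  try {
    const result = hook(...args);
    // If the hook returned a promise (or any thenable), attach a rejection
    // handler so that it can never become an unhandled rejection.
    if (result && typeof result.then === 'function') {
      Promise.resolve(result).catch((err) => hookErrors.push(err));
    }
  } catch (err) {
    // Synchronous failures are contained as well.
    hookErrors.push(err);
  }
}

// Neither of these hooks can crash the process.
runHookSafely(() => { throw new Error('synchronous failure'); }, []);
runHookSafely(() => Promise.reject(new Error('asynchronous failure')), []);
```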

Performance considerations

The cost of the described methods is not very high in terms of performance:

  • The import of a module in Node.js is a synchronous task, so adding a small operation to keep track of imported modules is negligible.
  • Since the instrumentation is dynamic, only a few methods are patched: the impact on the application is tiny.
  • Hookpoint execution time unavoidably slows down the call to a method. Therefore, the code placed here must be carefully tested and optimized in order to have the smallest possible impact. Some advanced performance tricks should be used here; some of them have been described in my last article on RisingStack’s blog.

What’s different from other instrumentation libraries?

Instrumentation libraries such as New Relic or Opbeat use static patching. This means that the list of modules and methods to instrument is known at the startup of the application.

The methods to instrument are overridden when the module they belong to is loaded.

The main difference with the approach presented in this article is that, with static patching, a module cannot be instrumented if the instructions for instrumenting it are not available in the current version of the instrumentation library. That is why those libraries have a pre-defined list of supported modules. With dynamic instrumentation, it is possible to patch modules at runtime even if they did not exist when the instrumentation library was published.

 

 

Data transmission

Delaying actions

Node.js is a single-threaded platform, which means it can only do one thing at a time. When building an instrumentation agent, one needs to take that into account when designing the reporting chain.

Spending too much time manipulating data introduces large synchronous chunks of work that directly degrade the server’s performance.

Using timer methods such as setImmediate can be a good idea: the reporting chain is divided into a set of asynchronous operations until the data is ready to be placed into a reporting queue.

[Figure: Node.js thread]

setImmediate allows us to introduce asynchronicity in a reporting chain. Between each operation, the server will handle other tasks.
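To make this concrete, here is a minimal sketch of a reporting chain split with setImmediate. The function `runReportingChain`, the three steps and the `reportingQueue` array are hypothetical names chosen for the example:

```javascript
// Hypothetical in-memory queue that a separate sender would drain later.
const reportingQueue = [];

// Runs each step in its own setImmediate callback, so the event loop can
// process pending I/O between two steps of the reporting chain.
function runReportingChain(rawEvent, steps, done) {
  let current = rawEvent;
  let index = 0;

  function next() {
    if (index === steps.length) {
      done(current);
      return;
    }
    const step = steps[index];
    index += 1;
    setImmediate(() => {
      try {
        current = step(current);
      } catch (err) {
        return; // drop this report rather than disturb the application
      }
      next();
    });
  }

  next();
}

// Hypothetical steps: each one is a small synchronous transformation.
const steps = [
  (event) => Object.assign({}, event, { time: Date.now() }),
  (event) => Object.assign({}, event, { payload: String(event.payload).slice(0, 1024) }),
  (event) => JSON.stringify(event)
];

runReportingChain({ type: 'sql', payload: 'SELECT 1' }, steps, (report) => {
  reportingQueue.push(report);
});
```

Each step stays short, and the application gets a chance to serve requests between two steps instead of waiting for the whole chain to complete.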

[Figure: Node.js async thread]

Reporting queue

It goes without saying that reporting data to a remote server through an HTTP POST request is an asynchronous operation. However, sending a separate request every time there is data to report is not recommended. Using a reporting queue can be a good idea here.

The reporting queue is nothing but a set of data that needs to be reported to a remote server. It is a FIFO (First In First Out) queue. The benefit of such an object is that one can decide to report a batch of data.

For instance, if 50 metrics reports are waiting in the queue, rather than transmitting each of them in an individual HTTP POST request, a batch could be built with a subset of the queue and sent to the server.

This method reduces the impact of the reporting chain on the performance of the server. Reports happen less often and, therefore, consume less memory and fewer network resources.

During a period of heavy load on the server, the number of items pushed to the queue can grow very quickly. In order to prevent memory leaks, the queue length can be limited and the excess items can be dropped.
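A bounded, batching queue along these lines could be sketched as follows. The class `ReportingQueue`, its options and the `send` function are assumptions for this example; `send` stands for whatever ships one batch to the server, for instance a single HTTP POST:

```javascript
class ReportingQueue {
  // `send` is a hypothetical function that ships one batch and may return
  // a promise.
  constructor({ send, maxLength = 1000, batchSize = 50 }) {
    this.send = send;
    this.maxLength = maxLength;
    this.batchSize = batchSize;
    this.items = [];
    this.dropped = 0;
  }

  // Adds an item, or drops it when the queue is already full.
  push(item) {
    if (this.items.length >= this.maxLength) {
      this.dropped += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Removes the oldest items (FIFO) and sends them as one batch.
  flush() {
    if (this.items.length === 0) {
      return Promise.resolve();
    }
    const batch = this.items.splice(0, this.batchSize);
    return new Promise((resolve) => resolve(this.send(batch)))
      .catch(() => {}); // a failed report must never crash the application
  }
}

// Usage with a fake sender that only records the batches it receives.
const sentBatches = [];
const queue = new ReportingQueue({
  send: (batch) => { sentBatches.push(batch); },
  maxLength: 3,
  batchSize: 2
});

[1, 2, 3, 4].forEach((n) => queue.push(n)); // the fourth item is dropped
queue.flush();                              // sends [1, 2] as a single batch
```

In this sketch the newest items are dropped when the queue is full. Dropping the oldest ones instead is an equally valid policy, depending on which data matters most.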

Conclusion

In this article, we saw how to build an agent to perform dynamic instrumentation in Node.js.

The key parts of such an agent are deeply tied to the nature of the Node.js platform:

  • The code instrumentation must take into account that JavaScript is a permissive language that allows many different coding patterns, which can introduce a virtually unlimited number of different behaviors.
  • The reporting of data should not impact the performance of the monitored apps. In a single-threaded environment, the execution of synchronous computing tasks should be spread through the event queue.

Feel free to ask your questions about Node.js instrumentation or Node.js security. I’m always happy to help. Check out other articles I wrote on this blog about Node.js and keep your app safe by using Sqreen!

 

About the Author

Vladimir de Turckheim is the Node.js lead engineer at Sqreen.io and was previously a cyber-security expert. He is involved in various open-source JavaScript projects, mostly within the hapijs project.