Behind the Scenes: Building a Dynamic Instrumentation Agent for Python
Building a dynamic instrumentation agent for Python is no easy task. At Sqreen, we’re building an agent based on dynamic instrumentation to detect and block security issues inside an application without requiring code modification, so that it is as transparent and frictionless as possible.
The high-level behavior of our Python agent is no different than the Ruby one that we described last week on our blog.
Instrumenting Python code
Instrumenting a Python program is safe and reliable thanks to the high level of reflexivity built into Python itself. Replacing functions and classes seems pretty straightforward once you get past the complexity of the import mechanism.
At the latest PyCon-Fr we gave a presentation about the challenges of replacing functions, classes and methods dynamically in production.
The basic idea behind instrumenting a Python function or method is to dynamically apply a decorator and execute business code before or after the call or to raise an exception if the call fails. Depending on what you want to instrument, you’ll have different strategies.
Instrumenting, or monkey-patching, Python code is not easy because we only replace references. When replacing a reference to an object, like os.system, you can run into a problem: if another piece of code already holds a reference to the same object, that reference will not be updated.
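The stale-reference problem can be reproduced in a few lines. This is a minimal sketch; `patched_system` and `saved_system` are hypothetical names used only for illustration, and the module is restored at the end so the example has no lasting effect.

```python
import os

# Another piece of code grabbed its own reference before we patched.
saved_system = os.system
original_system = os.system


def patched_system(command):
    # Hypothetical replacement; a real agent would run its checks here.
    raise RuntimeError("os.system is blocked")


os.system = patched_system  # rebinds the name "system" inside the os module only

try:
    # Lookups that go through the module see the patch...
    patched = os.system is patched_system
    # ...but the reference taken earlier still points at the original function.
    stale = saved_system is original_system
finally:
    os.system = original_system  # restore the module
```

Both `patched` and `stale` end up true: the module attribute was replaced, yet the earlier holder keeps calling the uninstrumented function.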
Luckily, PEP 302 proposes a way to monkey-patch a module before any other piece of code can obtain a reference to it: “import hooks”. Import hooks are called when a module that hasn’t been loaded yet is imported, and they can either take responsibility for importing it (and potentially modifying it) or let the default import mechanisms take care of it.
Import hooks are defined as follows:
Import hooks in Python 3 are bigger, but each class only has a single responsibility.
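As an illustration, here is a minimal sketch of a Python 3 import hook built on `importlib`. It is not Sqreen's implementation: the class names `PatchingFinder` and `PatchingLoader` are hypothetical, the sketch only handles top-level modules, and the `_busy` flag is a simplification that is not thread-safe. The finder decides whether a module is of interest, and the loader delegates to the real loader and then patches the result.

```python
import importlib.abc
import importlib.util
import sys


class PatchingLoader(importlib.abc.Loader):
    """Delegates to the real loader, then patches the freshly loaded module."""

    def __init__(self, real_loader, function_name, patcher):
        self.real_loader = real_loader
        self.function_name = function_name
        self.patcher = patcher

    def create_module(self, spec):
        return self.real_loader.create_module(spec)

    def exec_module(self, module):
        self.real_loader.exec_module(module)
        # Same effect as: module.function = patcher(module.function)
        original = getattr(module, self.function_name)
        setattr(module, self.function_name, self.patcher(original))


class PatchingFinder(importlib.abc.MetaPathFinder):
    """Intercepts the import of one module and wraps its loader."""

    def __init__(self, module_name, function_name, patcher):
        self.module_name = module_name
        self.function_name = function_name
        self.patcher = patcher
        self._busy = False  # avoids recursing into ourselves (not thread-safe)

    def find_spec(self, fullname, path, target=None):
        if fullname != self.module_name or self._busy:
            return None  # let the default import mechanisms handle it
        self._busy = True
        try:
            spec = importlib.util.find_spec(fullname)
        finally:
            self._busy = False
        if spec is None or spec.loader is None:
            return None
        spec.loader = PatchingLoader(spec.loader, self.function_name, self.patcher)
        return spec


calls = []


def patcher(function):
    def wrapper(*args, **kwargs):
        calls.append(function.__name__)  # business code before the call
        return function(*args, **kwargs)
    return wrapper


finder = PatchingFinder("colorsys", "rgb_to_hsv", patcher)
sys.meta_path.insert(0, finder)
sys.modules.pop("colorsys", None)  # make sure the next import goes through the hook
import colorsys

colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
sys.meta_path.remove(finder)
```

The standard library module `colorsys` is used here only because it is small, pure Python and rarely imported beforehand; any module that has not been loaded yet would do.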
The key part of this code is the following line:
module.function = patcher(module.function)
It dynamically decorates function with the patcher decorator and replaces the reference attached to function in module, as in the following:
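A minimal sketch of that replacement, using a throwaway module object so that no real code is touched. The counting logic in `patcher` is a hypothetical stand-in for the agent's business code.

```python
import functools
import types


def patcher(function):
    @functools.wraps(function)  # keeps the name, docstring and __wrapped__
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1  # business code executed before the call
        return function(*args, **kwargs)

    wrapper.call_count = 0
    return wrapper


# A stand-in module with a single function.
module = types.ModuleType("example_module")
module.function = lambda x: x * 2
original = module.function

module.function = patcher(module.function)
result = module.function(21)
```

After the assignment, anyone looking up `module.function` gets the wrapper, while the original stays reachable through `module.function.__wrapped__`.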
When the import hook is executed, it will become the first and only holder of the module’s reference. Thanks to this property, it’s the only place where we can alter the module objects.
Instrumenting Python defined functions
Instrumenting Python defined functions with an import hook is easy. In order to instrument os.system for example, an import hook must be created for the os module. Once it’s loaded dynamically, it replaces the reference to system with the Agent decorator.
Instrumenting C-defined classes
It is harder to instrument C-defined classes because there is no dynamic way to set attributes on them. For example, when instrumenting sqlite3.Cursor.execute, trying to alter sqlite3.Cursor will result in errors like “TypeError: can’t set attributes of built-in/extension type ‘sqlite3.Cursor’”.
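The failure is easy to reproduce. In this short sketch, `fake_execute` is a hypothetical replacement; the exact wording of the error message varies between Python versions, but the exception type is `TypeError` on CPython.

```python
import sqlite3


def fake_execute(self, *args, **kwargs):
    raise NotImplementedError


try:
    # Attempt to replace a method on a C-defined class.
    sqlite3.Cursor.execute = fake_execute
except TypeError as exc:
    error = exc  # the class refuses the new attribute
else:
    error = None
```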
The solution is to rewind the instantiation chain to find the first Python defined function that can be instrumented. Luckily every SQL driver follows the DBApi2 defined in PEP 249, so the first function a SQL driver needs to implement is connect, which is pure Python. In order to instrument sqlite3.Cursor.execute it needs to be patched all the way from sqlite3.connect:
The gist above shows all the code required to patch sqlite3.Cursor.execute. An import-hook for the sqlite3 module is created, which replaces the connect function with our custom one when executed. The custom connect returns a CustomConnection proxying the real Connection, and the CustomConnection returns a CustomCursor when calling the cursor method which will have its execute function instrumented.
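The proxy chain described above can be sketched as follows. This is an illustration under simplifying assumptions, not the original gist: `connect` is patched directly rather than from an import hook, the `executed` list stands in for the agent's callbacks, and the proxies forward everything else to the real objects through `__getattr__`.

```python
import sqlite3

executed = []  # stands in for the agent's callbacks


class CustomCursor:
    """Proxy around a real cursor with an instrumented execute."""

    def __init__(self, cursor):
        self._cursor = cursor

    def execute(self, sql, *args, **kwargs):
        executed.append(sql)  # instrumentation point
        self._cursor.execute(sql, *args, **kwargs)
        return self

    def __getattr__(self, name):
        # Everything that is not instrumented goes to the real cursor.
        return getattr(self._cursor, name)


class CustomConnection:
    """Proxy around a real connection that hands out CustomCursor objects."""

    def __init__(self, connection):
        self._connection = connection

    def cursor(self, *args, **kwargs):
        return CustomCursor(self._connection.cursor(*args, **kwargs))

    def __getattr__(self, name):
        return getattr(self._connection, name)


def patch_connect(original_connect):
    def custom_connect(*args, **kwargs):
        return CustomConnection(original_connect(*args, **kwargs))
    return custom_connect


original_connect = sqlite3.connect
sqlite3.connect = patch_connect(sqlite3.connect)
try:
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    cursor.execute("SELECT 1 + 1")
    row = cursor.fetchone()
    connection.close()
finally:
    sqlite3.connect = original_connect  # undo the patch after the demo
```

A production proxy would also need to cover the other entry points of the driver (for example executing directly on the connection, iteration, and context managers), which this sketch leaves out.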
Instrumenting a C-defined class which doesn’t follow the DBApi2, like a NoSQL Driver, would require at least the same amount of work.
Instrumenting web frameworks
Instrumenting web frameworks is easier than instrumenting C-defined classes; the hard part is finding how and where to instrument them. The simplest way to do it is to use the framework’s middleware mechanism and dynamically inject the agent’s middleware when the Python agent starts. It’s important to instrument the earliest function that loads the middleware in the application, and to ensure that the agent doesn’t inject the same middleware twice.
The framework’s middleware looks like this:
The notable differences between the implementations are:
- Django process_view is not called when no view matches the URL.
- Django and Pyramid directly pass the request to the middleware while Flask doesn’t.
- Pyramid’s middleware looks more like a decorator than other middlewares.
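The framework-specific middleware is not reproduced here. As a framework-neutral stand-in, the sketch below uses a plain WSGI middleware (PEP 3333) rather than the Django, Flask or Pyramid APIs, to show the general shape of an injected middleware and a guard against injecting it twice. `AgentMiddleware`, `inject` and `on_request` are hypothetical names.

```python
class AgentMiddleware:
    """Wraps a WSGI application and runs agent code before each request."""

    def __init__(self, app, on_request):
        self.app = app
        self.on_request = on_request

    def __call__(self, environ, start_response):
        self.on_request(environ)  # agent code runs before the application
        return self.app(environ, start_response)


def inject(app, on_request):
    # Guard against double injection: an already wrapped app is returned as-is.
    if isinstance(app, AgentMiddleware):
        return app
    return AgentMiddleware(app, on_request)


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


seen_paths = []
statuses = []


def record_path(environ):
    seen_paths.append(environ["PATH_INFO"])


wrapped = inject(demo_app, record_path)
wrapped_again = inject(wrapped, record_path)
body = wrapped({"PATH_INFO": "/login"}, lambda status, headers: statuses.append(status))
```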
Allowing arbitrary callbacks
Using the instrumentation techniques detailed above, we can modify a module at import time. But what if we want to have more than one instrumentation callback on the same method, or if we want to remove a callback later in the application lifecycle? It would be inefficient and dangerous to alter the module again each time a callback is added or removed.
The solution is to store callbacks in a method specific list. The only instrumentation logic we need to inject at import time will execute callbacks stored in the list and can be callback agnostic. Access to this list is critical: adding and removing callbacks should not interfere with the execution.
Special care must be given to the way callbacks are executed: if the callbacks are arbitrary, then no assumptions can be made about them. There is a chance that a callback, during its execution, makes use of an instrumented method (e.g., a log method is instrumented, but the callback needs to log something itself). Without proper safeguards, this could trigger an infinite instrumentation loop.
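One possible safeguard, shown here as a sketch and not necessarily the one the agent uses, is a per-thread flag that skips callbacks while a callback is already running. The names `run_callbacks`, `log` and `audit` are hypothetical.

```python
import threading

_state = threading.local()


def run_callbacks(callbacks, *args):
    if getattr(_state, "in_callback", False):
        return  # already inside a callback on this thread: do not recurse
    _state.in_callback = True
    try:
        for callback in callbacks:
            callback(*args)
    finally:
        _state.in_callback = False


log_lines = []
log_callbacks = []


def log(message):
    # An "instrumented" log function: callbacks first, then the real work.
    run_callbacks(log_callbacks, message)
    log_lines.append(message)


def audit(message):
    # A callback that itself uses the instrumented function.
    log("audit: " + message)


log_callbacks.append(audit)
log("hello")
```

Without the flag, `log` would call `audit`, which would call `log`, and so on until the recursion limit is hit. With it, the nested call runs the original logic only.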
The callbacks can be set in 3 different positions:
- before the instrumented function
- after the instrumented function
- if the instrumented function fails
High-level overview of the callback execution:
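In code, the execution flow could look like the sketch below. The position names `pre`, `post` and `failing` and the callback signatures are assumptions made for illustration; the original arguments are not specified in the text.

```python
import functools


def instrument(function, pre=(), post=(), failing=()):
    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        for callback in pre:  # before the instrumented function
            callback(args, kwargs)
        try:
            result = function(*args, **kwargs)
        except Exception as exc:
            for callback in failing:  # the instrumented function failed
                callback(args, kwargs, exc)
            raise  # the original exception still reaches the caller
        for callback in post:  # after the instrumented function
            callback(args, kwargs, result)
        return result

    return wrapper


events = []


def divide(a, b):
    return a / b


instrumented_divide = instrument(
    divide,
    pre=[lambda args, kwargs: events.append("pre")],
    post=[lambda args, kwargs, result: events.append(("post", result))],
    failing=[lambda args, kwargs, exc: events.append(("failing", type(exc).__name__))],
)

instrumented_divide(6, 3)
try:
    instrumented_divide(1, 0)
except ZeroDivisionError:
    pass
```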
Recovering from callback errors
As you can see, this instrumentation scheme adds very little code besides the actual business logic. From a reliability point of view, this business logic is very sensitive and should not block or crash your program.
The agent code uses a very defensive design: each time a callback is called, it’s wrapped in a try / except. If an exception occurs in a callback, the program proceeds directly to the original code. The callback exception is not raised, but it can be logged or even sent to a specific endpoint for further analysis.
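A minimal sketch of this defensive wrapping. The helper name `safe_call` and the logger name `agent` are hypothetical.

```python
import logging

logger = logging.getLogger("agent")  # hypothetical logger name
logger.addHandler(logging.NullHandler())  # keep the demo quiet


def safe_call(callback, *args, **kwargs):
    """Run a callback; never let its exception reach the application."""
    try:
        return callback(*args, **kwargs)
    except Exception:
        # Logged for later analysis, not re-raised.
        logger.exception("callback %r failed", callback)
        return None


def broken_callback():
    raise ValueError("bug in the agent")


def original():
    return "original result"


def instrumented():
    safe_call(broken_callback)  # fails, but is contained
    return original()  # the original code still runs


outcome = instrumented()
```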
The callback machinery is all pure Python standard library. It doesn’t involve:
- Locking (executing callbacks only reads the list; locks are taken only when callbacks are added or removed)
These characteristics make this code layer very efficient.
The first time a callback needs to be set for a given method, the agent replaces it with the appropriate instrumentation strategy. Then the callback is added to a callback list related to the original method.
Later on, if a new callback needs to be set for the same method, the agent will detect that this method is already instrumented and will only add the callback to the callback list.
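The two steps above can be sketched as a small registry. This is an illustration under assumptions: `set_callback` is a hypothetical name, callbacks are stored in an immutable tuple that is swapped under a lock, and the wrapper reads the tuple without locking.

```python
import threading
import types

_lock = threading.Lock()
_callbacks = {}  # (module name, function name) -> tuple of callbacks


def set_callback(module, name, callback):
    key = (module.__name__, name)
    with _lock:  # only taken when callbacks are added
        if key not in _callbacks:
            # First callback for this function: instrument it once.
            _callbacks[key] = ()
            original = getattr(module, name)

            def wrapper(*args, **kwargs):
                for registered in _callbacks[key]:  # lock-free read of a tuple
                    registered(*args, **kwargs)
                return original(*args, **kwargs)

            setattr(module, name, wrapper)
        # Already instrumented: only the callback list changes.
        _callbacks[key] = _callbacks[key] + (callback,)


demo = types.ModuleType("demo")
demo.greet = lambda name: "hello " + name

seen = []
set_callback(demo, "greet", lambda name: seen.append(("first", name)))
wrapper_after_first = demo.greet
set_callback(demo, "greet", lambda name: seen.append(("second", name)))
result = demo.greet("bob")
```

The second call to `set_callback` leaves `demo.greet` untouched, which is the point: the module is altered once, and everything afterwards is a change to the list.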
De-instrumenting – back to the origins
De-instrumenting a Python function is not quite as straightforward. As we don’t know who holds a reference to the instrumented Python object, we can’t replace every reference. The only thing we can do is empty the list of callbacks so that no business logic is executed anymore.
The information computed by the agent (e.g. statistics) needs to be sent to the outside world in a performant and robust way, and it should be stripped of any sensitive data. Since adding external network access inside a callback would slow down the original code too much, communication should be performed asynchronously. Each time the agent has information to transmit, the information is put on a local queue, and the only overhead is a Queue.put call.
The next step is to transmit the data in the queue to a remote server. To make this as lightweight as possible, a dedicated thread is used. This thread is started when the agent initializes, and all it does is wait for the queue to be populated with a Queue.get call, which blocks as long as the queue is empty. It uses no active resources while waiting, but as soon as an item is received from the queue and the thread gets to run, the item is sent to the remote servers.
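A minimal sketch of this producer/consumer setup with the standard `queue` and `threading` modules. The `send_to_server` function is a hypothetical placeholder that records items locally instead of doing network I/O, and the `None` sentinel used to stop the thread is an assumption added so the example terminates.

```python
import queue
import threading

events = queue.Queue()
sent = []  # stands in for the remote server


def send_to_server(item):
    sent.append(item)  # a real agent would perform network I/O here


def worker():
    while True:
        item = events.get()  # blocks while the queue is empty
        if item is None:  # hypothetical shutdown sentinel
            return
        send_to_server(item)


# Started once, at agent initialization.
thread = threading.Thread(target=worker, name="agent-sender", daemon=True)
thread.start()

# Inside a callback, the only overhead is a put on the local queue.
events.put({"type": "stat", "value": 1})

events.put(None)
thread.join(timeout=5)
```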
This implementation is straightforward since it also relies on standard Python objects: threads and queues. They have been built to work together, in a performant way.
Shaping the data
There are many reasons to post-process the data gathered by the agent. A common use case is data privacy. For example, a logged SQL query will be stripped of any strings or integers containing business data. Another example is exception logging, where you only need to transmit certain kinds of variables.
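As a rough illustration of the SQL case, the sketch below replaces string and numeric literals with a placeholder using regular expressions. This is a deliberately naive approach chosen for brevity (it ignores comments, dialect-specific quoting and other edge cases), and `scrub_sql` is a hypothetical name; a robust implementation would rely on a proper SQL tokenizer.

```python
import re

# Single-quoted strings, with '' as the escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone integers or decimals (digits inside identifiers are left alone).
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def scrub_sql(query):
    """Replace literals with '?' so no business data leaves the process."""
    query = _STRING_LITERAL.sub("?", query)
    return _NUMBER_LITERAL.sub("?", query)


scrubbed = scrub_sql("SELECT * FROM users WHERE name = 'bob' AND age = 42")
```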
The data could also be aggregated so the agent computes average response times rather than sending all of the response times.
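Aggregation can be as simple as keeping a running count and total, then flushing a summary. The class below is a hypothetical sketch, not the agent's actual data model.

```python
class ResponseTimeAggregate:
    """Accumulates response times and reports only their count and average."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def record(self, seconds):
        self.count += 1
        self.total += seconds

    def flush(self):
        average = self.total / self.count if self.count else 0.0
        summary = {"count": self.count, "average": average}
        self.count = 0  # start a new aggregation window
        self.total = 0.0
        return summary


aggregate = ResponseTimeAggregate()
aggregate.record(0.25)
aggregate.record(0.75)
summary = aggregate.flush()
```

Only the summary is transmitted, so the payload size no longer depends on the number of requests served.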
The data thread does two things:
- I/O (since it sends data to the remote servers);
- Waiting for the queue to be populated.
While waiting, which is the main behavior, it allows the Python application to run computation on other threads. Hence this thread has a very limited impact on the original code performance.
This thread is used for concurrency – not parallelism since it relies on I/O to trigger the thread context switch. This means that the Python Global Interpreter Lock is not an issue in this use case.
There are two cases where this data recording scheme can become an issue.
If a client is in a situation where a lot of data needs to be sent, the thread may spend too much time doing I/O when sending data to a server.
To work around this issue, the agent should not send each event as soon as it arrives. Instead, it should batch the events in the agent thread and send the batch at regular intervals, or earlier if it grows too large.
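One way to batch, sketched under assumptions: the sender thread collects items until either a size limit or a time limit is reached, using the `timeout` parameter of `Queue.get`. The function name `collect_batch` and the limits used below are hypothetical.

```python
import queue
import time


def collect_batch(events, max_size, max_wait):
    """Gather up to max_size items, waiting at most max_wait seconds overall."""
    batch = []
    deadline = time.monotonic() + max_wait
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # the interval elapsed: send what we have
        try:
            batch.append(events.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


events = queue.Queue()
for number in range(5):
    events.put(number)

first_batch = collect_batch(events, max_size=3, max_wait=0.05)   # full batch
second_batch = collect_batch(events, max_size=3, max_wait=0.05)  # times out
```

The sender thread would call `collect_batch` in a loop and transmit each non-empty batch in a single request.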
Queue filling up too fast
But what happens when a queue can’t be consumed fast enough? This can occur for many reasons, e.g. if the network is down, or if the Python interpreter does not switch to the thread for some reason. In this case the queue fills quickly, raising the memory usage.
A solution is to use a capped queue. This limits the queue so it only stores a fixed number of events, placing a hard limit on the memory use. The older events are discarded as new events enter:
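The discard-oldest behavior can be illustrated with `collections.deque` and its `maxlen` argument. This is a sketch of the data structure only, and whether the agent uses a deque is an assumption: unlike `queue.Queue`, a deque has no blocking read, so a consumer thread would need its own waiting mechanism, such as a `threading.Condition`.

```python
from collections import deque

# A queue capped at 3 events: appending to a full deque drops the oldest item.
capped = deque(maxlen=3)

for event_id in range(5):
    capped.append(event_id)

remaining = list(capped)
```

Events 0 and 1 were discarded to make room, so memory use stays bounded no matter how slowly the queue is consumed.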
In this post, we’ve described the high-level concepts of an industrial grade instrumentation agent. Be careful with implementation details though, as this is what makes the difference!
Instrumentation agents can be used to handle many different tasks, including performance monitoring, error monitoring or security.
At Sqreen, our Python agent leverages instrumentation techniques in order to protect Django, Flask and Pyramid applications at runtime from security threats. Sqreen helps developers get full visibility of and protection against security events. Cyber-attacks are blocked at runtime without traffic redirection or code modification. Suspicious and fraudulent activities from/targeting user accounts are identified to detect attackers early.
Feel free to ask questions if you want to know more about our Python instrumentation or how we do it at Sqreen!
About the Author
Boris is a true Python addict. Boris enjoyed working on scalability issues of a machine-learning infrastructure in the past. He is also a SaltStack lover, and you will probably meet him in various meetups!