In which I explain problems unique to running The Ruby Racer in a multithreaded environment, which users are affected by these problems, and what is to be done for them.
UPDATE: This issue has been resolved with version 0.9.1. You should be able to stop reading here, update your Gemfile, and have a great day. That said, please don’t let me discourage you from continuing down the page. It’s riveting.
There have been a number of crashes reported in several different places for which The Ruby Racer is the culprit. For the most part, these are people running a Rails 3.1 application in a multithreaded webserver such as WEBrick, but as we'll see, it could be anybody using The Ruby Racer in a multithreaded environment. In order to address these and future issues in one place, as well as prevent the spread of any misinformation or misimpressions, I want to quickly explain what exactly is causing these crashes, who is affected by them and where, and finally how we're going to make them go away.
V8 maintains its internal state on a per-thread basis: whenever you eval() a script or allocate an object with new, it retrieves that state from the executing thread to perform that operation.
In a single threaded application, that's the end of the story. This is because every V8 operation happens from the same thread as the rest of your code, so the V8 state is always right there, ready at all times. If you're running your app in a Rack server like Passenger or Unicorn that uses a multi-process model to achieve concurrency, you'll never see this crash because you don't use threads. Likewise, if you're running on MRI 1.8.7, which uses green threads, you will not be affected: while you may have more than one Ruby thread in your application, you still only have one native OS thread, which is what V8 will use.
Luckily, V8 does give us some help here. It provides a locker API as a way to deal with threading issues. It acts as a mutex on a V8 instance while at the same time providing a facility to access it from different threads. We can wrap calls that touch V8 with a Locker in order to prevent these crashes, e.g.:

```ruby
V8::C::Locker() do
  # ... code that calls V8
end
```
That sounds simple enough: just never access V8 without a lock and you’re good. In fact, ExecJS does this whenever it invokes its therubyracer runtime, and it works… almost.
I would have gotten away with it too if it hadn’t been for you pesky GCs
If ExecJS locks V8, then why am I still crashing, and at seemingly random intervals? The culprit is garbage collection: when Ruby's GC reclaims a wrapped V8 object, it calls back into V8 to release the underlying reference, and that can happen at almost any point in your program, without the V8 lock being held.
You might be asking yourself, “why not just lock V8 in your GC routines and have done with it?” The answer is that it would be a terribly dangerous and irresponsible thing to do.
Suppose that thread A is executing some JavaScript and therefore holds the V8 lock. Now suppose garbage collection is triggered on another thread. GC needs to release V8 objects, so it tries to lock V8, but uh-oh, thread A already holds the V8 lock, so GC may have to wait forever until it becomes available. Deadlock!
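To make the hazard concrete, here is a minimal sketch using a plain Ruby Mutex as a stand-in for the V8 lock; the names and setup are my illustration, not therubyracer's code. Thread A holds the lock, and a simulated GC on another thread can never acquire it:

```ruby
require "thread"

v8_lock = Mutex.new # stand-in for the V8 locker

# Thread A acquires the lock and (for the sake of the example) never lets go,
# just as a thread busy executing JavaScript would hold the V8 lock.
thread_a = Thread.new do
  v8_lock.lock
  sleep # hold the lock indefinitely
end

sleep 0.1 until v8_lock.locked?

# A GC run on another thread tries to take the same lock in order to release
# V8 objects. try_lock shows it would block forever if it actually waited.
acquired = v8_lock.try_lock
puts acquired ? "GC got the lock" : "GC would deadlock waiting for thread A"

thread_a.kill
```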
The answer is that instead of releasing V8 objects immediately inside the GC thread, we need to enqueue them somewhere from which they can be released later. This probably means starting a thread per V8 instance to consume this reference queue and release its entries under a V8 lock that isn't contending for any other resources. While the answer is not trivial, I don't expect it to be that complex.
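As a rough sketch of that idea (the names and the release stub are mine, not the eventual therubyracer implementation), a thread-safe Queue can carry references out of the GC path to a dedicated thread that frees them:

```ruby
require "thread"

# Hypothetical stand-in for freeing a native V8 handle; the real code
# would do this while holding the V8 lock.
def release_v8_reference(ref)
end

reference_queue = Queue.new
released = []

# One consumer thread per V8 instance drains the queue and releases
# references outside of any GC run.
releaser = Thread.new do
  while (ref = reference_queue.pop)
    release_v8_reference(ref)
    released << ref
  end
end

# Inside a GC finalizer we only enqueue; we never touch V8 directly.
[:obj1, :obj2, :obj3].each { |ref| reference_queue << ref }

reference_queue << nil # shut the consumer down for this example
releaser.join
puts released.inspect # => [:obj1, :obj2, :obj3]
```

The point of the design is that the GC path does nothing but a cheap, lock-free enqueue, so it can never contend for the V8 lock.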
What to do? What to do?
In the meantime, there is a workaround: lock V8 for the duration of each request, which ensures that any GC run during that request happens while the V8 lock is already held. Bear in mind that this is not a silver bullet, and you could still encounter crashes and deadlocks, but they should be very few and far between. If, as is the case with most people, you are just using a multithreaded server in your development environment but deploy to a multi-process model, then this should be sufficient. If on the other hand you run multithreaded in production, then you should definitely not be locking V8 in your middleware, since this will effectively synchronize every request. Not a good idea.
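The workaround can be sketched as a small Rack middleware. This is my illustration, not an official one: the class name is hypothetical, and a plain Mutex stands in for the lock so the sketch runs without the gem; in a real app the call would be wrapped in V8::C::Locker() instead.

```ruby
# Hypothetical Rack middleware that serializes V8 access per request.
# In a real app the body of call would be V8::C::Locker() { @app.call(env) };
# a plain Mutex stands in here so the sketch runs without therubyracer.
class V8LockerMiddleware
  LOCK = Mutex.new

  def initialize(app)
    @app = app
  end

  def call(env)
    LOCK.synchronize { @app.call(env) }
  end
end

# Usage in config.ru:
#   use V8LockerMiddleware

# Minimal demonstration with a fake downstream app:
app = ->(env) { [200, { "Content-Type" => "text/plain" }, ["ok"]] }
status, _headers, body = V8LockerMiddleware.new(app).call({})
puts status # => 200
```

Note that this is exactly the behavior the paragraph above warns about in production: every request passes through one lock, so concurrency across requests is lost.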
No matter which category you fall into, I expect that we'll have a beta fix by the end of this week, and if all goes well, a patch release shortly after. This may seem like a long time for such a small thing as coordinating locking with GC, but my experience tells me that threading is not a thing best done by humans, and so there will inevitably be stumbles and cursing.
Also, I’m keen to make sure that any solution is both invisible to you the developer, as well as compatible with other Ruby versions and implementations. This deserves some thought.
What? You’re still here?
Alright, time to stop talking and get started with the repairs. I just wanted to take a moment to let you know what the problem was, why it was happening, how you’re affected, what you can do now, and when you can expect a more permanent fix.