Just a few hours before the public release of the portal we work on, one of the cluster machines became overloaded for reasons that are not relevant here. Once the machine stopped responding properly, it was fenced by another node. The problem was that once this automatic action was carried out, the whole six-machine GFS cluster went down, leaving all the machines unusable.
This, in addition to all the previous issues we had suffered with GFS, made us think seriously about purging GFS in favor of NFS. It was not an easy decision, as it went fully against all our previous decisions, but we weren't confident about GFS in the production systems. So we migrated in record time, configuring everything overnight so that by six o'clock the service would be running properly. And we managed to achieve this. We made it!
We are now tired after about 27 hours of continuous work, but the overall result was quite acceptable. I'm still proud of our design (not so much of my own decisions), which allowed us to make this kind of change so quickly.
But you can be sure I don't think we will ever again consider installing GFS on any system, as it does not seem to be suitable for production (as even RedHat says). And it is not only because of the buggy GFS2 (at least, at the present date) but because of the feeling of instability during the whole time we had it installed.
So in a few hours our architecture was changed, but it was set up in time for the very moment we started accepting requests.