Thursday, July 19, 2007

Damn GFS

Just a few hours before the public release of the portal we work on, one of the cluster machines became overloaded, for reasons not relevant here. The machine stopped responding properly, so another node fenced it. The problem was that once this automatic action took place, the whole six-machine GFS cluster went down, leaving all the machines unusable.

This, on top of all the previous issues we had suffered with GFS, made us seriously consider dropping GFS in favor of NFS. It was not an easy decision, as it went fully against all our previous decisions, but we weren't confident about GFS in the production systems. So we migrated in record time, configuring everything overnight so that at six o'clock the service would be running properly. And we managed to achieve that. We made it!

We are now tired after about 27 hours of continuous work, but the overall result was quite acceptable. I'm still proud of our design (not so much of my own decisions), which allowed us to make this kind of change so quickly.

But you can be sure I don't think we will ever again consider installing GFS on any system, as it does not seem suitable for production (as even RedHat says). And it is not only because of the buggy GFS2 (at least, at the present date) but because of the sense of instability throughout the time we had it installed.

So in a few hours our architecture was changed, but it was set up in time for the very moment we started accepting requests.

18 comments:

SEJeff said...

This is hearsay and might or might not be true, but it came from a reliable source of mine.

One of the guys at RedHat who works on GFS said OCFS is better at being a distributed filesystem. Also, it is in mainline.

Pablo S. Torralba said...

Indeed, RedHat provides no support for their own RedHat cluster suite, while, as you point out, OCFS, despite not being as fast, seems more reliable and has the guarantee of being included in the stock kernel (but not in RedHat ones).

Anonymous said...

Pablo,

Have you thought of opening a call with the support help desk at Red Hat to ask for assistance?

If you cannot do it on your own after 27 hours, it is time to call the people who can help you solve your issues.

It is not good enough to "just" complain on a list about an issue when you have not even bothered to seek assistance.

In regards to ocfs2, do you have any proper evidence to back up your claims, or is it just word of mouth?

travellig
