Super-Networking Blog

When you don’t know who needs to fix it

by admin on Aug.22, 2006, under Systems

We have been running into a problem lately at the company I work for. It is a big one, and no one can seem to find the cause. What do you do when you have a random chain-effect problem? We have numerous web servers that also run the backend applications for the websites they host. Some of the websites tie back to databases and some do not, and many of the most important websites tie back to the same database on one database server. All of the devices are on one network segment and talk to each other without restriction.

So when some, but not all, of the web servers start crashing over and over again, who do you turn to? First, IT checks whether traffic has spiked abnormally. It hasn’t. Has there been a website or application release around that time? There hasn’t. Is a network segment down? No, because only a few of the many websites are crashing. Are the servers actually crashing and rebooting? No, only HTTP is going down. Are we fully patched? Yes, and a patching problem is unlikely anyway because the issue affects more than one website but fewer than the majority. Does an IIS reset across the problem server farms stop it? Only for a few minutes, then the problem starts up again. Anything in the event logs? Nothing out of the ordinary. IT goes through these and various other checks and finds no indication of a problem.

What do you do next? Take a step back and think about what the affected server farms have in common that the unaffected farms do not. In this case, all of the affected server farms point back to one database on one server. IT notifies the database admins about the problem, and the response is that everything is fine except that connections to the database are high. The database is responding well, and CPU and memory on all of the boxes involved are normal. The network does not show any problems between the web servers and the database server, and other server farms that talk to a different database on the same database server are fine. The next step is to get the developers in the loop and have them look at their applications. Their answer: nothing that would affect this has changed, so it must be either IT’s servers/network or the database. The IT network team runs packet sniffers and sees little to no bad traffic flowing between the front end and the backend. Connections between the web servers and the database continue to be very high, but traffic continues to be average.
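The "what do the affected farms have in common" step can be written down as a small set operation. The sketch below is only an illustration in Python: the farm names, dependency labels, and the `suspect_dependencies` helper are all made up for this example and are not our real inventory. It keeps whatever every affected farm depends on and then drops anything a healthy farm also uses.

```python
# Hypothetical inventory: farm name -> the things it depends on.
# All names here are invented for illustration.
farm_dependencies = {
    "farm-a": {"segment-1", "db-server-1", "db-server-1/shared-db"},
    "farm-b": {"segment-1", "db-server-1", "db-server-1/shared-db"},
    "farm-c": {"segment-1", "db-server-1", "db-server-1/other-db"},
    "farm-d": {"segment-1"},
}

# Farms whose websites keep going down (also hypothetical).
affected_farms = {"farm-a", "farm-b"}


def suspect_dependencies(dependencies, affected):
    """Return dependencies shared by every affected farm and used by no healthy farm."""
    affected_sets = [dependencies[name] for name in affected]
    if not affected_sets:
        return set()
    # Everything that all affected farms have in common.
    common = set.intersection(*affected_sets)
    # Everything that at least one healthy farm relies on.
    healthy = set().union(
        *(deps for name, deps in dependencies.items() if name not in affected)
    )
    # What remains is unique to the affected group.
    return common - healthy


print(suspect_dependencies(farm_dependencies, affected_farms))
```

With this made-up data, the shared network segment and the database server itself drop out because healthy farms use them too, which leaves only the one shared database. That narrows down where to look, but it does not say what is actually wrong with it.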

We have run this issue through all three teams (DBAs, IT sysadmins/network admins, and developers) with no clear answers. The problems appear and disappear when they feel like it. What would you do?

