Original poster | Posted on 2007-10-5 04:13:15
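FWSM troubleshooting process and solution
Account recorded from the on-site customer:
Problem description:
At nearly 11 p.m. on the 31st, the network suddenly developed a problem: external sites could not be reached over WWW, while the internal network still worked (the campus's internal WWW servers were reachable).
Our initial diagnosis was as follows: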
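(1) From inside the campus network, both the gateway X.X.80.254 and 202.112.X.X (the firewall's outside interface) answered PING.
(2) The firewall was still passing traffic, roughly a few hundred Mb (as observed on the SCE2020).
(3) The DNS service could not resolve external addresses, but on-campus names resolved normally. The DNS server sits behind the firewall.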
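The third check, internal names resolving while external ones fail, can be scripted so that it is repeatable. The following is only a minimal Python sketch: the host names in the usage comment are hypothetical placeholders, and the standard-library resolver simply asks whatever DNS server the local machine is configured to use.

```python
import socket

def can_resolve(name, resolver=socket.getaddrinfo):
    """Return True if `name` resolves to at least one address."""
    try:
        return len(resolver(name, None)) > 0
    except OSError:  # socket.gaierror is a subclass of OSError
        return False

def classify_dns(internal_name, external_name, resolver=socket.getaddrinfo):
    """Compare resolution of one internal and one external host name."""
    internal_ok = can_resolve(internal_name, resolver)
    external_ok = can_resolve(external_name, resolver)
    if internal_ok and external_ok:
        return "dns-ok"
    if internal_ok:
        return "external-resolution-failing"
    if external_ok:
        return "internal-resolution-failing"
    return "dns-down"

# Usage (hypothetical names, replace with real ones):
#   classify_dns("www.campus.example", "www.example.com")
```

A result of external-resolution-failing matches the symptom described here. It does not say where the fault is, only that the resolver cannot complete lookups that have to leave the campus.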
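Our first reaction to these symptoms was: the university's own DNS service does not look broken, so could the problem be with CERNET's DNS?
So we logged in over SSH to a machine at Tsinghua and found that name resolution there was perfectly normal, which showed it was not a CERNET problem. Because the firewall's non-DNS services seemed unaffected (at the very least it was passing traffic, and its outside interface address could be pinged from CERNET), it never occurred to us that the firewall was at fault, so we assumed it was a DNS problem.
The log on the DNS server showed: no more recursive clients: quota reached. At first we thought the DNS server was under attack, so we raised the recursive value, set up the firewall, and even switched the service over, but none of this solved the problem, which left us rather puzzled. We went through the log, picked out the addresses that were sending DNS requests most frequently, and intended to filter them out on the firewall.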
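Picking the busiest clients out of a DNS log is easy to automate. The sketch below is an illustration rather than the procedure actually used: it assumes the log lines are in the BIND query-log style, where each entry contains "client <address>#<port>", and the sample lines in the usage comment are made up.

```python
import re
from collections import Counter

# Assumed BIND-style entry: "... client 192.0.2.10#5353: query: ..."
# Newer BIND versions insert an "@0x..." token after "client"; allow for it.
CLIENT_RE = re.compile(r"client (?:@0x[0-9a-fA-F]+ )?([0-9A-Fa-f.:]+)#\d+")

def top_clients(lines, n=10):
    """Return the n client addresses that appear most often in the log lines."""
    counts = Counter()
    for line in lines:
        match = CLIENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)

# Usage (hypothetical log path):
#   with open("/var/log/named/query.log") as log:
#       for address, hits in top_clients(log, n=20):
#           print(address, hits)
```

Lines without a client field, such as the quota message itself, are skipped. The resulting list only shows who is asking the most; a busy address is not necessarily an attacker.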
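This is where the real problem surfaced. I could log in to the Cisco 6509 router (SUP720) without any trouble, but when I used session slot 7 processor 1 to log in to the firewall module, it waited for a long time and never got in. At first I thought my own machine was at fault, so I logged in again from a laptop through the console port; the behavior was exactly the same, and the firewall module could not be entered. At this point we finally knew the problem was on the firewall!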
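The conclusions were:
1. The problem is on the firewall.
2. The symptoms at this point: the firewall had not died completely and some services could still get out, but the campus DNS service was affected. In addition, the session slot 7 processor 1 command issued from the sup720 supervisor could no longer log in to the firewall module (the firewall module is in slot 7), which shows that the problem is on the firewall module!!!
We then opened a case with Cisco.
Cisco's explanation: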
Hi,dear ge:
I suggested you to upgrade the fwsm IOS to 2.3.4.(8), and make sure remove following commands from your fwsm configuration:
root cause:Bug CSCse15099
workaround:
1.clear xlate -------> The FWSM can stop forwarding traffic and start printing a message about failure to allocate a translation
solution: software upgrade. Note: disable the h323 fixup to prevent the FWSM from crashing (remove h323 inspection by removing the commands:
inspect h323 h225
inspect h323 ras )
Best Regards!!
Tel: (8610) 85155566
email:juyao@cisco.com
Looking up Bug CSCse15099 on the Cisco website gave the following:
CSCse15099 Bug Details
Headline: FWSM may crash at fast_fixup
Product: c6k-fwm
Feature: Address Translation
Components:
Duplicate of:
Severity: 3
Status: Verified
First Found-in Version: 3.1(1.4)
First Fixed-in Version: 3.1(2), 3.2(0.1), 2.3(4.1), 3.1(1.10)
Release Notes
Symptom:
The problem can be experienced in different ways, even though the root cause is the same:
1) An FWSM may crash at Thread Name: fast_fixup.
2) The FWSM can stop forwarding traffic and start printing a message about failure to allocate a translation
3) The FWSM can hang, and most of the show commands might not return any output
The root cause for this is a corruption in the translation table that is preventing the FWSM from processing new connection and from responding to query commands like 'sh xlate'
Workaround:
The root cause has been determined in a corruption in the xlate table that might occur when using extensively PAT translations with or without voice protocols involved.
Further Problem Description:
We have observed this behavior in an FWSM running multi-context routed mode. A reload may be needed to recover.