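A customer's single-instance database, version 10.2.0.1.0, suddenly became unreachable. I logged in right away to check: lsnrctl did not respond (it hung), while logging in to the instance itself worked normally. Because it was peak working hours, the emergency fix was to shut down the instance and reboot the host; service returned to normal after the restart.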
[oracle@hydb ~]$ lsnrctl status
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 22-SEP-2023 09:15:45
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=172.200.100.30)(PORT=1521)))
[oracle@hydb admin]$ lsnrctl stop
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 22-SEP-2023 09:16:51
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=172.200.100.30)(PORT=1521)))
Once service was restored, I started investigating and searching the documentation.
MOS documents:
LSNRCTL commands hang but listener process itself is running (Doc ID 979123.1)
10g Listener: High CPU Utilization - Listener May Hang (Doc ID 284602.1)
10gR2: TNS Listener Crash with Core Dump (Doc ID 549932.1)
10g: Intermittent TNS Listener Hang, New Child Listener Process Forked (Doc ID 340091.1)
Listener Hangs or Crashes or TNS-12518 & TNS-12540 Error When INBOUND_CONNECT_TIMEOUT_LISTENER = 0 (Doc ID 2830190.1)
Listener Hangs - TNS-01181: Internal registration connection limit reached (Doc ID 549649.1)
TNS-12518, TNS-12540, TNS-12582 and TNS-12615 Errors Reported in 11g Listener Log Under Heavy Load (Doc ID 1399677.1)
Remediation:
1. As the oracle user, add the following parameter:
[oracle@hydb ~]$ echo "SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF" >> $ORACLE_HOME/network/admin/listener.ora
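The listener must be restarted for this to take effect; that will be done in the next maintenance window.
2. At the next failure, first check the state of the listener with: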
[oracle@hydb admin]$ ps -ef |grep LISTENER
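When the listener is hung, lsnrctl itself is no help, so the PID has to come from ps. Below is a minimal sketch of extracting it. The function name listener_pids is made up for illustration, and it assumes the usual ps -ef layout, where the command is field 8 and the listener name is field 9.

```shell
# Read `ps -ef` output on stdin and print the PID of each tnslsnr process
# whose listener name matches $1 (defaults to LISTENER).
listener_pids() {
    name=${1:-LISTENER}
    awk -v name="$name" '$8 ~ /(^|\/)tnslsnr$/ && $9 == name { print $2 }'
}
```

Usage: ps -ef | listener_pids LISTENER. Unlike a plain grep, this does not match the grep process itself.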
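Update 2023-09-22
Hit the same problem again: lsnrctl could not be used at all, and the only option was to kill -9 the listener and start it manually.
Further tuning, as the oracle user:
$ cd $ORACLE_HOME/opmn/conf
$ mv ons.config ons.config.orig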
$ lsnrctl stop ; lsnrctl start
Update 2023-09-24: the machine turned out to be infected with malware. Final fix: configure the iptables firewall.
[root@hydb ~]# ps -ef |grep pmon    # instance not started
root 4813 4138 0 14:26 pts/1 00:00:00 grep pmon
[root@hydb ~]# ps -ef |grep LISTENER    # listener is running
oracle 4391 1 0 13:37 ? 00:00:02 /u01/app/oracle/product/10.2/db_1/bin/tnslsnr LISTENER -inherit
root 4765 4138 0 14:24 pts/1 00:00:00 grep LISTENER
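[root@hydb ~]# lsof -Pani -p 4391|wc -l    # count the listener's network connections
1022
Looking at the actual connections, I found that the machine had been hit by malware: tnslsnr holds a huge number of established connections with other hosts. Note that lsof prints each socket as local->remote regardless of which side initiated it; the local end here is port 1521 and the remote ends are ephemeral ports, so these appear to be connections opened to the listener by 172.200.32.183 and 172.16.119.10.
[root@hydb ~]# lsof -Pani -p 4391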
tnslsnr 4391 oracle 985u IPv4 18143 0t0 TCP 172.200.100.30:1521->172.16.119.10:47594 (ESTABLISHED)
tnslsnr 4391 oracle 986u IPv4 18144 0t0 TCP 172.200.100.30:1521->172.200.32.183:33048 (ESTABLISHED)
tnslsnr 4391 oracle 987u IPv4 18157 0t0 TCP 172.200.100.30:1521->172.200.32.183:33050 (ESTABLISHED)
tnslsnr 4391 oracle 988u IPv4 18158 0t0 TCP 172.200.100.30:1521->172.200.32.183:33052 (ESTABLISHED)
tnslsnr 4391 oracle 989u IPv4 18160 0t0 TCP 172.200.100.30:1521->172.200.32.183:33054 (ESTABLISHED)
tnslsnr 4391 oracle 990u IPv4 18161 0t0 TCP 172.200.100.30:1521->172.200.32.183:33056 (ESTABLISHED)
tnslsnr 4391 oracle 991u IPv4 18162 0t0 TCP 172.200.100.30:1521->172.200.32.183:33058 (ESTABLISHED)
tnslsnr 4391 oracle 992u IPv4 18163 0t0 TCP 172.200.100.30:1521->172.200.32.183:33060 (ESTABLISHED)
tnslsnr 4391 oracle 993u IPv4 18164 0t0 TCP 172.200.100.30:1521->172.16.119.10:47596 (ESTABLISHED)
tnslsnr 4391 oracle 994u IPv4 18165 0t0 TCP 172.200.100.30:1521->172.200.32.183:33062 (ESTABLISHED)
tnslsnr 4391 oracle 995u IPv4 18166 0t0 TCP 172.200.100.30:1521->172.200.32.183:33064 (ESTABLISHED)
tnslsnr 4391 oracle 996u IPv4 18167 0t0 TCP 172.200.100.30:1521->172.200.32.183:33066 (ESTABLISHED)
tnslsnr 4391 oracle 997u IPv4 18168 0t0 TCP 172.200.100.30:1521->172.200.32.183:33068 (ESTABLISHED)
tnslsnr 4391 oracle 998u IPv4 18169 0t0 TCP 172.200.100.30:1521->172.200.32.183:33070 (ESTABLISHED)
tnslsnr 4391 oracle 999u IPv4 18482 0t0 TCP 172.200.100.30:1521->172.16.119.10:47598 (ESTABLISHED)
tnslsnr 4391 oracle 1000u IPv4 18503 0t0 TCP 172.200.100.30:1521->172.200.32.183:33072 (ESTABLISHED)
tnslsnr 4391 oracle 1001u IPv4 18504 0t0 TCP 172.200.100.30:1521->172.200.32.183:33074 (ESTABLISHED)
tnslsnr 4391 oracle 1002u IPv4 18505 0t0 TCP 172.200.100.30:1521->172.200.32.183:33078 (ESTABLISHED)
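With around a thousand lines of lsof output, a per-peer count makes the offending hosts obvious. The following is a small sketch, assuming IPv4 addresses and the NAME column format shown in the listing above; summarize_peers is a hypothetical helper name.

```shell
# Read `lsof -Pani -p <pid>` output on stdin and print "<count> <peer-ip>"
# for every ESTABLISHED connection, highest count first.
summarize_peers() {
    awk '/\(ESTABLISHED\)/ {
             for (i = 1; i <= NF; i++) {
                 if ($i ~ /->/) {
                     # The field looks like local:port->remote:port
                     split($i, ends, "->")
                     sub(/:[0-9]+$/, "", ends[2])   # drop the remote port
                     peers[ends[2]]++
                 }
             }
         }
         END { for (p in peers) print peers[p], p }' | sort -k1,1nr -k2,2
}
```

Usage: lsof -Pani -p 4391 | summarize_peers. The addresses that dominate the list are the ones to block or investigate first.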
The listener trace file shows that the listener hangs once it reaches 1023, and the count from lsof -Pani -p 4391|wc -l tops out at 1022.
[24-SEP-2023 14:22:43:560] nsevmute: entry
[24-SEP-2023 14:22:43:560] nsevmute: cid=3
[24-SEP-2023 14:22:43:560] nsevmute: normal exit
[24-SEP-2023 14:22:43:560] nsevwait: 0 posted event(s)
[24-SEP-2023 14:22:43:560] nsevwait: exit (0)
[24-SEP-2023 14:22:43:560] nsevwait: entry
[24-SEP-2023 14:22:43:560] nsevwait: 1022 registered connection(s)
[24-SEP-2023 14:22:43:560] nsevwait: 0 pre-posted event(s)
[24-SEP-2023 14:22:43:560] nsevwait: waiting for transport event (1 thru 1023)...
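These numbers are consistent with the listener running into a ceiling of roughly 1024 descriptors, although the exact limit on any given system is an assumption here. Below is a rough sketch for watching descriptor usage on Linux through /proc. The function names, the default ceiling of 1024, and the warning margin of 100 are illustrative choices, not Oracle settings.

```shell
# Classify descriptor usage: WARN when used >= ceiling - margin, otherwise OK.
fd_status() {
    fd_used=$1; fd_ceiling=$2; fd_margin=$3
    if [ "$fd_used" -ge $((fd_ceiling - fd_margin)) ]; then
        echo "WARN: $fd_used of $fd_ceiling descriptors in use"
    else
        echo "OK: $fd_used of $fd_ceiling descriptors in use"
    fi
}

# Count the open descriptors of PID $1 via /proc (Linux only) and classify them.
# $2 = assumed ceiling (default 1024), $3 = warning margin (default 100).
check_listener_fds() {
    count=$(ls "/proc/$1/fd" 2>/dev/null | wc -l)
    fd_status "$((count))" "${2:-1024}" "${3:-100}"
}
```

Usage: check_listener_fds 4391, run as root or as the oracle user so that /proc/4391/fd is readable. Running it from cron would give an early warning before lsnrctl stops responding.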
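## Disable tracing: LSNRCTL> set trc_level 0
## Enable tracing: LSNRCTL> set trc_level 16
off or 0: tracing is disabled for the current listener
support or 16: troubleshooting-level tracing
# Show the trace file name: LSNRCTL> show trc_file
# Show the trace directory: LSNRCTL> show trc_directory
# Show the trace level: LSNRCTL> show trc_level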
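With tracing at level 16, the nsevwait lines report how many connections the listener is currently tracking. Here is a small sketch that pulls the most recent count out of a trace file. It assumes the line format seen in the trace excerpt above, and last_registered is a hypothetical name.

```shell
# Read a listener trace on stdin and print the last reported number of
# registered connections (prints nothing if no such line is found).
last_registered() {
    awk '/nsevwait: [0-9]+ registered connection/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "registered") n = $(i - 1)
         }
         END { if (n != "") print n }'
}
```

Usage: last_registered < the file reported by show trc_directory and show trc_file. Remember to set the trace level back to 0 afterwards, since level 16 traces grow quickly.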
Enable iptables at boot
# chkconfig iptables on && chkconfig --list|grep iptables
Create the iptables script and run it
# vi /opt/iptables.sh
service iptables start
iptables -F
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A INPUT -s 127.0.0.1/32 -d 127.0.0.1/32 -j ACCEPT
iptables -A INPUT -s 172.200.100.60/32 -p tcp -m tcp --dport 1521 -j ACCEPT
iptables -A INPUT -s 172.200.100.94/32 -p tcp -m tcp --dport 1521 -j ACCEPT
iptables -A INPUT -s 192.168.100.57/32 -p tcp -m tcp --dport 1521 -j ACCEPT
iptables -A INPUT -s 172.200.100.42/32 -p tcp -m tcp --dport 21 -j ACCEPT
iptables -A INPUT -s 172.200.100.42/32 -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -m state --state INVALID -j DROP
iptables -A INPUT -p icmp -j ACCEPT
iptables -A OUTPUT -p icmp -j ACCEPT
iptables -A FORWARD -m state --state INVALID -j DROP
iptables -A OUTPUT -m state --state INVALID -j DROP
iptables -A INPUT -p tcp --dport 22 -j DROP
iptables -A INPUT -j REJECT --reject-with icmp-port-unreachable
iptables -A FORWARD -j REJECT --reject-with icmp-port-unreachable
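service iptables save
service iptables stop && service iptables start
# Safety net while testing: stop iptables again after 10 minutes so that a bad rule cannot lock you out permanently.
# The saved rules are loaded again the next time the iptables service starts.
/bin/sleep 600
service iptables stop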
# nohup sh /opt/iptables.sh &
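The per-client ACCEPT rules are the part of this script most likely to change over time. One way to keep them manageable is to generate them from a list of addresses. In this sketch, allow_rules is a hypothetical helper that only prints the commands, so they can be reviewed before anything is run.

```shell
# Print one ACCEPT rule per allowed client address for TCP port $1.
# Remaining arguments are the client IPv4 addresses.
allow_rules() {
    port=$1
    shift
    for ip in "$@"; do
        echo "iptables -A INPUT -s $ip/32 -p tcp -m tcp --dport $port -j ACCEPT"
    done
}
```

Usage: allow_rules 1521 172.200.100.60 172.200.100.94 192.168.100.57 prints the three listener rules used in the script above. After checking the output, it can be pasted into /opt/iptables.sh ahead of the final REJECT rules.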