prometheus
Download prometheus-2.53.2
Edit the prometheus.yml file
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
rule_files:
  - "rules/rule-*.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090", "127.0.0.1:9104"]
Here 127.0.0.1:9104 is the metrics address of mysqld_exporter.
Create a rules directory and add a rule file, rule-first.yml
groups:
  - name: InstanceDown_Rule
    rules:
      - alert: InstanceDown # alert name
        expr: up == 0 # alert condition
        for: 30s # how long the condition must hold before the alert fires
        labels:
          severity: critical # alert severity
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          # note: the text below says 5 minutes, while "for" above is 30s
          description: "Instance {{ $labels.instance }} has been down for more than 5 minutes."
Start Prometheus
prometheus --config.file=prometheus.yml --storage.tsdb.path=./data --web.enable-lifecycle
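The --web.enable-lifecycle flag in the command above turns on Prometheus's HTTP lifecycle endpoints, so later edits to prometheus.yml or the rule files can be picked up with a POST to /-/reload instead of a restart. The following is a minimal Java sketch of that call, assuming Prometheus listens on 127.0.0.1:9090; the class and method names (PrometheusReload, reloadRequest) are illustrative and not part of the original setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class PrometheusReload {
    // Builds the POST request for the reload endpoint (available with --web.enable-lifecycle).
    static HttpRequest reloadRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/-/reload"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Assumed address: the local Prometheus instance started above.
        HttpRequest request = reloadRequest("http://127.0.0.1:9090");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("reload status: " + response.statusCode());
        } catch (Exception e) {
            // Prometheus is not running or not reachable.
            System.out.println("reload failed: " + e.getMessage());
        }
    }
}
```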
Open Prometheus
http://127.0.0.1:9090/
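Besides the web UI, the same server answers PromQL queries over its HTTP API at /api/v1/query, which is a quick way to check the rule expression up == 0 by hand. A small Java sketch, assuming the local address used in this setup; PrometheusQuery and queryUri are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class PrometheusQuery {
    // Builds the instant-query URI for a PromQL expression, URL-encoding the expression.
    static URI queryUri(String baseUrl, String expr) {
        String encoded = URLEncoder.encode(expr, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    public static void main(String[] args) {
        // Assumed address: the local Prometheus instance from this setup.
        URI uri = queryUri("http://127.0.0.1:9090", "up == 0");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(HttpRequest.newBuilder(uri).build(), HttpResponse.BodyHandlers.ofString());
            // The body is JSON; targets that are down appear in data.result.
            System.out.println(response.body());
        } catch (Exception e) {
            System.out.println("query failed: " + e.getMessage());
        }
    }
}
```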
alertmanager
Download alertmanager-0.27.0
Edit the configuration file alertmanager.yml
route:
  group_by: ['instance']
  group_wait: 10s
  group_interval: 20s
  #repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/alert/hook'
        send_resolved: true
#inhibit_rules:
#  - source_match:
#      severity: 'critical'
#    target_match:
#      severity: 'warning'
#    equal: ['instance']
Here http://127.0.0.1:5001/alert/hook is the webhook endpoint that receives the alerts.
Start alertmanager
alertmanager --config.file=alertmanager.yml
Open alertmanager
http://127.0.0.1:9093/
grafana
Download grafana-11.2.1
Start grafana
grafana-server
Open grafana
http://127.0.0.1:3000
mysqld_exporter
Download mysqld_exporter-0.15.1
Create a .my.cnf file in the mysqld_exporter root directory
[client]
user=root
password=root
user and password are the MySQL user name and password. With this file, which sets no host, mysqld_exporter needs to run on the same server as mysql_server.
Start mysqld_exporter
mysqld_exporter
Open mysqld_exporter
http://127.0.0.1:9104/
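The exporter serves its metrics in the Prometheus text format under /metrics, which is the path Prometheus scrapes by default. Among them, mysql_up reports whether the exporter can reach MySQL. The sketch below reads one unlabeled metric from that text; it is an illustration under the assumption that the exporter runs on 127.0.0.1:9104, and ExporterCheck and metricValue are made-up names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ExporterCheck {
    // Returns the value of the first unlabeled sample with the given name, or NaN if absent.
    static double metricValue(String exposition, String name) {
        for (String line : exposition.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2 && parts[0].equals(name)) {
                return Double.parseDouble(parts[1]);
            }
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        try {
            // Assumed address: the local mysqld_exporter from this setup.
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://127.0.0.1:9104/metrics")).build();
            String body = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString()).body();
            System.out.println("mysql_up = " + metricValue(body, "mysql_up"));
        } catch (Exception e) {
            System.out.println("exporter not reachable: " + e.getMessage());
        }
    }
}
```

A value of 1 means the exporter can connect to MySQL with the credentials in .my.cnf; 0 means it cannot.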
Write the hook
@PostMapping("/alert/hook")
public Response<String> alertHook(@RequestBody Map<String, Object> alertDataMap) {
    // TODO: handle the alert here; sending it to WeChat, email, or DingTalk all work
    System.out.println(JSON.toJSONString(alertDataMap));
    return Response.ok("success");
}
The endpoint above is written in Java; any language can be used, depending on your needs.
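For a quick local test, the same endpoint can also be served without a web framework. The following is a rough sketch using the JDK's built-in com.sun.net.httpserver package; it only prints the raw JSON body and replies "success". The class name AlertHookServer and its start method are assumptions made for this example, not part of the original project.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class AlertHookServer {
    // Starts an HTTP server that accepts Alertmanager webhook POSTs on /alert/hook.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/alert/hook", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                exchange.close();
                return;
            }
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(body); // TODO: forward to email, chat, and so on
            byte[] reply = "success".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (var out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }
}
```

Calling AlertHookServer.start(5001) from a main method matches the port in the webhook_configs url; the returned server keeps running until stop is called on it.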
You can stop mysqld_exporter to simulate a service outage. The /alert/hook endpoint then receives data like the following:
{
  "receiver": "web\\.hook",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "InstanceDown",
        "instance": "127.0.0.1:9104",
        "job": "prometheus",
        "severity": "critical"
      },
      "annotations": {
        "description": "Instance 127.0.0.1:9104 has been down for more than 5 minutes.",
        "summary": "Instance 127.0.0.1:9104 down"
      },
      "startsAt": "2024-10-10T11:27:58.11Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://olive-my:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1",
      "fingerprint": "106b3a6075af7628"
    }
  ],
  "groupLabels": {
    "instance": "127.0.0.1:9104"
  },
  "commonLabels": {
    "alertname": "InstanceDown",
    "instance": "127.0.0.1:9104",
    "job": "prometheus",
    "severity": "critical"
  },
  "commonAnnotations": {
    "description": "Instance 127.0.0.1:9104 has been down for more than 5 minutes.",
    "summary": "Instance 127.0.0.1:9104 down"
  },
  "externalURL": "http://olive-my:9093",
  "version": "4",
  "groupKey": "{}:{instance=\"127.0.0.1:9104\"}",
  "truncatedAlerts": 0
}
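Once the payload has been deserialized into a Map<String, Object>, as in the hook method, the fields most useful for a notification are each alert's status, labels, and annotations. The sketch below turns them into one text line per alert. AlertFormatter, format, and samplePayload are illustrative names, and the sample data is a trimmed copy of the payload above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AlertFormatter {
    // Turns a deserialized Alertmanager webhook payload into one text line per alert.
    @SuppressWarnings("unchecked")
    static List<String> format(Map<String, Object> payload) {
        List<String> lines = new ArrayList<>();
        List<Map<String, Object>> alerts =
                (List<Map<String, Object>>) payload.getOrDefault("alerts", List.of());
        for (Map<String, Object> alert : alerts) {
            Map<String, Object> labels =
                    (Map<String, Object>) alert.getOrDefault("labels", Map.of());
            Map<String, Object> annotations =
                    (Map<String, Object>) alert.getOrDefault("annotations", Map.of());
            lines.add("[" + alert.get("status") + "] "
                    + labels.get("alertname") + " (" + labels.get("severity") + "): "
                    + annotations.get("summary"));
        }
        return lines;
    }

    // Trimmed sample of the payload shown above, for illustration only.
    static Map<String, Object> samplePayload() {
        Map<String, Object> alert = Map.of(
                "status", "firing",
                "labels", Map.of("alertname", "InstanceDown", "severity", "critical"),
                "annotations", Map.of("summary", "Instance 127.0.0.1:9104 down"));
        return Map.of("status", "firing", "alerts", List.of(alert));
    }

    public static void main(String[] args) {
        // prints: [firing] InstanceDown (critical): Instance 127.0.0.1:9104 down
        format(samplePayload()).forEach(System.out::println);
    }
}
```

Because send_resolved is set to true in alertmanager.yml, the same endpoint also receives payloads whose status is "resolved", and the line prefix changes accordingly.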