Elasticsearch Index Write Error
ERROR - {'took': 0, 'errors': True, 'items': [{'create': {'_index': 'xxxx-log-2.0-2022.01.11-000001', '_type': '_doc', '_id': 'wdrDR5YBNUJot4R74noE', 'status': 429, 'error': {'type': 'cluster_block_exception', 'reason': 'index [xxxx-log-2.0-2022.01.11-000001] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];'}}}]}
Key point: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block]
This is a write-block error: Elasticsearch rejected the write because the disk on a data node is nearly full.
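A client that sends bulk requests may want to recognize this specific failure instead of treating it as a generic error. The following is a minimal sketch, assuming the bulk response has the shape shown in the log above; `find_flood_stage_blocks` is a hypothetical helper, not part of any Elasticsearch client library.

```python
from typing import Any, Dict, List


def find_flood_stage_blocks(bulk_response: Dict[str, Any]) -> List[str]:
    """Return the indices whose bulk items were rejected by the flood-stage block.

    Hypothetical helper: it only inspects the response dict and calls no API.
    """
    blocked: List[str] = []
    if not bulk_response.get("errors"):
        return blocked
    for item in bulk_response.get("items", []):
        # Each item has one key naming the action (create, index, update, delete).
        for result in item.values():
            error = result.get("error") or {}
            index = result.get("_index")
            if (
                error.get("type") == "cluster_block_exception"
                and "flood-stage watermark" in error.get("reason", "")
                and index
                and index not in blocked
            ):
                blocked.append(index)
    return blocked
```

A caller could pause writes to the returned indices and raise an alert, since retrying immediately will keep failing until disk space is freed.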
Solution
Root cause analysis
{
  "error": {
    "type": "cluster_block_exception",
    "reason": "index [...] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark..."
  }
}
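• Trigger: Elasticsearch's default disk watermark protection policy
• flood-stage (the red line): disk usage ≥ 95%
• Once a node crosses this threshold, ES automatically puts every index that has a shard on that node into read-only mode (deletes only) by setting index.blocks.read_only_allow_delete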
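To make the thresholds concrete, here is a small sketch that maps a disk usage percentage to the watermark stage it falls into. It assumes the default values of 85%, 90%, and 95% and uses the "≥" comparison stated above; `classify_disk_usage` is an illustrative name, not an Elasticsearch API.

```python
def classify_disk_usage(
    used_percent: float,
    low: float = 85.0,
    high: float = 90.0,
    flood_stage: float = 95.0,
) -> str:
    """Return the watermark stage for a node's disk usage (illustrative only)."""
    if used_percent >= flood_stage:
        # Indices with shards on this node get the read-only-allow-delete block.
        return "flood_stage"
    if used_percent >= high:
        # Elasticsearch tries to relocate shards away from this node.
        return "high"
    if used_percent >= low:
        # Elasticsearch stops allocating new shards to this node.
        return "low"
    return "ok"
```

A node at 92% is therefore already in trouble (shards are being moved away) even though writes are still accepted.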
Emergency handling steps
1. Check the current disk status
# Check disk usage on every node
GET _cat/allocation?v&h=node,disk.percent,disk.avail,disk.total,disk.indices

# Check the watermark configuration
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*
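If the allocation check is scripted, the same data can be requested as JSON by adding `format=json` to the `_cat/allocation` call. The sketch below assumes the rows have already been fetched and parsed into a list of dicts; the node names in the usage note are made up, and `nodes_over_threshold` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def nodes_over_threshold(
    rows: List[Dict[str, Any]], threshold: float = 90.0
) -> List[Tuple[str, float]]:
    """Return (node, disk percent) pairs at or above threshold, fullest first.

    `rows` is assumed to be the parsed output of _cat/allocation?format=json,
    where numeric columns arrive as strings and the UNASSIGNED row has nulls.
    """
    flagged: List[Tuple[str, float]] = []
    for row in rows:
        raw = row.get("disk.percent")
        if raw is None:
            continue  # skip rows without disk information
        percent = float(raw)
        if percent >= threshold:
            flagged.append((row.get("node", "unknown"), percent))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

For example, passing rows for hypothetical nodes `es-node-1` at 96% and `es-node-2` at 72% flags only `es-node-1`.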
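2. Temporarily restore writes (emergency only)
# Disable the disk threshold check (use with caution: the disk may fill up completely)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.threshold_enabled": "false"
  }
}

# Remove the read-only block from the index (replace your_index_name)
PUT your_index_name/_settings
{
  "index.blocks.read_only_allow_delete": null
}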
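Because step 2 turns the disk check off, it should be turned back on once space has been freed. The sketch below builds the two follow-up requests: setting `threshold_enabled` to `null` resets it to its default, and the index block is cleared in case it is still present. Note that Elasticsearch 7.4 and later release the block automatically once disk usage drops below the high watermark, provided the check is enabled. The function names are illustrative, and nothing here contacts a cluster.

```python
import json
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def build_recovery_requests(index_name: str) -> List[Request]:
    """Build the requests to run after disk space has been freed (sketch)."""
    return [
        # Clear the read-only block on the index if it is still set.
        ("PUT", f"/{index_name}/_settings",
         {"index.blocks.read_only_allow_delete": None}),
        # Reset the disk threshold check to its default (enabled).
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.disk.threshold_enabled": None}}),
    ]


def to_console(requests: List[Request]) -> str:
    """Render the requests in Kibana Dev Tools console syntax."""
    blocks = []
    for method, path, body in requests:
        # json.dumps turns Python None into JSON null.
        blocks.append(f"{method} {path}\n{json.dumps(body, indent=2)}")
    return "\n\n".join(blocks)
```

Calling `print(to_console(build_recovery_requests("your_index_name")))` produces text that can be pasted into the console.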
3. Long-term solutions
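Area | Actions |
---|---|
Clean up data | - Delete old indices: DELETE /old_index_*<br>- Use ILM to manage the index lifecycle automatically |
Expand storage | - Increase disk capacity<br>- Add new data nodes |
Optimize storage | - Enable compression: "index.codec": "best_compression"<br>- Disable indexing for unused fields |
Adjust watermarks | Change the threshold in elasticsearch.yml (requires a restart), or update it dynamically through PUT _cluster/settings:<br>cluster.routing.allocation.disk.watermark.flood_stage: 97% |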
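For the "delete old indices" row, a script can pick the deletion candidates before anything is removed. This sketch assumes index names embed a date in `YYYY.MM.DD` form, as in the `xxxx-log-2.0-2022.01.11-000001` index from the error log; `select_expired_indices` is a hypothetical helper and only returns names, it does not delete anything.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List

# Matches a date such as 2022.01.11 anywhere in the index name.
_DATE_IN_NAME = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})")


def select_expired_indices(
    names: Iterable[str], today: date, keep_days: int
) -> List[str]:
    """Return index names whose embedded date is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired: List[str] = []
    for name in names:
        match = _DATE_IN_NAME.search(name)
        if not match:
            continue  # no date in the name, leave the index alone
        try:
            index_date = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # digits that do not form a valid calendar date
        if index_date < cutoff:
            expired.append(name)
    return expired
```

Reviewing the returned list before issuing any DELETE request is a cheap safeguard against removing the wrong indices.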
Preventive configuration recommendations
# elasticsearch.yml
# -------------------------------
# Disk watermark thresholds (adjust as needed)
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

# Automatically delete old data (example: delete each index 7 days after it rolls over)
PUT _ilm/policy/log_retention_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {"max_size": "50gb", "max_age": "7d"}
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {"delete": {}}
      }
    }
  }
}
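An ILM policy only affects indices that reference it, usually through an index template. The sketch below builds the body of a composable index template (the `_index_template` API, available from Elasticsearch 7.8) that attaches `log_retention_policy` to new indices. The index pattern and rollover alias in the usage note are assumptions for illustration and must match your own naming.

```python
import json
from typing import Any, Dict


def build_index_template(
    policy_name: str, index_pattern: str, rollover_alias: str
) -> Dict[str, Any]:
    """Build a composable index template body that attaches an ILM policy."""
    return {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                # Name of the ILM policy applied to matching indices.
                "index.lifecycle.name": policy_name,
                # Write alias that the rollover action advances.
                "index.lifecycle.rollover_alias": rollover_alias,
            }
        },
    }


def to_json(body: Dict[str, Any]) -> str:
    """Serialize a request body for use with PUT _index_template/<name>."""
    return json.dumps(body, indent=2)
```

For instance, `to_json(build_index_template("log_retention_policy", "xxxx-log-*", "xxxx-log"))` yields a body for `PUT _index_template/<name>`, where the pattern and the alias `xxxx-log` are placeholders.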
Monitoring example
# Monitoring with Prometheus (example alert rule; the metric name depends on the exporter in use)
- alert: ElasticsearchDiskFull
  expr: elasticsearch_cluster_filesystem_used_percent > 90
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "ES node disk is almost full ({{ $value }}% used)"
    description: "Disk usage on node {{ $labels.node }} exceeds 90%"
Notes
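- After force-disabling the disk check, finish the data cleanup or capacity expansion within 4 hours, then re-enable the check
- The best_compression codec adds roughly 10% CPU load
- When changing the watermark thresholds, keep at least a 5% buffer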
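The steps above restore service quickly, but the underlying storage capacity problem must still be fixed to keep the protection mechanism from triggering again.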