Internal Support Agent

Get unblocked instantly.
Stay in flow forever.

TierZero's Internal Support Agent learns from your docs and infrastructure to answer engineers' questions. Fewer bottlenecks, less context-switching, more shipping.

Support Agent analytics dashboard

How it works

1. ASK

Ask in Slack, get answers in seconds.

Engineers ask questions directly in Slack — "How do I roll back service X?" "What's the runbook for database failover?" No context-switching, no ticket filing, no waiting for someone who knows.

2. GROUND TRUTH

Grounded in your docs, code, and live systems.

TierZero searches Notion, Confluence, runbooks, code repos, past incidents, and Slack history. But it doesn't stop at docs — it cross-references live telemetry, deployment state, and code to give answers that are actually current.

3. MEASURE

See what's working. Tune what isn't.

Track deflection rate, CSAT, and question categories out of the box. See which topics need better documentation, which answers get thumbs-down, and fine-tune the agent's behavior without writing code.
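Deflection rate and CSAT reduce to simple ratios. The Python sketch below is a minimal illustration under stated assumptions, not TierZero's implementation. It assumes each request is logged with one of the status labels seen in the dashboard example (`RESOLVED`, `LOW CONF`, `ESCALATED`), treats any request that was not escalated as deflected, and computes CSAT as the mean of submitted 1-5 ratings. The `Request` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    # Hypothetical record shape; field names are illustrative, not TierZero's schema.
    status: str                    # e.g. "RESOLVED", "LOW CONF", "ESCALATED"
    rating: Optional[int] = None   # optional 1-5 satisfaction score


def deflection_rate(requests: list[Request]) -> float:
    """Share of requests that never escalated to a human (assumed definition)."""
    if not requests:
        return 0.0
    deflected = sum(1 for r in requests if r.status != "ESCALATED")
    return deflected / len(requests)


def csat(requests: list[Request]) -> Optional[float]:
    """Mean of the 1-5 ratings that were actually submitted, or None if there are none."""
    ratings = [r.rating for r in requests if r.rating is not None]
    return sum(ratings) / len(ratings) if ratings else None


if __name__ == "__main__":
    # 798 deflected out of 847 total, matching the dashboard example.
    sample = [Request("RESOLVED")] * 798 + [Request("ESCALATED")] * 49
    print(f"{deflection_rate(sample):.0%}")  # → 94%
```

With 798 of 847 requests deflected the ratio is about 0.942, which rounds to the 94% shown in the dashboard. Whether low-confidence answers should count as deflected is a policy choice; a stricter definition would exclude them.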

Dashboard example: eng-questions support agent, January 2025
Requests: 847 (+12% vs last month)
Deflection rate: 94% (798 of 847 deflected)
CSAT: 4.6 out of 5.0
Weekly requests: W1 142, W2 189, W3 165, W4 213
Weekly requests by category (W1-W4): Deploy, DB, Auth, CI/CD
Recent questions:
Tianyi Huang (Jan 27, 10:32 AM, RESOLVED): "Hey does anyone know how to roll back the payment service? We pushed a bad config change and transactions are failing in prod for EU customers..."
Jake Rachleff (Jan 27, 10:18 AM, LOW CONF): "@db-eng we're seeing p99 latency spike to 800ms on the orders table, started about 20 min ago — anyone know if there's a long-running migration or..."
Yun Park (Jan 26, 4:54 PM, RESOLVED): "Where are the staging DB credentials stored? I need to connect to the read replica to debug a data sync issue between checkout and inventory..."
Anhang Zhu (Jan 26, 2:41 PM, ESCALATED): "@db-eng the connection pool on user-service keeps maxing out at 50 connections during peak hours, is there a runbook for bumping the limit or do we need..."
INSIGHTS

See what your team is really asking.

Every question is a signal. TierZero tracks what engineers ask, which topics have low-confidence answers, and where your documentation has gaps — so you can invest in the right places.

Question analytics

See trending topics, question volume, and auto-resolution rates across your team.

Knowledge gap detection

Automatically flags topics where answers are weak or missing, so you know exactly which runbooks to write.

Deflection tracking

Measure how many questions are resolved without escalating to a human — and track improvement over time.
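Knowledge gap detection can be approximated by grouping answers by topic and flagging topics with a high share of weak answers. This is a hypothetical sketch: the `Answer` record, the rule that an answer is weak if it was low-confidence or received a thumbs-down, and the default thresholds (`min_questions=5`, `max_weak_share=0.3`) are assumptions for illustration, not how TierZero scores topics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Answer:
    # Hypothetical answer log record; field names are illustrative.
    category: str          # e.g. "Deploy", "DB", "Auth", "CI/CD"
    low_confidence: bool   # the agent flagged its own answer as uncertain
    thumbs_down: bool      # the asker rated the answer unhelpful


def knowledge_gaps(answers, min_questions=5, max_weak_share=0.3):
    """Return (category, weak share) pairs above max_weak_share, worst first.

    An answer is "weak" if it was low-confidence or got a thumbs-down.
    Categories with fewer than min_questions answers are skipped as noise.
    """
    totals = defaultdict(int)
    weak = defaultdict(int)
    for a in answers:
        totals[a.category] += 1
        if a.low_confidence or a.thumbs_down:
            weak[a.category] += 1
    gaps = [
        (cat, weak[cat] / n)
        for cat, n in totals.items()
        if n >= min_questions and weak[cat] / n > max_weak_share
    ]
    # Worst-covered topics first: these are the runbooks to write next.
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    answers = (
        [Answer("DB", True, False)] * 4 + [Answer("DB", False, False)] * 6
        + [Answer("Deploy", False, False)] * 9 + [Answer("Deploy", False, True)]
        + [Answer("Auth", True, True)] * 3
    )
    print(knowledge_gaps(answers))  # → [('DB', 0.4)]
```

Here Auth has only three answers, so the minimum count keeps it from being flagged on noise alone, even though every Auth answer was weak.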

STANDARD OPERATING PROCEDURES

From answers to actions.

TierZero doesn't just tell engineers what to do — it can do it for them. Define SOPs for common operations and let engineers trigger them directly from Slack with guardrails built in.

Execute common ops

Restart pods, clear caches, scale deployments, rotate secrets — all from a Slack message.

Guardrails built in

SOPs run with scoped permissions, environment checks, and audit logging. No cowboy kubectl.

Self-service at scale

Junior engineers handle routine operations safely. Senior engineers stop being a human API.
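The three guardrails listed above (scoped permissions, environment checks, audit logging) can be sketched as a thin wrapper around each operation. The `SOP` and `AuditLog` types, the role and environment names, and the stubbed action below are hypothetical; TierZero's actual SOP format is not described on this page, so treat this as an illustration of the pattern only.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SOP:
    # Hypothetical SOP definition; TierZero's real format may differ.
    name: str
    allowed_envs: set[str]          # environment check: where this SOP may run
    required_role: str              # scoped permission: who may trigger it
    action: Callable[[dict], str]   # the operation itself (stubbed below)


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, **event) -> None:
        # Every attempt is timestamped and kept, including denials and failures.
        event["ts"] = time.time()
        self.entries.append(event)


def run_sop(sop: SOP, user: str, roles: set[str], env: str,
            params: dict, log: AuditLog) -> str:
    """Run sop.action only if the caller's role and target environment are allowed."""
    base = {"sop": sop.name, "user": user, "env": env, "params": params}
    if sop.required_role not in roles:
        log.record(**base, outcome="denied: missing role")
        raise PermissionError(f"{user} lacks role {sop.required_role!r}")
    if env not in sop.allowed_envs:
        log.record(**base, outcome="denied: environment")
        raise PermissionError(f"{sop.name} is not allowed in {env!r}")
    try:
        result = sop.action(params)
    except Exception as exc:
        log.record(**base, outcome=f"failed: {exc}")
        raise
    log.record(**base, outcome=result)
    return result


if __name__ == "__main__":
    restart = SOP(
        name="restart-pod",
        allowed_envs={"staging", "prod"},
        required_role="oncall",
        # Stub: a real action would call the cluster API; this one only reports.
        action=lambda p: f"restarted {p['pod']}",
    )
    log = AuditLog()
    print(run_sop(restart, "yun", {"oncall"}, "staging", {"pod": "checkout-1"}, log))
```

Denied and failed attempts are logged alongside successful ones, so the audit trail records who tried what, not just what ran.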

Example: support agent instructions
### Investigate Data/DB workload and performance
If user asks about slow database latency, investigate and provide reasons.
1. Use {{ rds identifier }} to review CPU, connections, IOPS, and freeable memory.
2. Use $performance_insights to find top queries by load and blocking sessions.
3. Check recent deploys and current traffic for the affected service.
4. Check Redis for anomalies (CPU, memory, evictions) and correlate with app latency/errors.
Escalate to @db-eng for sustained high CPU, storage pressure, or blocking writes.

### Stabilize Kubernetes/infra issues
If user asks about Kubernetes pod failures or service instability, investigate and stabilize.
1. Use kubectl to identify failing pods and their nodes.
2. Use kubectl to detect OOMKilled, CrashLoopBackOff, or scheduling errors.
3. For stateless workloads, delete the unhealthy pod to restart it; confirm recovery with "kubectl rollout status deploy/<name>".
4. If pods are Pending due to insufficient resources, check cluster events; avoid changing requests/limits and coordinate capacity with platform.
5. If a node is NotReady, notify platform immediately and consider opening a PD incident if impact is broad.
Escalate to @compute-eng for capacity/node problems and to app owners for recurring OOMs or errors.

### Track external provider/service incidents
If user asks about issues tied to a third-party provider, confirm scope and communicate clearly.
1. Check $statuspage for {{ external service provider }}; note start time, regions, and affected features.
2. Check for impacted apps (error rate, latency, throughput) to confirm dependency impact.
3. Post an FYI in #noc with the status link and observed impact; consider temporary alert tuning only with owner approval.
4. If member-facing or lasting over 30 minutes, open a PD incident and start stakeholder updates every 15–30 minutes.
5. Close with a resolution update including the impact window and status link.
Escalate to @platform-eng and owning app teams for critical dependencies (payments, auth, voice).

### Triage deployment failures
If user reports a failed deployment or rollback, investigate root cause and restore the service.
1. Check $deploy_dashboard for the failed pipeline; note the failing step, commit SHA, and error output.
2. Pull recent commits from {{ service repo }} and review diffs for breaking changes.
3. If the failure is a test flake, re-trigger the pipeline. If it is a real failure, revert the last merge.
4. Confirm service health after rollback via $grafana and synthetic checks.
Escalate to @release-eng for persistent pipeline failures or infra-level build issues.
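Step 2 of the Kubernetes procedure (detecting OOMKilled, CrashLoopBackOff, or scheduling errors) maps onto fields in the pod objects that `kubectl get pods -o json` returns. The Python sketch below classifies those three cases; the function names and sample pod names are hypothetical, and a production check would also need to cover init containers and other failure reasons.

```python
import json
import subprocess


def load_pods(namespace: str) -> dict:
    """Fetch pod objects as JSON, the same way an engineer would with kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def triage_pods(pod_list: dict) -> list[tuple[str, str, str]]:
    """Return (pod, node, problem) tuples for CrashLoopBackOff, OOMKilled,
    and unschedulable pods in a `kubectl get pods -o json` document."""
    findings = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
        status = pod.get("status", {})
        for cs in status.get("containerStatuses", []):
            # A container stuck restarting reports CrashLoopBackOff as its waiting reason.
            if cs.get("state", {}).get("waiting", {}).get("reason") == "CrashLoopBackOff":
                findings.append((name, node, "CrashLoopBackOff"))
            # An OOM kill shows up as the termination reason of the current or previous run.
            for key in ("state", "lastState"):
                if cs.get(key, {}).get("terminated", {}).get("reason") == "OOMKilled":
                    findings.append((name, node, "OOMKilled"))
                    break
        if status.get("phase") == "Pending":
            for cond in status.get("conditions", []):
                # PodScheduled=False means the scheduler could not place the pod.
                if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                    findings.append((name, node, cond.get("reason", "Unschedulable")))
    return findings


if __name__ == "__main__":
    # Hypothetical sample; against a live cluster, use load_pods("<namespace>") instead.
    sample = {"items": [
        {"metadata": {"name": "api-1"}, "spec": {"nodeName": "node-a"},
         "status": {"phase": "Running", "containerStatuses": [
             {"state": {"waiting": {"reason": "CrashLoopBackOff"}},
              "lastState": {"terminated": {"reason": "OOMKilled"}}}]}},
        {"metadata": {"name": "worker-1"}, "spec": {},
         "status": {"phase": "Pending", "conditions": [
             {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}]}},
    ]}
    for finding in triage_pods(sample):
        print(finding)
```

A pod in CrashLoopBackOff typically still reports phase `Running`, which is why the sketch inspects container statuses rather than relying on the pod phase alone.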

Senior engineers got 3 hours a day back.

85% of questions answered without escalation. Engineers get answers without pinging the on-call.

<30s median response time, versus hours waiting for a human reply.

Your docs are useless if nobody reads them.