Job duties for site reliability engineers (SREs) vary by company, but most are responsible for the availability, latency (i.e., the time it takes a data packet to travel from one node to another), performance, and capacity (i.e., the maximum load that a product or system can handle) of digital products and systems. These engineers are concerned both with keeping apps and computer systems (software, hardware, and networks) running effectively and with responding to any event (e.g., bandwidth outages, hardware degradation, high usage, configuration errors) that affects customers' ability to use the product.
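To make the availability and latency measures above concrete, here is a minimal Python sketch of how they might be computed from request data. The function names, the sample latencies, and the nearest-rank percentile method are illustrative assumptions; real teams typically rely on their monitoring platform for these figures.

```python
import math


def availability(successful_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that were served successfully."""
    if total_requests == 0:
        # Assumption: with no traffic, treat the service as fully available.
        return 1.0
    return successful_requests / total_requests


def latency_percentile(samples_ms, pct):
    """Return the pct-th percentile of latency samples (nearest-rank method)."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest value with at least pct percent of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latency samples in milliseconds, including two slow outliers.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900]

print(availability(999, 1000))               # share of successful requests
print(latency_percentile(latencies_ms, 50))  # median latency
print(latency_percentile(latencies_ms, 90))  # tail latency
```

Percentiles are often preferred over averages for latency because a few very slow requests can hide behind a healthy-looking mean.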
At many companies, SREs spend about 50 percent of their time on call to resolve technology issues. Some issues can be fixed in a few minutes, while others take hours or even days to resolve. During an incident, the engineer refers to a runbook, which contains a summary of past technical issues and instructions on how they were fixed, and works through a series of steps to fix the problem. They also collaborate with other engineers and managers to solve the problem. When a runbook is unavailable, SREs must use their analytical and problem-solving abilities to assess the issue, determine potential causes, and devise solutions. As they work to resolve the problem, they record their actions and hypotheses so that a runbook can be created for reference if the problem recurs. Once the problem is resolved, the engineer prepares an incident response report that details what happened, what steps they took during the incident to find the root cause, and what was done to solve the problem.
For major incidents, members of a site reliability engineering team participate in what is known as a blameless postmortem meeting. This meeting focuses on the facts rather than singling out employees who may have made a mistake that caused or worsened the problem. During the meeting, SREs discuss the information presented in the incident response report to determine how the incident can be prevented in the future. The SRE might be asked to prepare additional documentation regarding the incident, and the group may conduct further investigations to gather more information or to test hypotheses about the root cause(s) of the issue.
During the other 50 percent of their workdays, SREs monitor the product or system in real time to track trends in performance that may indicate reduced reliability. When they identify an area of concern, they conduct tests, write replacement code (if necessary), and otherwise work to make their company’s products as reliable as possible. When possible, engineers write code that automates time-consuming tasks that have reduced reliability or that have even caused products or systems to malfunction. Other duties include preparing service overviews of new products that summarize their system architecture, components and dependencies, and other parameters; conducting production readiness reviews to ensure that a new product meets expected standards for performance and reliability; projecting future demand for a company’s products to ensure that enough bandwidth and other computing resources are available to satisfy expected customer demand; developing plans to upgrade the behavior or performance of a service while preserving its reliability; and developing plans to decommission an outdated product or system in a way that does not affect the performance of related products or systems.
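The demand-projection duty described above can be illustrated with a small Python sketch. It fits a straight line to past usage and estimates how many periods remain before usage reaches a capacity limit. The function names, the sample figures, and the assumption of linear growth are all hypothetical; actual capacity planning usually accounts for seasonality, launches, and safety margins.

```python
def fit_line(usage):
    """Least-squares fit of usage against period index 0, 1, 2, ...

    Returns (slope, intercept).
    """
    n = len(usage)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = (n - 1) / 2
    mean_y = sum(usage) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(usage))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def periods_until_capacity(usage, capacity):
    """Estimate periods remaining until the fitted trend reaches capacity.

    Returns None when usage is flat or falling, because the trend line
    never reaches the limit in that case.
    """
    slope, intercept = fit_line(usage)
    if slope <= 0:
        return None
    last_period = len(usage) - 1
    crossing_period = (capacity - intercept) / slope
    return max(0.0, crossing_period - last_period)


# Hypothetical monthly peak usage, in requests per second.
monthly_peak_rps = [100, 110, 120, 130]
print(periods_until_capacity(monthly_peak_rps, capacity=200))
```

An estimate like this gives the team lead time to order hardware or reserve cloud resources before customers are affected.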
- 3-D Printing Specialists
- Agile Coaches or Trainers
- Artificial Intelligence Specialists
- Augmented Reality Developers
- Automation Engineers
- Autonomous Vehicle Safety and Test Drivers
- Back-End Developers
- Big Data Developers
- Biometrics Systems Specialists
- Blockchain Developers
- Chief Information Officers
- Chief Information Security Officers
- Chief Robotics Officer
- Clinical Data Managers
- Cloud Engineers
- Computer and Office Machine Service Technicians
- Computer and Video Game Designers
- Computer Network Administrators
- Computer Programmers
- Computer Support Service Owners
- Computer Support Specialists
- Computer Systems Programmer/Analysts
- Computer Trainers
- Computer-Aided Design Drafters and Technicians
- Cryptocurrency Specialists
- Customer Success Managers
- Cybersecurity Architects
- Data Entry Clerks
- Data Processing Technicians
- Data Scientists
- Data Warehousing Specialists
- Database Specialists
- Deepfake Professionals
- Digital Agents
- Digital Workplace Experience Engineers
- Document Management Specialists
- Driverless Car Engineers
- Electrical Engineering Technologists
- Electrical Engineers
- Electronics Engineering Technicians
- Electronics Engineers
- Electronics Service Technicians
- Embedded Systems Engineers
- Enterprise Architects
- Ergonomists
- ETL Developers
- Fiber Optics Technicians
- Full Stack Developers/Engineers
- Futurists
- Geospatial Analytics Specialists
- Graphic Designers
- Graphics Programmers
- Hardware Engineers
- Health Informaticists
- Help Desk Representatives
- Industrial Designers
- Information Assurance Analysts
- Information Security Analysts
- Information Technology Consultants
- Information Technology Infrastructure Engineers
- Information Technology Project Managers
- Information Technology Security Consultants
- Internet Consultants
- Internet Developers
- Internet of Things Developers
- Internet Security Specialists
- JavaScript Developers
- Machine Learning Engineers
- Mathematicians
- Microelectronics Technicians
- Mobile Software Developers
- Model View Controller Developers
- Network Operations Center Engineers
- Network Operations Center Technicians
- Online Gambling Specialists
- Personal Privacy Advisors
- Product Development Directors
- Product Management Directors
- Product Managers
- Product Owners
- Project Managers
- Radio Frequency Identification Device Specialists
- Salesforce Developers
- Scrum Masters
- Semiconductor Technicians
- Smart Building Systems Designers
- Software Application Developers
- Software Designers
- Software Engineers
- Software Quality Assurance Testers
- Solutions Architects
- Systems Setup Specialists
- Technical Support Specialists
- Technical Writers and Editors
- Technology Ethicists
- Unity Developers
- User Experience Designers
- Visual Interaction Designers
- Wireless Service Technicians