It's 3 o'clock in the morning. Most of us reading this article can sleep peacefully and deservedly at this time of night, undisturbed by the processes behind the many services we use every day: from smartphones to gaming PCs, from cars (more than ever a "connected car", a driving "device") to the infrastructure in the office. But what if one of these services has a fault? Who takes care of it when something fails? Who has to get up at three in the morning when an urgent "High Priority Incident" ticket opens, triggers an alarm and needs to be resolved immediately?
This is the story of someone we'll call Andi for today. Andi gets up because Andi is part of the 2nd level support team, which in turn supports the operations team. Let's take a look at what Andi is up to. After that, we can all sleep a little more soundly because we know who takes care of what, how and why. Have fun!
- Operations teams: The first line of defense for stable projects
- 2nd level support - second level, first class
- 2nd Level Support - This is what a typical High Priority Incident Ticket looks like
- Operations and 2nd level - The reality check
- Social and organizational skills:
- Education and certificates:
- Operations and 2nd level support at Cognizant Mobility
- 2nd level support in operations teams: What do we learn?
Marc
Marketing Professional
19.07.24
Approx. 26 min
Operations teams: The first line of defense for stable projects
Operations teams, usually abbreviated to “Ops”, are an essential pillar of every IT company. But how exactly is an Ops team defined?
An Ops team is a group of IT professionals who are responsible for the day-to-day operation, maintenance and monitoring of IT systems and infrastructures. Their main task is to ensure that all IT services and applications run smoothly in order to guarantee continuous business operations. Here are some of the essential functions and responsibilities of an operations team:
- Monitoring and managing systems and networks: The Ops team continuously monitors the IT infrastructure, including servers, networks, databases and applications, to ensure that they are functioning properly. They use special monitoring tools that help them to identify and rectify problems at an early stage.
- Incident management: When problems or malfunctions (incidents) occur, the Ops team is responsible for quickly identifying, analyzing and eliminating the causes. This includes both solving short-term problems and implementing long-term measures to prevent future incidents.
- Change management: The Ops team manages changes to the IT infrastructure to ensure that they are implemented without interrupting or disrupting operations. This includes the planning, approval and implementation of changes and the subsequent review of their effectiveness.
- Maintenance and updates: Regular maintenance work and updates are necessary to keep the IT systems secure and efficient. The Ops team performs these tasks, often during scheduled maintenance windows, to minimize the impact on users.
- Security management: Protecting the IT infrastructure against threats and attacks is a critical task for the Ops team. This includes implementing and monitoring security measures, managing firewalls and other security systems and responding to security incidents.
- Backup and Disaster Recovery: The Ops team ensures that data is backed up regularly and that effective disaster recovery plans are in place to get back up and running quickly in the event of a data loss or disaster.
- Documentation and reporting: All processes and changes must be carefully documented. This documentation not only helps to track and analyze incidents, but is also important for compliance with legal and regulatory requirements.
An operations team therefore plays a crucial role in a company’s IT department once the software in question has been developed and deployed: it ensures that technological resources are used efficiently and reliably to support business objectives. Its work often happens in the background and is only noticed when problems occur, but that is exactly when its importance for the stable operation of the IT infrastructure becomes apparent.
For IT services to function optimally, a close connection with 2nd level support is therefore essential. So let’s take a look at what exactly 2nd level support has to do. Then we’ll also understand better why Andi is awake at 3 a.m. so that we can, for example, call up a great function in our car the very next morning or even subscribe to a completely new one:
2nd level support – second level, first class
2nd level support in IT is a specialized level of support that provides technical assistance when problems or queries cannot be resolved by 1st level support. Here are the main features and tasks of 2nd level support:
- Deeper technical expertise: 2nd level support consists of technicians and experts with deeper technical knowledge and more specialized expertise than the 1st level support team. They are able to analyze and solve more complex and specific problems.
- Troubleshooting and error analysis: If 1st level support cannot solve a problem, it is escalated to 2nd level support, which carries out detailed analyses to identify the causes of the problem and find lasting solutions.
- Support with escalations: 2nd level support handles escalations that go beyond 1st level support. They are the next escalation level and have the authority and technical means to carry out more in-depth diagnoses and corrections.
- Collaboration with other teams: 2nd level support works closely with other IT departments, such as 3rd level support or development teams, to solve difficult technical problems. They often act as intermediaries between 1st level support and specialized teams.
- Documentation and knowledge management: 2nd level support documents solutions to recurring problems and maintains the knowledge database so that future inquiries can be processed more quickly and efficiently. This supports both 1st level support and other team members.
- User communication: Although 2nd level support has less direct contact with end users than 1st level support, it is still responsible for communicating with users when dealing with escalated cases. They provide information on progress and solutions to problems.
- Training and support for 1st level support: 2nd level support provides training and support for 1st level support to improve their skills and ensure that they can solve similar problems independently in the future.
2nd level support is therefore a critical part of a company’s IT support system. It ensures that complex technical problems are solved effectively, which keeps the IT infrastructure running smoothly and end users satisfied. Above all, however, 2nd level support is a line of defense, a “fire station”: if all systems are running satisfactorily, developers and operations teams are equally happy. If something fails, however, 1st level support primarily receives the direct inquiries from stakeholders such as customers, and it is then 2nd level support that is responsible for restoring functionality, especially in the case of “High Priority Incident” tickets. There it is again, that term. What does “High Priority Incident” mean? What is a ticket? What does it typically contain? Let’s take a look at a concrete example of what such a ticket might look like.
2nd Level Support – This is what a typical High Priority Incident Ticket looks like
Let’s look together at Andi’s ticket system, for which he has to be on standby even at night so that cases like this one can be solved immediately. In this simulated but very realistic problem description and solution, we also learn why exactly some tickets have to be processed so urgently.
The ticket:
Ticket description:
Ticket ID: 12345
Priority: High (High Priority Incident)
Created on: July 15, 2024, 02:00 am
Created by: 1st Level Support
Description: License verification system not accessible
Details:
Problem:
Users cannot log in to the license verification system. This means that no new licenses for critical software products can be issued or verified. Several locations worldwide are affected, which significantly disrupts the production process.
Effects:
The inability to verify software licenses or issue new ones leads to production downtime and delays in the delivery of products. This applies to both internal processes and relationships with suppliers and customers.
Reproducibility:
The problem occurs for all users and is consistently reproducible.
Error messages:
- “Connection to the license server failed”
- “License verification not possible”
Steps to reproduce:
- User tries to log in to the license system.
- Error message appears immediately after the login attempt.
Initial measures (1st level support):
- Checked whether the problem occurs for all users.
- Network connections and server availability checked.
- No obvious network problems detected.
- Escalation to 2nd level support due to high urgency.
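For readers who think in code, the ticket above can be pictured as a small data structure. The following is only a sketch: the class name, the field names and the rule for when a ticket pages the on-call engineer are assumptions made for illustration and do not describe any particular ticket system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncidentTicket:
    """Hypothetical model of the ticket fields shown above."""
    ticket_id: int
    priority: str
    created_at: datetime
    created_by: str
    description: str
    error_messages: list[str] = field(default_factory=list)
    initial_measures: list[str] = field(default_factory=list)
    status: str = "Open"

    def needs_immediate_attention(self) -> bool:
        # Assumption: only unsolved "High" tickets wake up the on-call engineer.
        return self.priority == "High" and self.status != "Solved"


# The example ticket from the article, expressed with the hypothetical model.
ticket = IncidentTicket(
    ticket_id=12345,
    priority="High",
    created_at=datetime(2024, 7, 15, 2, 0),
    created_by="1st Level Support",
    description="License verification system not accessible",
    error_messages=[
        "Connection to the license server failed",
        "License verification not possible",
    ],
    initial_measures=[
        "Checked whether the problem occurs for all users",
        "Network connections and server availability checked",
    ],
)
print(ticket.needs_immediate_attention())  # True while the ticket is open
```

Keeping the initial measures in the ticket itself means 2nd level support can see at a glance what 1st level support has already ruled out.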
Urgency of a High Priority Incident Ticket:
High Priority Incident Tickets are incidents that have a significant impact on business operations and require immediate attention. In a license verification system, such tickets can be urgent and critical for several reasons:
- Business continuity: Production lines could come to a standstill because software requiring a license cannot be used. This leads to direct financial losses and affects the entire supply chain.
- Contractual obligations: OEMs often have strict delivery deadlines and contractual obligations to their customers. Defaults can result in contractual penalties and jeopardize business relationships.
- Reputation: Frequent or prolonged system failures can affect customer confidence in the reliability of the OEM.
- Worldwide impact: As the problem affects global locations, the impact on operations and production planning can be enormous.
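How a ticket ends up with the priority "High" differs from company to company. A common approach in ITIL-style service desks is to derive the priority from impact and urgency. The sketch below assumes a simple three-level scale; the level names, the scoring and the thresholds are illustrative assumptions, not the rules of a specific organization.

```python
# Assumption: three levels for both impact and urgency, scored 1 to 3.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def classify_priority(impact: str, urgency: str) -> str:
    """Derive a ticket priority from impact and urgency (illustrative thresholds)."""
    score = LEVELS[impact] * LEVELS[urgency]
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"


# The license outage: all users worldwide affected, production at risk.
print(classify_priority("high", "high"))  # High
```

With this rule, a fault that affects only a few users and can wait would be classified as "Low", while the worldwide license outage described above lands firmly in "High".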
Typical 2nd level support activities for problem solving:
- System diagnostics: In-depth review of system logs and diagnostic tools to identify the cause of the problem.
- Server check: Checking the license servers and all connected services for availability and performance.
- Database check: Ensure that the database used for license verification is functioning correctly and has no corruption issues.
- Network check: Check the network connections between the affected locations and the license server to identify possible communication problems.
- Recovery measures: If necessary, restart servers and services or switch to backup systems to restore functionality.
- Communication: Regular updates to the 1st level support team and the affected users on the progress of the problem resolution.
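Part of the server and network check can be automated with a few lines of Python. The following sketch only tests whether a TCP connection to a service can be opened at all, which is the kind of check that would reveal a blocked port. The function names are made up for this example, and any host names and ports passed in are placeholders.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def run_checks(
    endpoints: dict[str, tuple[str, int]], timeout: float = 3.0
) -> dict[str, bool]:
    """Check every named endpoint and report which ones are reachable."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

A call such as `run_checks({"license server": ("license.example.internal", 443)})` (a hypothetical host) returns a dictionary that maps each name to `True` or `False`. A result of `False` only says that the connection failed. Whether a firewall rule, a stopped service or a DNS problem is the cause still has to be found in the logs.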
Example of the solution to the problem:
After intensive analysis, the 2nd level support team discovers that a recent firewall configuration change is blocking access to the license server. They make the necessary adjustments and test the connections to ensure that the problem is completely resolved.
Result:
Ticket status: Solved
Solution: Firewall rules adjusted, license server restarted, connections tested and restored. All users can successfully log in again and the license verification works properly. Production operations were resumed as quickly as possible.
Andi breathes a sigh of relief: the ticket has been solved, and the operations team can already see in the monitoring that all servers and license verification interfaces are working properly again. The OEM praises the work in the next meeting. (This is where you realize that the story is fictitious: such assignments are governed by contracts under which resolving tickets is documented and simply expected.) Beyond that, there are further particularities and differences between IT companies, Cognizant Mobility among them.
Operations and 2nd level – The reality check
Of course, Andi is a specialist, like all his colleagues in both the operations team and 2nd level support. Typical roles in these teams include:
- IT support specialists
- System administrators
- Network administrators
- Database administrators
- Technical support engineers
- Application support analysts
- IT service managers
- Technical support specialists
A number of core skills are essential for working in IT support:
- In-depth IT knowledge: Comprehensive knowledge of operating systems (Windows, Linux, macOS), network infrastructure, servers, databases and enterprise applications.
- Troubleshooting and problem solving: Ability to analyze complex IT problems, identify the root cause and implement effective solutions.
- Network knowledge: Familiarity with network protocols, router, switch and firewall configurations and VPN technologies.
- Database management: Knowledge of the administration and maintenance of databases (e.g. SQL, Oracle, MySQL), including backup and recovery procedures.
- Security Awareness: Understanding of IT security fundamentals, including firewall management, anti-malware tools and security policies.
- Scripting and programming skills: Basic knowledge of scripting languages (e.g. PowerShell, Bash) or programming languages (e.g. Python, Java) to automate tasks and create solutions to problems.
Social and organizational skills:
- Communication skills: Clear and concise communication with colleagues and customers to convey technical information clearly and handle support requests efficiently.
- Teamwork: Ability to collaborate with other IT teams such as 1st level support, developers and network administrators to solve problems quickly and effectively.
- Customer orientation: Focus on the needs of users and customers to ensure a high level of satisfaction and a positive user experience.
- Time management: Effectively managing your own time and prioritizing tasks to ensure that critical issues are resolved quickly.
- Documentation skills: Accurate and detailed documentation of issues, solutions and processes to support knowledge management and compliance requirements.
Education and certificates:
- Educational qualification: Usually a degree in computer science, information technology or a related field.
- Certifications: Industry certifications that underpin technical knowledge and skills, such as CompTIA A+, CompTIA Network+, Microsoft Certified: Azure Administrator, Cisco Certified Network Associate (CCNA), ITIL Foundation, and other specialized certifications depending on area of practice.
So we can see that support is wrongly looked down upon time and again: the job is demanding and enormously important. It is hardly noticed as long as everything runs smoothly, but it becomes crucial at the slightest anomaly.
It should also be mentioned that skilled workers are highly sought after these days. Remote work, including across borders, is therefore the order of the day and brings new challenges with it. Cognizant Mobility, for example, works with teams in Romania and China, and even reconciling working hours across very different time zones is a challenge in itself. Added to this are language barriers and the difficulty of building a team out of people with very different backgrounds, working methods, mindsets, religions and social environments.
And then there is the customer.
We wanted to let this sentence stand on its own, because in-depth knowledge of the customer and of their often very extensive and complex surrounding systems and processes is sometimes essential to be able to solve problems at all. Suppose a new IP address is required for the server restart: In which range? Who assigns and activates it? Who is responsible for this: the OEM acting as the customer, or the product owner of the sub-project? Does the customer even have external service providers on board with whom 2nd level support has to coordinate? Is there a designated 3rd level support that can help?
Operations and 2nd level support at Cognizant Mobility
As we wanted to know more about what could be done better, we talked to, among others, Daniel Melendez, developer and project manager at Cognizant Mobility.
One result is worth mentioning: where many IT companies, especially in the supplier industry, rely on external providers for 2nd level support (small companies often struggle with the required shifts, to start with), Cognizant Mobility offers it internally, in close cooperation with the operations team responsible for the project, whose members also staff the support.
This means not only that support processes can be optimized and continuously developed. It also means that a development company already has established process structures that can be adapted to the customer’s specific needs, in direct coordination with the project, its employees and its stakeholders. It is a distinctive position that is not often found in this form.
Above all, however, a single finding explains the particular success of the operations teams and 2nd level support at Cognizant Mobility, and we would like to deliberately put it in a single sentence:
The backbone of every successful project, regardless of industry, complexity, scope or sub-sector: People.
People gain experience, which can of course be recorded in detailed documentation, even if, according to agile principles, working software matters more than exhaustive documentation (which we still want!). People are adaptable and curious, search for new paths together with stakeholders and develop innovative structures and processes that grow organically. People overcome time zones, language barriers and cultural differences and, spread across the globe, grow together into teams that put all their creative and productive power into projects. Everything from a single source, with a common mindset. And this is by no means just agile phrase-mongering: high-priority incidents are not uncommon and cause tense, stressful situations, often with a great deal of agitation, especially on the customer’s side. Here, the operations and support teams are not just service providers, but often pastors and psychiatrists, financial advisors and problem solvers all rolled into one. A strong team, often bound by friendship, is the key to overcoming these challenges. Without this humanity and mutual trust, work in Ops and support would be unthinkable.
2nd level support in operations teams: What do we learn?
Well, the conclusion was already foreshadowed in the last paragraph: it’s about people, about well-coordinated teams and about processes developed together with the customer. It’s about Andi and his problem-solving skills (and by the way: Andi is a real person whom the author of this article knows personally, and the author can say with certainty that Andi loves to face these challenges again and again!). It’s about the interweaving of expertise, system knowledge and close coordination between developers, operations and support. It’s about togetherness.
Andi closes the laptop and smiles contentedly. It was exhausting, but it was also really cool to solve the problem. Andi has recorded the final solution in the company’s internal wiki and linked it in the ticket. If the problem occurs again, it can be solved within seconds and will only trigger a shrug of the shoulders from the colleague on support duty. And now Andi will play for another hour, since he’s awake anyway, and then go back to sleep.
He’s earned it, Andi.