Just thought I’d share what I go through each day, each week, each month. Most weeks are fairly consistent: morning stand-ups to discuss the challenges, blockers and issues for the day, a company operations meeting to discuss the challenges and issues that occurred in the last 24 hours, Change Advisory Board (CAB) meetings, L2+L3 support meetings, major incidents and so on. Each of these meetings has a purpose. How you approach these meetings is key to building knowledge and anticipating future issues. Below is a rundown of how my week goes.
(Unless stated otherwise, any meetings mentioned are in a virtual environment via MS Teams or Slack)
Monday – Friday
- Daily stand-up, 9.00am – 9.30am
- Discuss work challenges, blockers and issues
- Discuss and collaborate with colleagues on how to address these challenges, blockers and issues
- Discuss any major events happening in the next 24 hours
- Major changes or outages
- Major work that will take place
- E.g. deployments, assisting with post verification testing on other teams’ changes
- Company-wide meetings
- New starters etc
- Usually on a Friday, discuss the weekend on-call roster
- Daily morning operation team meeting 9.45am – 10.45am
- Attend this meeting along with other operational teams in the company
- I.e. the people who keep the lights on 🙂
- The head of Major Incident Management (MIM) goes through the list of incidents in the last 24 hours that impacted customers and systems
- Discuss the 5 Ws and How of these issues, plus preventative measures and solutions so they don’t occur again
- But they usually recur anyway 🙂
- MIM lists any jobs or tasks that need to be allocated to work on the preventative measures and solutions for the incidents
Monday
- Change Advisory Board (CAB) meeting, 2pm – 3pm
- Company-wide meeting with MIM and senior stakeholders (e.g. managers, product owners etc)
- Discuss changes taking place that may be business critical
- These will be changes happening from Monday evening to Thurs morning (we also have a CAB meeting on Thurs)
Tuesday
- Weekly L2+L3 production catch-up meeting, 10am – 11am
- Discuss issues from the past 7 days with the developers and other team stakeholders
- Discuss the technical reasons why these issues occurred
- E.g. Developers needing to code any fixes
- DevOps to fix any incorrect setup of services in the AWS infrastructure
- Discuss other teams’ responsibilities that were missed
- Discuss tasks that need to be allocated, followed up and actioned
- Discuss the on-call roster for the week for the L2+L3 teams
- As an L2 Ops engineer I need an L3 team to lean on for coding analysis and input
- If an issue occurs during the week and I need my L3 team, I will arrange a bridge call with them to discuss the issue
Thursday
- CAB, 2pm – 3pm
- Refer to CAB explanation under the Monday heading
- We usually have CAB twice a week
- Monday CAB covers changes from Mon to Wed/Thurs morning
- Thurs CAB covers changes from Thurs – Sun/Monday morning
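As a rough illustration of the two CAB windows described above, here is a minimal Python sketch that works out which CAB session would have reviewed a given change. The function name `reviewing_cab` is hypothetical, and the exact cut-over (treating the 2pm CAB start time as the boundary between windows) is an assumption; the real boundary is only described loosely as "Monday evening" and "Thurs morning".

```python
from datetime import datetime

CAB_HOUR = 14  # both CAB meetings start at 2pm


def reviewing_cab(change_start: datetime) -> str:
    """Return which weekly CAB session covers a change (hypothetical helper).

    Assumption: a window opens at the CAB start time and closes when the
    next CAB starts.
    """
    day = change_start.weekday()  # Monday=0 ... Sunday=6
    after_cab = change_start.hour >= CAB_HOUR
    # Monday window: Monday 2pm up to (not including) Thursday 2pm.
    if (day == 0 and after_cab) or day in (1, 2) or (day == 3 and not after_cab):
        return "Monday CAB"
    # Everything else falls in the Thursday window, which runs through
    # the weekend until the following Monday's CAB.
    return "Thursday CAB"


# 2 Jan 2024 was a Tuesday, 6 Jan 2024 was a Saturday.
print(reviewing_cab(datetime(2024, 1, 2, 22, 0)))
print(reviewing_cab(datetime(2024, 1, 6, 3, 0)))
```

A weekend deployment therefore needs to be raised before the Thursday session, which is why the second meeting of the week matters.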
Other:
- Monitoring and alerting meeting, 30-45 mins
- This meeting isn’t frequent, but from time to time we discuss new apps and services being released into our stack
- These new apps and services need some monitoring and alerting in case there are issues
- Discuss with DevOps teams the tools being used to monitor these new apps and services
- E.g. Splunk queries, New Relic dashboards, CloudWatch alerts etc
- Discuss any blockers in terms of accessing these new apps and services, as well as how to access the monitoring tools
- Product Lead team meeting, 30-45 mins
- When we release a new feature, app or service, we usually need the product leads and developers to demo it
- I apply the 5 Ws and How to the new feature, app or service being released
- How do I access this new app? E.g. What are the login steps and procedure to view the new app and its features?
- Where is this app situated? E.g. in an AWS Cloud ECS cluster
- Who supports this app? E.g. Ops teams in the L1 role, Dev support in the L2 role
- When is this app available to be used? E.g. Can this app be used any time of the day or does it need to be taken offline for maintenance, upgrading and servicing?
- What does this app do? E.g. Does it transfer files between systems, is it a new chat system, is it a collaborative system etc
- Why does this app need to exist, what purpose does it serve? E.g. We have developed this new app to fill a gap in our market. None of our competitors has built an app like this.
- Daily/regular ‘Business As Usual’ (BAU) tasks
- Restart the AWS ECS gateways for my apps each morning
- Perform deployments and releases for fixes and improvements for my apps
- Run post verification testing (PVT) on apps after major updates to host and networking systems
- Clear stale and old data from databases
- Working with DevOps teams to automate clearing of stale and old data from databases
- Performing morning checks of systems, which are documented on our morning run sheet
- Investigate ad-hoc / one-off issues that are documented via our ticketing system
- Attempt to replicate the issue and provide a fix/solution
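The stale-data clean-up mentioned in the BAU list is the kind of task that lends itself to automation. Below is a minimal sketch of the idea in Python, using an in-memory SQLite database purely so it runs anywhere. The helper `purge_stale_rows`, the `sessions` table, the `created_at` column and the 30-day retention period are all assumptions made for illustration, not the actual setup.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_stale_rows(conn, table, ts_column, max_age_days, now=None):
    """Delete rows older than max_age_days and return how many were removed.

    Assumes timestamps are stored as ISO-8601 UTC strings, which sort
    correctly as plain text. Table and column names cannot be bound as SQL
    parameters, so only pass trusted identifiers here.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cur = conn.execute(f"DELETE FROM {table} WHERE {ts_column} < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo data: four rows aged 1, 10, 45 and 90 days (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, created_at TEXT)")
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
ages = (1, 10, 45, 90)
conn.executemany(
    "INSERT INTO sessions (created_at) VALUES (?)",
    [((now - timedelta(days=d)).isoformat(),) for d in ages],
)

deleted = purge_stale_rows(conn, "sessions", "created_at", max_age_days=30, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
print(f"deleted={deleted} remaining={remaining}")
```

Against a production database the same pattern would normally run on a schedule and delete in batches, but those details depend on the database and tooling in use.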
Summary:
- What I’ve mentioned above are a few meetings that occur during the week
- Most meetings covered in this post are usually less than an hour
- During these meetings, I’m actively listening for any points that might be interesting. I may approach the person or team that raised an interesting point to dig into it further.
- E.g. There was a major issue with our AWS networking setup and the Networking Operations team explained the networking architecture of our AWS stack. I took an interest in the target groups and load balancers section of the discussion. I took some notes and approached the networking team afterwards about that particular section of the stack, and they helped me with my understanding.
- During the week I may have other ad-hoc meetings