Payroll Technology at Amazon is all about enabling our business to perform at scale as efficiently as possible with no defects. As Amazon's workforce grows, both in size and geography, Amazon's payroll operations become increasingly complex, and our customers are asked to do more with less. Process can only get them so far, and that's where we come in with technology solutions to integrate and automate systems, detect defects before payment, and provide insights.
The Amazon Payroll Technology team is looking for a Systems Engineer to drive growth and stability in our Payroll platforms, which run on Windows, Linux, and AWS Cloud Infrastructure. You will support critical business functions for customers across the world while meeting high uptime SLAs and ensuring robust system performance. You will discover innovative ways to automate and scale our infrastructure as we expand our applications globally.
You’re perfect if you possess that rare mix of depth in Scripting, Systems Engineering, and Customer Obsession. You’re right for the job if you have deep technical knowledge of operating systems, networking, and distributed architectures. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. In addition to managing and supporting Payroll Technology's vast infrastructure, you are expected to develop best practices, refine operational procedures, and constantly think proactively.
You will need the following skills to be successful in this role:
- Experience running and maintaining a 24x7 Internet-oriented production environment, preferably across multiple data centers and involving thousands of servers.
- Experience with and knowledge of major AWS services such as EC2, SQS, SNS, S3, EBS, EFS, Lambda, EMR, and KMS.
- Experience in debugging latency in applications. In-depth understanding of at least one DBMS.
- Experience in at least one NoSQL database.
- Demonstrable expertise around specifying, designing, and/or implementing system health, performance monitoring tools, and software management tools for 24x7 environments.
- A solid grasp of networking fundamentals, including hands-on experience with load balancers, switches, routers, etc.
- Familiarity with the challenges surrounding efficient operations and failure mode analysis in large, complex distributed systems.
You will be expected to deliver on the following in the first six to twelve months on the job:
- Develop new, or extend existing, application and system management tools and processes that reduce manual effort and increase overall efficiency.
- Provide hardware, manageability, operability, and performance perspectives on distributed platforms.
- Define and/or refine hardware requirements and select designs, balancing raw up-front dollar cost with operability. From the data center infrastructure up, specify and participate in the development and delivery of operability-related features such as system health monitoring, diagnostics, repair, and other self-healing automation.
- Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic.
- Participate in the design and execution of production acceptance tests.
- Maintain fleet inventory management, including producing, maintaining, and evolving capacity plans for various components.
- Monitor the health of the fleet, automating system health checks, maintenance tasks, and reporting systems as needed.
- Perform various system maintenance tasks, including configuration of new machines.