The Senior Site Reliability Engineer (SRE) will be a key member of SharpSpring’s Site Reliability Engineering team. Our Site Reliability Engineering team is responsible for ensuring the reliability and durability of the company’s software and services.
We are looking for a Senior SRE who is a skilled sysadmin and also has strong programming abilities. The Site Reliability Engineering team uses software engineering skills and principles to solve problems that would traditionally be handled as manual, operational tasks. As a SharpSpring Senior SRE, you’ll be responsible for influencing architectural decisions and driving architectural changes within the company. You’ll also help with capacity planning: evaluating resource utilization with an eye toward scale and budget.
The ideal candidate will be self-motivated, have excellent communication skills (both oral and written), and be able to work both independently and collaboratively on the Site Reliability Engineering team. A keen interest in Linux system administration and software development is essential in our multidisciplinary team. If you are interested in joining a growing, dynamic, and successful tech company where your work will have a significant impact on its growth and success, then we want to talk to you!
- 5+ years professional work experience.
- 3+ years of proven systems administration and operations success.
- 2+ years in a senior system administration or operations role.
- Experience operating distributed systems at scale.
- Linux system administration and knowledge of Linux system internals. In particular, you should understand the Linux process model and be comfortable using common debugging tools like strace, gdb, lsof, etc.
- Docker experience, including knowledge of Docker internals and Linux cgroup internals.
- Kubernetes administration experience, including knowledge of Kubernetes internals.
- Knowledge of networking fundamentals. You should understand iptables rules and be comfortable using common network troubleshooting tools like netstat, netcat, nmap, etc.
- Strong Bash scripting skills.
- Additionally, you should have strong working knowledge of these programming languages:
- You must be comfortable troubleshooting and debugging web applications across the entire stack (i.e., the application layer, the database layer, and the OS).
- Production MongoDB experience, specifically experience using MongoDB replication and sharding.
- Production MySQL experience: replication, performance tuning, query optimization.
- You should be familiar with Ansible or another configuration management tool such as Puppet or Chef.
- The ideal candidate has experience using Prometheus, Alertmanager, and Grafana.
- You think of infrastructure and automation as code
- You handle large services and applications in high traffic environments
- You enjoy working at scale
- You understand server and network failures and how to handle them
- You like coding challenges and take pride in writing fast, efficient code
- You are passionate about what you do and often explore new tools and technologies that make automation and scale a reality