Become part of the Brainstack_ Infrastructure Team and embark on a new professional journey!
By joining our team, you will:
— Take part in a large-scale Kubernetes migration.
— Have a chance to write a few services in Python and optimize server load across different DBMSs (MySQL, PostgreSQL).
— Create and document various processes, from setting up servers to deploying Helm charts in a Kubernetes cluster and patching the Linux kernel for better performance.
— Work with two of the most popular monitoring systems, Zabbix and Prometheus.
— Put the basic principles and approaches of DevOps/SRE methodologies to the test.
We are now looking for an experienced SRE/DevOps Engineer to join the team.
Requirements:
— Proven experience as a DevOps Engineer, Linux System Administrator or SRE in a Linux-based production environment.
— Networking knowledge and a thorough understanding of the OSI model. Ability to troubleshoot, profile and resolve network issues.
— Knowledge of Kubernetes core components: kube-apiserver, kube-scheduler, kube-controller-manager, kube-proxy, etc.
— Experience with containerization and virtualization using Docker, LXC, KVM/QEMU;
— Experience with databases: MySQL/Percona, PostgreSQL, MongoDB, including performance tuning, replication and high availability practices;
— Strong experience administering Linux distributions (Debian, CentOS, Ubuntu, etc.), including hardening and performance tuning practices;
— Knowledge of local tools for monitoring and troubleshooting: tcpdump, ss/netstat, mtr, vmstat, iostat, top, sar, free, pmap, ps, lsof, strace, iproute2 utilities;
— Experience with monitoring and log collection systems: Zabbix, Grafana, ELK stack, Prometheus;
— Ability to write complex scripts (shell/Python);
— Experience with IT automation tools such as Ansible, Puppet or Chef;
— Sufficient English to understand documentation;
— Experience with incident management and on-call rotation;
Responsibilities:
— Maintain a highly available Kubernetes cluster on bare metal
— Ensure high uptime (99.995%+) and fast response times for the Kubernetes cluster and production environment
— Implement configuration management for new and existing services using industry best practices and tools
— Provide release and application support for developers and QA
— Ensure accessibility, integration, performance and security for all tools used in the product life cycle
— Provide day-to-day operational support of mission-critical systems and services
— Perform root cause analysis for all production outages
— Improve monitoring and alerting systems
— This position includes weekly on-call shifts (1-2 weeks per month). Outside working hours, you are only required to respond to critical incidents that affect the production environment
What we offer:
— Opportunity to be part of a creative environment and contribute to it;
— Variety of social and professional activities;
— Friendly and warm culture;
— PE employment;
— 18 days of paid vacation per year;
— Paid medical insurance after a successful trial period;
— English courses (partially compensated by the company);
— Paid documented sick days;
— 3 paid undocumented sick days;
— Breakfasts/fruit/yummies in the kitchen.
Feel free to contact me for more information
— email: [email protected]