Lead Data Engineer


Posted On: 13 Apr 2026

Location: Noida, UP, India

Company: Iris Software

Why Join Iris?
Are you ready to do the best work of your career at one of India’s Top 25 Best Workplaces in the IT industry? Do you want to grow in an award-winning culture that truly values your talent and ambitions?
Join Iris Software — one of the fastest-growing IT services companies — where you own and shape your success story.
 
About Us  
At Iris Software, our vision is to be our clients’ most trusted technology partner, and the first choice for the industry’s top professionals to realize their full potential.
With over 4,300 associates across India, the U.S.A., and Canada, we help our enterprise clients thrive with technology-enabled transformation across financial services, healthcare, transportation & logistics, and professional services.
Our work spans complex, mission-critical applications built with the latest technologies, in high-value areas such as Application & Product Engineering, Data & Analytics, Cloud, DevOps, Data & MLOps, Quality Engineering, and Business Automation.

Working with Us
At Iris, every role is more than a job — it’s a launchpad for growth.
Our Employee Value Proposition, “Build Your Future. Own Your Journey.” reflects our belief that people thrive when they have ownership of their career and the right opportunities to shape it.
We foster a culture where your potential is valued, your voice matters, and your work creates real impact. With cutting-edge projects, personalized career development, continuous learning and mentorship, we support you to grow and become your best — both personally and professionally.
Curious what it’s like to work at Iris? Watch this video for an inside look at the people, the passion, and the possibilities.

Job Description

We are seeking a highly skilled Lead Data Engineer with 9+ years of fully hands-on experience building and maintaining enterprise-grade data pipelines. This is a pure Individual Contributor role focused on writing production-quality code, developing scalable ETL/ELT solutions with PySpark and AWS, and orchestrating workflows with Airflow. If you thrive on solving complex technical problems and shipping robust, well-tested code, this role is for you.
Key Responsibilities
•    Develop and maintain robust, scalable ETL/ELT pipelines using PySpark on AWS EMR
•    Build data ingestion and transformation workflows from diverse sources (S3, EMR, RDS, Kafka, APIs) into AWS-based data lakes and warehouses
•    Write clean, modular, testable Python code following best practices and coding standards
•    Implement comprehensive unit tests using pytest/unittest with mocking, fixtures, and high code coverage
•    Design and build production-grade Airflow DAGs for workflow orchestration, scheduling, and monitoring (see the DAG sketch after this list)
•    Optimize Spark jobs for performance, memory efficiency, and cost reduction
•    Implement CI/CD pipelines for automated testing and deployment using Jenkins, GitHub Actions, or AWS CodePipeline
•    Troubleshoot and debug complex data pipeline issues in production environments
•    Collaborate with Data Scientists, Analysts, and Platform Engineers to deliver data solutions
•    Ensure data quality, security, and compliance standards are met
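
To give candidates a concrete flavor of the orchestration work, here is a minimal sketch of an Airflow DAG that schedules a hypothetical PySpark job daily. It assumes Airflow 2.4+; the DAG id, script path, and S3 bucket are illustrative assumptions, not real Iris pipelines.

```python
# A minimal, illustrative DAG (Airflow 2.4+): schedule a hypothetical
# PySpark transform once a day. All names below are assumptions for this
# sketch, not real Iris systems.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_sales_etl",            # hypothetical pipeline name
    start_date=datetime(2026, 1, 1),
    schedule="@daily",                   # Airflow 2.4+ 'schedule' argument
    catchup=False,
    tags=["etl", "pyspark"],
) as dag:
    # In production this step might use an EMR or Spark-submit operator
    # with full cluster configuration instead of a bash wrapper.
    run_transform = BashOperator(
        task_id="run_sales_transform",
        bash_command=(
            "spark-submit --deploy-mode cluster "
            "s3://example-etl-bucket/jobs/sales_transform.py "  # assumed path
            "--run-date {{ ds }}"  # Airflow-templated execution date
        ),
    )
```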
Required Skills & Qualifications
•    9+ years of hands-on data engineering experience (no management responsibilities required)
•    Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
•    Expert-level Python programming – OOP, design patterns, clean code practices
•    Advanced PySpark/Spark skills – partitioning strategies, shuffle optimization, memory tuning, broadcast joins
•    Strong unit testing expertise using pytest/unittest – mocking, parametrized tests, fixtures, TDD mindset (a test sketch follows this list)
•    Hands-on Airflow experience – DAG design, custom operators, sensors, XComs, debugging failed tasks
•    Deep AWS experience: S3, EMR, Glue, Redshift, Lambda, Step Functions, IAM, CloudWatch
•    Solid understanding of data lake and warehouse architectures (medallion architecture, Delta Lake)
•    Strong SQL skills – complex queries, window functions, query optimization
•    Proficiency with Git, code reviews, and collaborative development workflows
•    Experience with CI/CD pipelines and automated testing frameworks
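
As an illustration of the testing bar for this role, here is a small sketch of a pytest unit test combining a fixture with a mock. The function under test, load_user_count, is a hypothetical helper written for this example, not Iris code.

```python
# A sketch of the expected testing style: pytest fixture + unittest.mock.
# load_user_count is a hypothetical helper written for this example.
import json
from unittest.mock import MagicMock

import pytest


def load_user_count(s3_client, bucket: str, key: str) -> int:
    """Read a JSON object from S3 and return its 'user_count' field."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)["user_count"]


@pytest.fixture
def fake_s3():
    """Fixture: a mock boto3-style S3 client returning a canned payload."""
    payload = json.dumps({"user_count": 42}).encode("utf-8")
    client = MagicMock()
    client.get_object.return_value = {"Body": MagicMock(read=lambda: payload)}
    return client


def test_load_user_count_reads_expected_value(fake_s3):
    # The mock lets the test run with no AWS credentials or network access.
    assert load_user_count(fake_s3, "my-bucket", "metrics.json") == 42
    fake_s3.get_object.assert_called_once_with(Bucket="my-bucket", Key="metrics.json")
```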

Nice to Have (Preferred)
•    Familiarity with Docker for containerized data workloads
•    Exposure to streaming data (Kafka, Spark Streaming) – see the streaming sketch after this list
•    Knowledge of data quality frameworks
•    Background in financial services or regulated industries
•    Understanding of data security and privacy practices (GDPR)
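
For the streaming exposure mentioned above, here is a minimal sketch of reading a Kafka topic with Spark Structured Streaming. It assumes the spark-sql-kafka connector is on the classpath; the broker, topic, and checkpoint path are illustrative assumptions.

```python
# A sketch of Kafka ingestion with Spark Structured Streaming. Assumes the
# spark-sql-kafka connector is available; broker, topic, and paths below
# are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "user-events")                # assumed topic
    .load()
    # Kafka values arrive as bytes; cast to string for downstream parsing.
    .select(col("value").cast("string").alias("payload"))
)

# Console sink for the sketch; a real pipeline would parse the JSON and
# land it in the lake (e.g., S3/Delta) with the same checkpointing.
query = (
    events.writeStream.format("console")
    .option("checkpointLocation", "/tmp/chk/user-events")
    .start()
)
query.awaitTermination()
```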

Mandatory Competencies

Big Data - PySpark
DevOps/Configuration Management - Jenkins
Behavioral - Communication
Database Programming - SQL
Python - Python Scripting
Java Middleware - Apache
Python - Apache Airflow
Development Tools and Management - CI/CD
ETL - AWS Glue

Perks and Benefits for Irisians
Iris provides world-class benefits for a personalized employee experience. These benefits are designed to support the financial, health, and well-being needs of Irisians for holistic professional and personal growth.
