PySpark - Senior Engineer

Apply now »

Posted On: 1 Apr 2026

Location: Noida, UP, India

Company: Iris Software

Why Join Iris?
Are you ready to do the best work of your career at one of India’s Top 25 Best Workplaces in the IT industry? Do you want to grow in an award-winning culture that truly values your talent and ambitions?
Join Iris Software — one of the fastest-growing IT services companies — where you own and shape your success story.
 
About Us  
At Iris Software, our vision is to be our clients’ most trusted technology partner, and the first choice for the industry’s top professionals to realize their full potential.
With over 4,300 associates across India, the U.S.A., and Canada, we help our enterprise clients thrive through technology-enabled transformation across financial services, healthcare, transportation & logistics, and professional services.
Our work covers complex, mission-critical applications built with the latest technologies, spanning areas such as high-value Application & Product Engineering, Data & Analytics, Cloud, DevOps, Data & MLOps, Quality Engineering, and Business Automation.

Working with Us
At Iris, every role is more than a job — it’s a launchpad for growth.
Our Employee Value Proposition, “Build Your Future. Own Your Journey.” reflects our belief that people thrive when they have ownership of their career and the right opportunities to shape it.
We foster a culture where your potential is valued, your voice matters, and your work creates real impact. With cutting-edge projects, personalized career development, continuous learning and mentorship, we support you to grow and become your best — both personally and professionally.
Curious what it’s like to work at Iris? Watch our video for an inside look at the people, the passion, and the possibilities.

Job Description

  • Strong proficiency in Python, PySpark / Apache Spark
  • Solid understanding of RDDs, Spark SQL, and Spark performance tuning
  • Experience in writing optimized ETL/ELT pipelines
  • Experience with SQL and relational databases (PostgreSQL, MySQL, Oracle, etc.)
  • Exposure to Big Data ecosystems (Hadoop, Hive, HDFS)
  • Familiarity with batch and streaming data processing

 

Good to Have 

  • AWS / Azure / GCP (preferred)
    • AWS services such as S3, EMR, Glue, Redshift
  • Version control using Git
  • Experience with CI/CD pipelines
  • Basic familiarity with Docker and workflow schedulers (Airflow preferred)
  • Knowledge of Databricks

 

Responsibilities:

 

  • Design, develop, and maintain data pipelines using PySpark and Python.
  • Process and transform large structured and unstructured datasets in distributed environments.
  • Optimize Spark jobs for performance, scalability, and reliability.
  • Develop reusable data transformation frameworks and utilities.
  • Integrate data from multiple sources, including relational, NoSQL, and streaming systems.
  • Perform data quality checks, validations, and error handling.
  • Collaborate with data analysts, data scientists, and upstream/downstream teams.
  • Support deployment and monitoring of data pipelines in production environments.

Mandatory Competencies

Big Data - PySpark
Data Science and Machine Learning - Apache Spark
Programming Language - Python - Apache Airflow
Database - PostgreSQL
Big Data - Hive
Big Data - Hadoop
Big Data - HDFS
Database - Database Programming - SQL
Behavioral - Communication and collaboration

Perks and Benefits for Irisians
Iris provides world-class benefits for a personalized employee experience. These benefits are designed to support the financial, health, and well-being needs of Irisians for holistic professional and personal growth.
