
On-Site
Full-Time
Kochi, Kerala, India
About the Role
Design, develop, and maintain scalable and efficient big data processing pipelines on distributed computing systems.
Collaborate with cross-functional teams to understand data requirements and design appropriate data solutions.
Implement data ingestion, processing, and transformation processes to support various analytical and machine learning use cases.
Optimize and tune data pipelines for performance, scalability, and reliability.
Monitor and troubleshoot pipeline performance issues, identifying and resolving bottlenecks.
Ensure data quality and integrity throughout the pipeline, implementing data validation and error handling mechanisms.
Stay updated on emerging technologies and best practices in big data processing and analytics, incorporating them into our data engineering practices.
Document design decisions, technical specifications, and data workflows.
Expertise in data modelling, data warehousing concepts, data governance best practices, and ETL processes.
Understanding of distributed computing principles and experience with distributed data processing frameworks like Apache Spark or Hadoop.
Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes.
Develop data models, schemas and structures to support business needs.
Ensure data quality, integrity and security across all data solutions.
Good understanding of a data warehouse platform such as Snowflake or Redshift.
Desired Skills and Experience
Big Data - Hadoop Platform, PySpark, SQL, Databricks
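For illustration only, a minimal PySpark sketch of the kind of ingestion, validation, and transformation work described above. All paths, column names, and validation rules here are hypothetical assumptions, not requirements of the role.

# Minimal, illustrative PySpark pipeline: ingest raw data, apply a simple
# validation rule, aggregate for analytics, and persist the results.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-ingest").getOrCreate()

# Ingest: read a raw events dataset (location is an assumption).
raw = spark.read.parquet("s3://example-bucket/raw/events/")

# Validate: keep rows with a non-null id and a positive amount;
# route the remaining rows to an error location for later inspection.
valid = raw.filter(F.col("event_id").isNotNull() & (F.col("amount") > 0))
invalid = raw.subtract(valid)

# Transform: a simple daily aggregation to support analytical use cases.
daily = (
    valid
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("event_date")
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("event_count"))
)

# Persist outputs (locations are hypothetical).
daily.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")
invalid.write.mode("append").parquet("s3://example-bucket/errors/events/")

A comparable pipeline would typically run on a Databricks or Hadoop/YARN cluster and be scheduled and monitored by an orchestration tool, in line with the responsibilities listed above.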