
On-Site
Full-Time
Bangalore Urban, Karnataka, India
About the Role
Data Engineer 2
Position Summary:
As a Data Engineer, you will be part of a highly talented Data Engineering team, responsible for developing reusable capabilities and tools to automate various types of data processing pipelines. You will contribute to different stages of data engineering, including data acquisition, ingestion, processing, pipeline monitoring, and data validation.
Your contribution will be crucial in keeping various data ingestion and processing pipelines running successfully, and in ensuring that the data available in the data lake is up to date, valid, and usable.
Technology Experience:
3+ years of experience in data engineering.
Comfortable and hands-on with Python programming.
Strong experience working with RDBMS and NoSQL systems.
Strong experience with the AWS ecosystem, with hands-on exposure to components and tools such as Airflow, EMR, Redshift, S3, Athena, and PySpark.
Strong experience developing REST APIs in Python using frameworks such as Flask and FastAPI.
Prior experience with Python crawling libraries such as BeautifulSoup is desirable.
Proven ability to work with SQL queries, including writing complex queries to retrieve key metrics. Skilled in connecting to, exploring, and understanding upstream data.
Experience working with various data lake storage formats and the ability to choose among them based on the use case (a brief sketch follows this list).
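For illustration only, here is a minimal PySpark sketch of the kind of task this role involves: reading raw upstream data and landing it in the data lake as Parquet. The bucket names, paths, and partition column are hypothetical placeholders, not an actual pipeline.

```python
from pyspark.sql import SparkSession

# Hypothetical locations; real buckets, paths, and columns would differ.
SOURCE_PATH = "s3://example-upstream-bucket/orders/"
TARGET_PATH = "s3://example-data-lake/orders/"

spark = SparkSession.builder.appName("orders_ingestion_example").getOrCreate()

# Read raw CSV files from the upstream location, inferring the schema for brevity.
orders = spark.read.csv(SOURCE_PATH, header=True, inferSchema=True)

# Land the data in the lake as Parquet, partitioned by an assumed date column.
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet(TARGET_PATH)
)

spark.stop()
```

In an AWS setup like the one described above, a job of this shape would typically run on EMR, with the resulting Parquet files queried through Athena or Redshift.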
Responsibilities:
Design and build scalable data pipelines that can handle large volumes of data; a minimal orchestration sketch follows this list.
Develop ETL/ELT pipelines that extract data from upstream sources and sync it to the data lake in Parquet, Iceberg, or Delta format.
Optimize data pipelines, ensure they run reliably, and ensure business continuity.
Collaborate with cross-functional teams to source all the data required for business use cases.
Stay up to date with emerging data technologies and trends to ensure the continuous improvement of our data infrastructure and architecture.
Follow best practices in data querying and manipulation to ensure data integrity.
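As a rough illustration of the orchestration side of these responsibilities, the sketch below shows a minimal Airflow 2.x DAG that runs a daily ingestion task. The DAG id, task name, and ingestion callable are hypothetical; a real pipeline would submit a Spark job to EMR or call an internal ingestion service instead of the placeholder function.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_orders():
    # Placeholder for the real extract/load logic, e.g. triggering the
    # PySpark job above on EMR or calling an internal ingestion service.
    print("ingesting upstream orders data")


with DAG(
    dag_id="orders_ingestion_example",   # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",          # run once per day
    catchup=False,                       # do not backfill past runs
) as dag:
    ingest = PythonOperator(
        task_id="ingest_orders",
        python_callable=ingest_orders,
    )
```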