Description
In this role, the Big Data Engineer will be part of the Data Sciences Innovation Lab. This team serves as the gatekeeper and curator of the big data collected into the Data Warehouse/Data Lake from all aspects of our fashion e-commerce business, and supports data-driven decision-making. The engineer will conduct complex data analysis, continuously evolve management reporting, and deliver business insights in an environment of rapid growth and increasing complexity.
Responsibilities:
- Drive the full lifecycle of Big Data projects, from gathering and understanding end-user needs to implementing a fully automated solution.
- Develop and provision data pipelines that enable self-service reports and dashboards.
- Apply AI/machine learning techniques in R or Python to answer the relevant business problems.
- Visualize data using Tableau and create repeatable visual analyses that end users can use as tools.
- Take ownership of the existing BI platforms, maintaining the integrity and accuracy of their data and data sources.
- Apply Agile/Scrum project management experience: prioritise, push back, and effectively manage a data product and sprint backlog.
Requirements:
- 5+ years of experience building scalable and reliable ETL/ELT pipelines and processes to ingest data from a variety of data sources, preferably in the e-commerce retail industry.
- Experience in building robust scalable data pipelines using distributed processing frameworks (e.g. Spark, Hadoop, EMR, Flink, Storm), integrated with asynchronous messaging systems (e.g. Apache Kafka, Kinesis, PubSub, MQ Series).
- Deep understanding of relational database management systems (RDBMS) (e.g. PostgreSQL, MySQL) and NoSQL databases (e.g. MongoDB, Elasticsearch), and hands-on experience in implementation and performance tuning of MPP databases (e.g. Redshift, BigQuery).
- Strong programming, algorithmic, and data processing skills, with significant experience producing production-ready code in Python, Scala, Java, etc., and engineering experience with machine learning projects such as time series forecasting, classification, and optimization problems.
- Experience administering and deploying CI/CD tools (e.g. Git, Jira, Jenkins), infrastructure automation tools (e.g. Ansible, Terraform), and workflow management tools (e.g. Airflow, Jenkins, Luigi) in Linux operating system environments.
- Experience designing and implementing software for Data Security, Cryptography, Data Loss Prevention (DLP), or other security tools.
- Experience with Tableau, Power BI, Superset or any standard data visualization tools.
- Exhibits sound business judgment, strong analytical skills, and a proven track record of taking ownership, leading data-driven analyses, and influencing results.
- Knowledge of cloud services such as AWS, GCP, or Azure is a strong advantage.
- An e-commerce, logistics, or fashion retail background is a bonus.