Overview of Project
The company had a roadmap to create a financial data lake. The Software Engineering team was tasked with creating the infrastructure for this data lake:
- The data lake itself, to store the data
- An ETL tool that would connect to the data lake and in which jobs would be created
- Tools to query the data lake
The Data Engineering team, which I was leading, was tasked with migrating the data pipeline to the data lake using the tools developed by the Software Engineering team. Wireframes showing how the tool would look were shared with the team, and the Data Engineering team waited for the Software Engineering team to complete their work so that we could start working on the pipeline migration.
STAR Format follows
Situation (Challenges you faced)
- Six months before the data lake project completion date, my senior manager discussed the project with me and asked me to get started. At that time, his understanding was still that a tool would be developed.
- After discussing with the Software Engineering team, we learned that they had hit roadblocks and the entire effort was delayed. They had not even started the tool development and had not completed the data lake infrastructure.
- I escalated this to senior management, clarifying that the Data Engineering project could not start until the ETL tool development was completed.
- After multiple hard discussions with senior management, the Software Engineering team accepted that they had completely missed the timeline and that there was no time left for Data Engineering to migrate.
- The Software Engineering team would not work on creating the tool. Instead, they would only work on creating a Data Lake API, which the Data Engineering team would use in its own solution.
- We were to run our development effort in parallel with the Data Lake API development.
- Data Engineering would work on the migration using the AWS platform and the Data Lake API.
Tasks (Decisions you took)
- We were tasked with creating an approach for the migration, with the aim of completing it in the next 6 months.
Action (How you worked with your team and cross functional peers)
- Discussed with the team and came up with a few approaches for how we could do the project
- Had POCs done on multiple approaches within a week's timeframe and presented the pros and cons to management
- Decided on the approach to follow
- Led the development effort on the solution proposed by the team. During development, we optimized the current pipeline, which led to an improvement in SLA
- Had to deal with multiple issues, as the Data Lake API was not fully mature and had bugs that had the potential to derail the roadmap
- Migrated the ETL data pipeline in the next 6 months
- Created a reusable solution that could be used by multiple teams
Result (Metrics that were impacted)
- We were able to migrate the pipeline ahead of the anticipated date, allowing ample time for user testing and contributing to the overall success of the data lake
- Reduced the data delay from 6 hours to 1 hour
- Showcased to the organization best practices around ETL pipeline development: infrastructure as code, unit test cases for data pipelines, and performance optimizations
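
To make "unit test cases for data pipelines" concrete, here is a minimal sketch in Python. It is not the code from the project: the function `normalize_trade`, the field names, and the sample record are all hypothetical, chosen only to show the idea of keeping a transformation step as a pure function so it can be tested without touching the data lake.

```python
from datetime import datetime, timezone


def normalize_trade(record: dict) -> dict:
    """Hypothetical transform step: clean one raw record before loading it.

    Field names are illustrative assumptions, not the project's schema.
    """
    return {
        "trade_id": str(record["trade_id"]).strip(),
        "amount": float(record["amount"]),
        "currency": record["currency"].upper(),
        # Convert epoch seconds to an ISO 8601 timestamp in UTC.
        "traded_at": datetime.fromtimestamp(
            record["epoch_seconds"], tz=timezone.utc
        ).isoformat(),
    }


def test_normalize_trade_cleans_fields():
    # A small in-memory record stands in for a row read from the source system.
    raw = {"trade_id": " 42 ", "amount": "10.5", "currency": "eur", "epoch_seconds": 0}
    out = normalize_trade(raw)
    assert out["trade_id"] == "42"
    assert out["amount"] == 10.5
    assert out["currency"] == "EUR"
    assert out["traded_at"] == "1970-01-01T00:00:00+00:00"


def test_normalize_trade_rejects_missing_field():
    # A record without a required field should fail loudly, not load silently.
    try:
        normalize_trade({"trade_id": "1"})
    except KeyError:
        return
    raise AssertionError("expected KeyError for a record missing required fields")
```

Tests written this way can be run with a test runner such as pytest on every change, so a broken transformation is caught before the job runs against real data.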