Engineering Challenges solved
Building a Data Lake
- A system that had grown over time was migrated to the data lake
- The initial system had multiple Redshift clusters, compaction issues, and multiple S3 paths from which customers consumed data
- Ownership was divided among teams, but not along logical lines
- Access was not controlled, and there were redundant access grants
STAR format - How was the impact measured
- SLA improvement: the team was able to redesign the pipeline during the migration to the data lake and take out redundant steps, improving the SLA from 6 hours to 3 hours
- Cost saving of 200k by moving processing
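One way to state the SLA result as a single number is the percentage reduction in end-to-end pipeline time. The helper below is a minimal sketch; the function name `sla_reduction_pct` is hypothetical, and the only inputs taken from the notes are the 6-hour and 3-hour figures.

```python
def sla_reduction_pct(before_hours: float, after_hours: float) -> float:
    """Return the percentage reduction in SLA from before_hours to after_hours."""
    if before_hours <= 0:
        raise ValueError("before_hours must be positive")
    return (before_hours - after_hours) / before_hours * 100.0


# Figures from the notes: SLA went from 6 hours to 3 hours.
print(f"SLA reduced by {sla_reduction_pct(6, 3):.0f}%")
```

Quoting both the absolute change (3 hours saved) and the relative change keeps the impact statement easy to verify.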
People Challenges solved
Hiring and Recruiting Issues solved
Big Projects Handled
Appraisal Ratings
Cost Saving
- What is the cost of Redshift
- What is the cost of EMR
- What is the cost of Athena
- Number of nodes
RA3.16xlarge - Reserved instance - 75,000 - 48 vCPU, 384 GB RAM, 128 TB storage, scales up to 16 petabytes
DS2.8xlarge - 16 TB HDD, 244 GB memory - We had a 50-node cluster - DS2 is deprecated - 30 thousand
DC2.8xlarge - 32 vCPU, 244 GB memory, 2.5 TB SSD
EMR instance type used - R5d - 48 vCPU, 512 GB memory - Supports up to EMR 6.3
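The node figures above can be turned into a rough capacity comparison. The sketch below uses only the per-node storage and memory numbers from these notes; the `NodeSpec` class and helper names are hypothetical, and sizing by storage alone is an assumption made for illustration (a real migration would also weigh compute, concurrency, and pricing).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    name: str
    storage_tb: float  # per-node storage, as listed in the notes
    memory_gb: int     # per-node memory, as listed in the notes


# Per-node figures copied from the notes above.
DS2_8XLARGE = NodeSpec("ds2.8xlarge", storage_tb=16, memory_gb=244)
RA3_16XLARGE = NodeSpec("ra3.16xlarge", storage_tb=128, memory_gb=384)


def cluster_storage_tb(spec: NodeSpec, nodes: int) -> float:
    """Total raw storage of a cluster with the given node count."""
    return spec.storage_tb * nodes


def nodes_needed_for_storage(target_tb: float, spec: NodeSpec) -> int:
    """Smallest node count whose combined storage covers target_tb."""
    return math.ceil(target_tb / spec.storage_tb)


legacy_tb = cluster_storage_tb(DS2_8XLARGE, nodes=50)
ra3_nodes = nodes_needed_for_storage(legacy_tb, RA3_16XLARGE)

print(f"50-node DS2 cluster storage: {legacy_tb:.0f} TB")
print(f"RA3.16xlarge nodes to match that storage: {ra3_nodes}")
```

Multiplying the resulting node counts by a per-node price gives a first-pass answer to the cost questions listed above; the prices themselves are left out here because the notes do not say what period or unit the 75,000 and 30 thousand figures refer to.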