Wednesday 17 July 2024

Good Use Case for Kafka / Flink Streaming - Building an Ad Server

 https://www.kai-waehner.de/blog/2023/09/15/how-to-build-a-real-time-advertising-platform-with-apache-kafka-and-flink/

Integrating Kafka with your API

 Integrating Kafka with your application API without making direct changes to the API itself can be achieved using various techniques and tools. The approach typically involves setting up an intermediary service or middleware that intercepts API requests and forwards the necessary data to Kafka. Here are a few strategies to achieve this:


Example - you want to know which ad requests your application is making, or which songs users are playing on Spotify.


Strategy 1: Sidecar Pattern

The sidecar pattern involves running a companion service alongside your main application that handles Kafka integration. This can be particularly useful in containerized environments like Kubernetes.

Steps to Implement Sidecar Pattern

  1. Create a Sidecar Service:

    • Develop a separate service that consumes API logs or data and sends it to Kafka.
  2. Deploy the Sidecar:

    • Deploy the sidecar service alongside your main application.

Example Implementation

Sidecar Service (Python):

python
import json

import requests
from confluent_kafka import Producer

# Kafka producer configuration
conf = {
    'bootstrap.servers': 'localhost:9092',
    'client.id': 'sidecar-producer'
}

# confluent_kafka's Producer takes the config dict directly,
# not keyword arguments
producer = Producer(conf)

def send_to_kafka(topic, message):
    producer.produce(topic, json.dumps(message).encode('utf-8'))
    producer.flush()
    print(f"Message sent to Kafka topic {topic}")

def intercept_and_forward(url, headers, params=None):
    # Call the upstream API, forward the response payload to Kafka,
    # and return the data to the caller unchanged
    response = requests.get(url, headers=headers, params=params)
    data = response.json()
    send_to_kafka('your_topic', data)
    return data

# Example usage
if __name__ == "__main__":
    url = "http://example.com/api/data"
    headers = {"Authorization": "Bearer token"}
    data = intercept_and_forward(url, headers)
    print(data)
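To verify the sidecar is actually publishing, a quick check is to consume from the same topic. A minimal sketch, assuming the same broker address and the placeholder topic name your_topic from the example above (the group id sidecar-verify is an arbitrary name chosen here):

python
import json

from confluent_kafka import Consumer

# Consumer configuration matching the sidecar example's broker
conf = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'sidecar-verify',
    'auto.offset.reset': 'earliest'
}

consumer = Consumer(conf)
consumer.subscribe(['your_topic'])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Messages were produced as UTF-8 encoded JSON by the sidecar
        event = json.loads(msg.value().decode('utf-8'))
        print(f"Received event: {event}")
finally:
    consumer.close()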

Kubernetes Deployment Example

Kubernetes Pod Definition:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar
spec:
  containers:
    - name: main-app
      image: your-main-app-image
      ports:
        - containerPort: 80
    - name: sidecar
      image: your-sidecar-image
      env:
        - name: KAFKA_BOOTSTRAP_SERVERS
          value: "localhost:9092"

Strategy 2: API Gateway

Using an API gateway allows you to manage, secure, and monitor API traffic, as well as intercept and forward data to Kafka without modifying the backend API.

Steps to Implement API Gateway Integration

  1. Set Up API Gateway:

    • Use an API gateway solution like Kong, Apigee, or Amazon API Gateway.
  2. Configure Plugins or Middleware:

    • Use plugins or middleware to intercept requests and forward data to Kafka.

Example with Kong Gateway

  1. Install Kong and Kafka Plugin:

    • Follow the Kong installation guide.
    • Install the Kafka logging plugin for Kong.
  2. Configure Kafka Plugin:

yaml
services:
  - name: example-service
    url: http://example.com
    routes:
      - name: example-route
        paths:
          - /api
    plugins:
      - name: kafka-log
        config:
          bootstrap_servers: "localhost:9092"
          topic: "your_topic"
          producer_config:
            acks: "all"
  3. Apply Configuration:
bash
# Assuming you have a Docker-based setup
docker-compose up -d
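Once the route and plugin are in place, any traffic through the gateway should produce log events on the topic. A minimal sketch to exercise the route, assuming Kong's default proxy port (8000) and a hypothetical endpoint path:

python
import requests

# Call the API through the Kong proxy rather than the backend directly;
# the kafka-log plugin emits an event with request/response metadata
# to the configured topic for each proxied request
response = requests.get(
    "http://localhost:8000/api/data",          # proxied to http://example.com
    headers={"Authorization": "Bearer token"}, # hypothetical auth header
)
print(response.status_code)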

Strategy 3: Log-Based Integration

You can also use a log-based approach where application logs are collected and processed to send data to Kafka.

Steps to Implement Log-Based Integration

  1. Set Up Log Forwarder:

    • Use a log forwarder like Filebeat, Fluentd, or Logstash.
  2. Configure Log Forwarder to Send Data to Kafka:

Filebeat Configuration Example:

yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log

output.kafka:
  hosts: ["localhost:9092"]
  topic: "your_topic"
  codec.json:
    pretty: true
  3. Run Log Forwarder:
bash
filebeat -e -c filebeat.yml
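This approach assumes the application writes logs Filebeat can parse. A minimal sketch of an application writing one JSON object per line under the configured path; the file name api.log and the log_api_event helper are hypothetical:

python
import json
import logging

# Write one JSON object per line so Filebeat ships each line as an event;
# the path matches the filebeat.inputs glob above (directory must exist
# and be writable by the application)
handler = logging.FileHandler('/var/log/myapp/api.log')
logger = logging.getLogger('api')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def log_api_event(endpoint, status, payload):
    logger.info(json.dumps({
        'endpoint': endpoint,
        'status': status,
        'payload': payload,
    }))

# Example usage
log_api_event('/api/data', 200, {'song_id': 'example-123'})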

Summary

These strategies allow you to integrate Kafka with your application API without making direct changes to the API code:

  1. Sidecar Pattern: Run a companion service alongside your application.
  2. API Gateway: Use an API gateway to intercept and forward data.
  3. Log-Based Integration: Use log forwarders to process and send logs to Kafka.

Choose the strategy that best fits your architecture and operational requirements.

Tuesday 6 June 2023

Leadership and Growth Mindset

 https://sites.google.com/deliveroo.com/tech-leadership/engineering-manager-prep/leadership-growth-mindset

Behaviour Interview Prep - Manager

 https://sites.google.com/deliveroo.com/tech-leadership/engineering-manager-prep/behavioural



Resolving Conflicts

Principles of conflict resolution
  • everyone is right in their own eyes
  • avoid false dichotomies (black-or-white thinking)
  • people generally present a position, but rarely the real issue; the person resolving the conflict needs to uncover the real issue by using listening skills
  • don't dig in: that is, don't keep explaining why you think you are right ("I am right because..."), as that leads to escalation; look at the soundness of the arguments instead
  • don't use exaggerated statements like "you always..."
  • seek unity of purpose
  • focus on where we are going
  • it's not the person but the process: ask how we can make sure such conflicts do not appear in future, for example by adopting a RACI framework
  • look for win-wins and mutual benefits
  • long term, develop a culture where people listen to each other

When it comes to resolving conflicts, a few of these communication skills come in handy.

  • Empathy: when you empathise with people, it allows them to move past things; otherwise you will hear them saying the same things in many conversations
  • Acknowledgement: when listening to someone, acknowledgement such as nodding or saying "yes" motivates the speaker; the absence of this kind of encouragement is demotivating
  • Reflecting or rephrasing: listening is more than hearing; we need to understand what the other person is feeling and rephrase it. Reflecting is holding up a mirror to the other person: "this is what I think you said"
  • Brainstorming: allowing people to come up with ideas without judging. Note that an idea might not be good in itself, but it might lead someone else to a better idea along the same lines


Framework for conflict resolution
  • Find agreement
  • Clarify disagreement
  • Express empathy
  • Seek alternatives
  • Agree on criteria for a solution
  • Combine and create
  • Reach consensus

Project Walkthrough for Interview in STAR Format

 Overview of Project 

The company had a roadmap to create a financial data lake. The software engineering team was tasked with creating the infrastructure for this data lake:

  • the actual data lake to store the data
  • an ETL tool which would connect to this data lake and on which jobs would be created
  • tools to query this data lake

The data engineering team, which I was leading, was tasked with migrating data pipelines to the data lake using the tools developed by the software engineering team. Wireframes of how this tool would look were shared with the team, and the data engineering team waited for the software engineering team to complete their work so we could start on the pipeline migration.


STAR Format follows 

Situation (Challenges you faced)

  • 6 months prior to the data lake project completion date, my senior manager discussed the project and asked me to get started. At this time his understanding was still that a tool would be developed.
  • After discussing with the software engineering team, we understood that they had hit roadblocks and the entire effort was delayed; they had not even started the tool development and had not completed the data lake infrastructure.
  • I escalated this to senior management, clarifying that the data engineering project could not start until the ETL tool development was completed.
  • After multiple hard discussions with senior management, the software engineering team accepted that they had completely missed the timeline and there was no time left for data engineering to migrate.
  • The software engineering team would not work on creating the tool; instead they would only work on a data lake API, which the data engineering team would use in their solution.
  • We were to run our development effort in parallel with the data lake API development.
  • Data engineering would work on the migration using the AWS platform and the data lake API.


Tasks (Decisions you took)

  • We were tasked with creating an approach for the migration, with the aim of completing it in the next 6 months.

Action (How you worked with your team and cross-functional peers)

  • Discussed with the team and came up with a few approaches for how we could do the project
  • Had POCs done on multiple approaches within a week's timeframe and presented the pros and cons to management
  • Decided on the approach to follow
  • Led the development effort on the solution proposed by the team; during development, optimized the existing pipeline, which led to an improvement in SLA
  • Had to deal with multiple issues, as the data lake APIs were not fully mature and had bugs with the potential to derail the roadmap
  • Migrated the ETL data pipeline in the next 6 months
  • Created a reusable solution which could be used by multiple teams

Result (Metrics that were impacted)

  • Were able to migrate the pipeline before the anticipated time, allowing ample time for user testing and contributing to the overall success of the data lake
  • Reduced the data delay from 6 hours to 1 hour
  • Showcased to the organization best practices around ETL pipeline development: infrastructure as code, unit test cases for data pipelines, and performance optimizations