Hello, I'm

Vijay Tulluri

Senior Software Engineer | Technical Leader - Data & AI

IT Professional with 9+ years of experience in Data Engineering, Software Development, and Machine Learning. Expert in building scalable data pipelines, ML systems, and Agentic AI solutions across sustainability, finance, retail, and ad-tech domains.

150+
Data Pipelines Delivered
9+
ML & Agentic Solutions
$6M+
Revenue Saved
9+
Years of Experience

Skills & Expertise

Specialized in building scalable data infrastructure, ML systems, and cutting-edge AI solutions

Data Engineering & Cloud

Python, Scala, SQL, Shell

Expert

Databricks & Delta Lake

Expert

AWS, Azure, GCP

Expert

Apache ecosystem (Hadoop, Spark, Airflow, Hive, Kafka, Flink), Trino, Presto

Expert

Snowflake, BigQuery, Redshift

Advanced

AI & Machine Learning

LLMs & Transformers

Expert

RAG & LangChain

Expert

BERT & NLP

Advanced

Machine Learning & MLOps

Advanced

Weights & Biases

Advanced

DevOps & Infrastructure

Docker & Kubernetes

Advanced

Terraform

Advanced

Jenkins, GitHub Actions, SonarQube

Advanced

Work Experience

Aug 2023 – Aug 2024

Sr. Software Engineer – Data & AI

Nike, Sustainability Foundation Analytics

💰 Business Impact: $1.5M+ saved through automation

Developed Agentic AI-powered sustainability assistant using LangChain with RAG. Built scalable ingestion pipelines from 40+ sources. Led Snowflake to Databricks migration using medallion architecture.

Apr 2022 – Jun 2023

Software Engineer III – Data Engineering & ML

Apple, Ad-Platforms

💰 Business Impact: $1.8M+ in operational efficiency

Architected and deployed Apache Airflow across 17 environments for Apple's Ad Platforms. Built ETL test automation frameworks reducing manual testing by 60%.

Dec 2021 – Apr 2022

Senior Software Engineer – ML

Vanguard, Advice Engagement Labs

💰 Business Impact: $600K+ revenue impact

Engineered highly scalable ETL/ELT pipelines with Airflow and Snowflake. Integrated ML recommendation outputs with Microsoft Dynamics 365 using RESTful APIs.

Sep 2019 – Dec 2021

Senior Software Engineer – Data & AI

Capital One, Fraud & Risk Mitigation

💰 Business Impact: $800K+ in fraud prevention

Orchestrated data pipelines into OpenML-based fraud risk engine. Integrated DVC with S3 for dataset versioning. Built complex DAGs in Apache Airflow.

Aug 2017 – May 2019

Data & AI Research

University of North Texas

💰 Business Impact: Research contribution

Developed distributed big data pipeline for predicting energy consumption using Apache Spark, XGBoost, and LSTM. Achieved 15% improvement over baseline models.

Oct 2015 – Jul 2017

Software Engineer – Data & AI

Invesco, Budget & Forecasting

💰 Business Impact: $100K+ cost reduction

Reduced asset maintenance costs by 30% with real-time anomaly detection using CART and ARIMA. Built streaming pipelines with Kafka and InfluxDB.

Trusted by Industry Leaders

Delivered Exclusive Growth Hacks & Enterprise Solutions

Nike
Apple
Vanguard
Capital One
University of North Texas
Invesco

Projects Delivered

Agentic AI Sustainability Assistant

Built AI-powered assistant using LangChain with RAG, retrieving carbon emissions metrics from Databricks vector store for regulatory-compliant sustainability reporting.

LangChain, RAG, Databricks, FastAPI

Enterprise FAE Metrics Platform

Aggregated SonarQube, GitHub, Jenkins metrics into Delta Lake to compute engineering KPIs, enabling automated delivery of team health dashboards to leadership.

Delta Lake, PySpark, Airflow

Hybrid Cloud Airflow Infrastructure

Architected Apache Airflow across 17 environments (on-prem and AWS) with Terraform IaC, supporting scalable DAG executions for Apple's Ad Platforms.

Airflow, Terraform, AWS, Kubernetes

Real-Time Fraud Detection Engine

Orchestrated pipelines from Snowflake, Data Lake, and Elasticsearch into OpenML-based fraud risk engine, enabling real-time fraud score computation.

OpenML, Snowflake, Flask, AWS

Smart Campus Energy Forecasting

Developed ML pipeline using XGBoost, LSTM, and Prophet for time-series forecasting of campus energy consumption, achieving 15% improvement over baseline.

Python, XGBoost, LSTM, Spark

Medallion Architecture Migration

Led Snowflake to Databricks migration implementing raw, bronze, silver, gold layers with Z-ordering and liquid clustering for optimized query performance.

Databricks, Delta Lake, Snowflake

System Design HLD

Deep dive into the technical architecture of enterprise-scale Data Engineering and ML systems

Agentic AI Sustainability Assistant Architecture
Machine Learning

Nike Sustainability Foundation

LangChain-powered RAG system retrieving carbon emissions metrics from Databricks vector store for regulatory-compliant sustainability reporting. Features multi-agent orchestration and real-time data retrieval.

Key Highlights:

  • Multi-agent LangChain orchestration
  • Databricks vector store integration
  • Real-time carbon metrics retrieval
  • Regulatory compliance automation
LangChain, RAG, Databricks, Vector Store, FastAPI, Python
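
The retrieve-then-generate flow described in this card can be illustrated with a minimal sketch in plain Python. It is not the production system: the toy bag-of-words `embed` function, the in-memory `DOCS` list, and the `build_prompt` helper are hypothetical stand-ins for a learned embedding model, the Databricks vector store, and the LangChain prompt assembly.

```python
import math
from collections import Counter

# Hypothetical in-memory corpus standing in for a vector store.
DOCS = [
    "Scope 1 emissions cover direct emissions from owned facilities.",
    "Scope 2 emissions cover purchased electricity and heat.",
    "Water usage is reported per manufacturing site.",
]

def embed(text):
    """Toy bag-of-words embedding (assumption: a real system uses a learned model)."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, k=2):
    """Assemble a grounded prompt; the LLM call itself is out of scope here."""
    context = "\n".join(f"- {d}" for d in retrieve(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("purchased electricity emissions")` places the Scope 2 passage first in the context, which is the grounding step that lets the model answer from retrieved metrics rather than from memory.
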
Enterprise FAE Metrics Platform Architecture
Data Engineering

Fidelity Investments

Multi-source data ingestion platform aggregating metrics from SonarQube, GitHub, Jenkins, and Jira into Delta Lake. Implements medallion architecture with automated KPI computation and leadership dashboards.

Key Highlights:

  • Multi-source data aggregation (4 platforms)
  • Medallion architecture (Bronze/Silver/Gold)
  • Automated KPI dashboard delivery
  • 200+ repository monitoring
Delta Lake, PySpark, Apache Airflow, SonarQube, GitHub API, Jenkins, Jira API
Hybrid Cloud Airflow Infrastructure
Infrastructure

Apple Ad-Platforms

Enterprise-scale Apache Airflow deployment across 17 environments (on-prem and AWS). Terraform-based IaC enabling scalable DAG executions with Kubernetes orchestration.

Key Highlights:

  • 17 environment deployment
  • Hybrid cloud architecture (on-prem + AWS)
  • Terraform IaC automation
  • Kubernetes-based scaling
Apache Airflow, Terraform, AWS, Kubernetes, Docker, GitOps
Real-Time Fraud Detection Engine
Machine Learning

Capital One

Streaming data pipeline orchestrating data from Snowflake, Data Lake, and Elasticsearch into OpenML-based fraud risk engine. Enables real-time fraud score computation with sub-second latency.

Key Highlights:

  • Real-time fraud scoring
  • Multi-source data orchestration
  • Sub-second latency
  • OpenML model serving
OpenML, Snowflake, Elasticsearch, Apache Airflow, Flask, AWS
Medallion Architecture Migration
Data Engineering

Nike Sustainability Foundation

Snowflake to Databricks migration implementing medallion architecture with Z-ordering and liquid clustering. Features raw, bronze, silver, and gold layers with automated data quality checks.

Key Highlights:

  • Snowflake to Databricks migration
  • 4-layer medallion architecture
  • Z-ordering & liquid clustering
  • Automated data quality checks
Databricks, Delta Lake, Snowflake, PySpark, Unity Catalog
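
The layer progression in this card can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the migration code: in practice each layer would be a Delta table transformed with PySpark, and the record fields (`site`, `co2_kg`), the `RAW` sample, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records; field names are illustrative only.
RAW = [
    {"site": "A", "co2_kg": "120.5"},
    {"site": "A", "co2_kg": "120.5"},  # exact duplicate
    {"site": "B", "co2_kg": "n/a"},    # fails the quality check
    {"site": "B", "co2_kg": "80"},
]

def to_bronze(raw, source):
    """Bronze: keep records as received, tagging each with its source."""
    return [{**r, "_source": source} for r in raw]

def to_silver(bronze):
    """Silver: cast types, drop rows that fail validation, remove duplicates."""
    seen, out = set(), []
    for r in bronze:
        try:
            value = float(r["co2_kg"])
        except ValueError:
            continue  # simple data quality check: skip non-numeric readings
        key = (r["site"], value)  # naive dedup key, good enough for a sketch
        if key not in seen:
            seen.add(key)
            out.append({"site": r["site"], "co2_kg": value})
    return out

def to_gold(silver):
    """Gold: aggregate to a reporting-ready total per site."""
    totals = defaultdict(float)
    for r in silver:
        totals[r["site"]] += r["co2_kg"]
    return dict(totals)

gold = to_gold(to_silver(to_bronze(RAW, source="demo")))
```

Keeping each layer as a separate function mirrors the idea that every stage is persisted and can be rebuilt independently from the one before it.
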

Technical Blog

Deep dives into Data Engineering, ML, AI, and emerging technologies

Data Engineering

Data Engineering Fundamentals: Building Scalable Data Pipelines

Learn the core concepts of data engineering, from ETL pipelines to data warehousing, explained in simple terms.

Dec 15, 2024
8 min read
Data Engineering, ETL, Databricks
Crypto

Blockchain & Crypto: Understanding Decentralized Systems

Demystifying blockchain technology and cryptocurrencies with simple explanations and real-world applications.

Dec 10, 2024
7 min read
Blockchain, Crypto, Web3
Machine Learning

Building Production ML Pipelines: From Data to Deployment

A comprehensive guide to building, training, and deploying machine learning models in production environments.

Dec 5, 2024
10 min read
Machine Learning, MLOps, Python
Deep Learning

Deep Learning Architectures: CNNs, RNNs, and Transformers

Understanding neural network architectures and when to use CNNs, RNNs, LSTMs, and Transformer models.

Nov 28, 2024
12 min read
Deep Learning, Neural Networks, PyTorch
LLM

Large Language Models (LLMs): GPT, BERT, and Beyond

Exploring how Large Language Models work, from transformers to prompt engineering and fine-tuning.

Nov 20, 2024
11 min read
LLM, GPT, BERT
RAG

Retrieval-Augmented Generation (RAG): Grounding LLMs in Reality

Learn how RAG combines retrieval systems with LLMs to create accurate, domain-specific AI applications.

Nov 15, 2024
10 min read
RAG, LLM, Vector Database