█████╗ ████████╗██╗  ██╗ █████╗ ██████╗ ██╗   ██╗ █████╗     ██╗    ██╗ █████╗ ██╗      █████╗ ██╗    ██╗ █████╗ ██╗     ██╗  ██╗ █████╗ ██████╗
██╔══██╗╚══██╔══╝██║  ██║██╔══██╗██╔══██╗██║   ██║██╔══██╗    ██║    ██║██╔══██╗██║     ██╔══██╗██║    ██║██╔══██╗██║     ██║ ██╔╝██╔══██╗██╔══██╗
███████║   ██║   ███████║███████║██████╔╝██║   ██║███████║    ██║ █╗ ██║███████║██║     ███████║██║ █╗ ██║███████║██║     █████╔╝ ███████║██████╔╝
██╔══██║   ██║   ██╔══██║██╔══██║██╔══██╗╚██╗ ██╔╝██╔══██║    ██║███╗██║██╔══██║██║     ██╔══██║██║███╗██║██╔══██║██║     ██╔═██╗ ██╔══██║██╔══██╗
██║  ██║   ██║   ██║  ██║██║  ██║██║  ██║ ╚████╔╝ ██║  ██║    ╚███╔███╔╝██║  ██║███████╗██║  ██║╚███╔███╔╝██║  ██║███████╗██║  ██╗██║  ██║██║  ██║
╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝  ╚═══╝  ╚═╝  ╚═╝     ╚══╝╚══╝ ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝ ╚══╝╚══╝ ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝

ms ai · sf state · may 2026

rl, agents, systems.

i like making models do things they probably should not be able to do.

outside of that i take photos and am currently catching up on one piece.

email github linkedin hugging face blog ↗ goodreads ↗

experience

applied ai intern

summer 2025

chewy

built a neo4j graph memory layer over 4.5 billion clickstream events so downstream agents could retrieve user context in under 100ms instead of re-deriving it from raw logs every time.
built an autonomous diagnostic agent that detected ad campaign anomalies and pushed root causes to marketing workflows.
shipped a hybrid personalization engine for the homepage that reranked content in real time with sub-50ms latency.

projects

wolfeclick

mar 2026

rl for competitive pokemon

i wanted to train llms for long-horizon reasoning under hidden information. pokemon works: you do not see the full opponent team, and a decision on turn two still matters at turn fifteen.
i wrapped smogon's api for real-time rollouts and tried grpo. the model immediately spat out invalid json, which taught me why labs run sft before rl. after stabilizing qwen3-4b, i shaped rewards: an attack-heavy weighting made it hyper-aggressive, a defense-heavy one made it stall forever. same model, completely different emergent behavior from one number.
it still loses often, but watching one scalar determine whether it rushes or cowers was my clearest lesson in how objectives shape behavior.
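a minimal sketch of the two ideas above, not the project code: rejecting malformed model output before it reaches the simulator, and a reward with a single aggression scalar. the json shape (`{"move": ...}`), the function names, and the reward formula are all assumptions made for illustration.

```python
import json
from typing import Optional


def parse_action(raw: str, legal_moves: set) -> Optional[str]:
    """Return the chosen move if the model output is valid JSON naming a legal move."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output: the rollout can penalize or resample
    if not isinstance(data, dict):
        return None
    move = data.get("move")  # hypothetical schema: {"move": "<name>"}
    if isinstance(move, str) and move in legal_moves:
        return move
    return None


def shaped_reward(damage_dealt: float, damage_taken: float, aggression: float,
                  won: bool = False, lost: bool = False) -> float:
    """Blend offense and defense with one scalar in [0, 1] (assumed formula).

    aggression near 1.0 pays mostly for damage dealt,
    aggression near 0.0 mostly penalizes damage taken.
    """
    if not 0.0 <= aggression <= 1.0:
        raise ValueError("aggression must be in [0, 1]")
    dense = aggression * damage_dealt - (1.0 - aggression) * damage_taken
    terminal = 1.0 if won else -1.0 if lost else 0.0
    return dense + terminal


# the same even trade (40% hp dealt, 40% hp taken) flips sign with the scalar
even_trade_aggressive = shaped_reward(0.4, 0.4, aggression=0.9)
even_trade_defensive = shaped_reward(0.4, 0.4, aggression=0.1)
```

with these assumed weights, an even trade looks good to the aggressive setting and bad to the defensive one, which is one way a single number could push a policy toward rushing or stalling.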

github hf space model

commentator

dec 2025

real-time ai game caster · google gemini hackathon

3rd place · $20k

most ai commentary tools just caption everything. the insight was that commentary is not about describing what happened. it is about knowing what is worth saying. built an event detector that only fires when something meaningful changes, which cut redundant output by 60% and made the whole thing feel live instead of laggy. concept to working demo in 12 hours.
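a rough sketch of what "only fire on meaningful change" could look like, assuming the game state arrives as a dict of numeric fields and each field has a hand-picked threshold. the class name, fields, and thresholds are hypothetical, not the hackathon code.

```python
class EventDetector:
    """Emit an event only when a tracked value moves by at least its threshold."""

    def __init__(self, thresholds):
        self.thresholds = thresholds   # e.g. {"score": 1, "health": 25}
        self.last_reported = {}        # value at the last time we spoke about a field

    def update(self, state):
        events = []
        for key, value in state.items():
            threshold = self.thresholds.get(key)
            if threshold is None:
                continue  # untracked field, never worth commentary
            previous = self.last_reported.get(key)
            if previous is None:
                self.last_reported[key] = value  # first sighting is just a baseline
                continue
            if abs(value - previous) >= threshold:
                events.append((key, previous, value))
                self.last_reported[key] = value
        return events


detector = EventDetector({"score": 1, "health": 25})
first = detector.update({"score": 0, "health": 100})   # baseline only
small = detector.update({"score": 0, "health": 95})    # too small to mention
big = detector.update({"score": 1, "health": 70})      # both fields cross threshold
```

comparing against the last reported value rather than the previous frame means slow drift still triggers eventually, while frame-to-frame jitter stays silent.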

wanderlust

2026

ai trip planner

built for people who do not want to spend a week in a city and come back having only seen the same ten things that show up on every travel blog. feed it a destination and budget, it surfaces neighborhood-level spots, clusters them geographically so you are not commuting across the city between stops, and estimates real costs from a live index.
llms default to the obvious. getting genuinely local recommendations required building explicit novelty and geographic pressure into the prompt stack — alongside hard constraints on real lat/lng coordinates and no duplicate venues. runs serverless on cloudflare workers with d1 sqlite at the edge.
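the hard constraints and the geographic grouping can be sketched outside the prompt stack too. this is an illustrative version, assuming venues are dicts with `name`, `lat`, and `lng`, and using a coarse grid as a stand-in for whatever clustering the app really uses. venue names and coordinates below are made up.

```python
import math
from collections import defaultdict


def valid_coords(lat, lng):
    """Reject coordinates that cannot exist on earth."""
    return -90 <= lat <= 90 and -180 <= lng <= 180


def clean_venues(venues):
    """Drop duplicate names (case and whitespace insensitive) and impossible coordinates."""
    seen, kept = set(), []
    for venue in venues:
        key = venue["name"].strip().lower()
        if key in seen or not valid_coords(venue["lat"], venue["lng"]):
            continue
        seen.add(key)
        kept.append(venue)
    return kept


def cluster_by_grid(venues, cell_deg=0.01):
    """Group venues that fall in the same grid cell (0.01 degrees of latitude is about 1.1 km)."""
    clusters = defaultdict(list)
    for venue in venues:
        cell = (math.floor(venue["lat"] / cell_deg), math.floor(venue["lng"] / cell_deg))
        clusters[cell].append(venue["name"])
    return list(clusters.values())


venues = [
    {"name": "cafe a", "lat": 35.6595, "lng": 139.7005},
    {"name": "Cafe A ", "lat": 35.6595, "lng": 139.7005},   # duplicate
    {"name": "bookshop b", "lat": 35.6598, "lng": 139.7008},
    {"name": "temple c", "lat": 35.7148, "lng": 139.7967},
    {"name": "bad d", "lat": 135.0, "lng": 139.0},          # impossible latitude
]
cleaned = clean_venues(venues)
clusters = cluster_by_grid(cleaned)
```

a grid is the simplest thing that works for a sketch. two venues on opposite sides of a cell boundary land in different clusters, so a distance-based method would be the natural next step.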

research

adaptive rl for dynamic roi selection in rppg

2025 – 2026

ms thesis · sf state

rppg extracts heart rate from video by detecting subtle color changes in facial skin. most methods fix the region of interest to the full face or a static patch, but signal quality varies heavily by lighting, motion, and skin tone across different regions.
i trained a ppo agent to dynamically select and weight facial regions per frame using a 64-dimensional state space from mediapipe. the policy improved hr_mae from 33.58 to 21.76 beats per minute. the honest catch is that the gains came at a cost to ppg waveform correlation, and dense rewards underperformed sparse ones. that suggests the waveform-level signal is harder to translate into useful gradients than expected.
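to make the per-frame weighting and the headline number concrete, here is a small sketch. it assumes the policy emits one logit per facial region and that regions are blended with a softmax, which is an illustrative choice and not necessarily how the thesis does it. only the two hr_mae values come from the text above.

```python
import math


def softmax(logits):
    """Turn raw region scores into weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_regions(region_signals, logits):
    """Weighted average of one frame's per-region color signal (assumed blending rule)."""
    weights = softmax(logits)
    return sum(w * s for w, s in zip(weights, region_signals))


def hr_mae(predicted_bpm, true_bpm):
    """Mean absolute error between predicted and reference heart rates, in bpm."""
    return sum(abs(p - t) for p, t in zip(predicted_bpm, true_bpm)) / len(true_bpm)


# relative reduction implied by the reported numbers: 33.58 -> 21.76 bpm
baseline_mae, policy_mae = 33.58, 21.76
relative_reduction = (baseline_mae - policy_mae) / baseline_mae  # roughly 0.35
```

so the reported change is an error reduction of roughly 35 percent relative to the baseline.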

pivotrl

ongoing 2026

personal research

not all tokens in a reasoning trace are equally worth learning from. i detect high-loss pivot points in kimi k2.5 traces and up-weight them when distilling into gemma 4.
uniform sft on reasoning traces dropped gsm8k from 85.5% to 74.5%. pivot-weighted sft recovered part of that, to 80%. the negative result is informative: naive distillation on reasoning traces actively hurts, which is not obvious from the literature.
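a toy version of pivot weighting, assuming per-token losses are already computed and that a "pivot" is any token whose loss sits more than one standard deviation above the trace mean. the z-score rule, the threshold, and the boost factor are all placeholders for illustration, not the actual pivotrl criteria.

```python
import math


def pivot_weights(token_losses, z_threshold=1.0, boost=3.0):
    """Weight 1.0 for ordinary tokens, `boost` for unusually high-loss tokens."""
    mean = sum(token_losses) / len(token_losses)
    variance = sum((l - mean) ** 2 for l in token_losses) / len(token_losses)
    std = math.sqrt(variance)
    if std == 0:
        return [1.0] * len(token_losses)  # flat trace, nothing stands out
    return [boost if (l - mean) / std > z_threshold else 1.0 for l in token_losses]


def weighted_loss(token_losses, weights):
    """Weighted mean of token losses, normalized so the scale stays comparable."""
    return sum(w * l for w, l in zip(weights, token_losses)) / sum(weights)


losses = [0.1, 0.1, 0.1, 0.1, 2.0]          # one hard token in an easy trace
weights = pivot_weights(losses)
uniform = weighted_loss(losses, [1.0] * len(losses))
pivoted = weighted_loss(losses, weights)
```

in this toy trace the single hard token gets three times the weight, so the pivoted loss leans on it much more than the uniform average does.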

earlier work

edge-ai flood alert system

2024

indian meteorological department · provisional patent

flood-prone areas in india often do not have reliable internet infrastructure, so a cloud-dependent alert system is useless to the people who need it most. the whole thing had to run offline.
trained an encoder-decoder lstm on 45 years of hourly rainfall data (1979-2024) across three mumbai neighborhoods — matunga east, byculla, and dahisar — each with ~400k records. models hit mae around 0.17 across all three locations. deployed as tflite on a raspberry pi 4 serving a fastapi backend, with a flutter app using the haversine formula to route users to the nearest model based on their gps coordinates. alerts go out over local wlan with no internet required.
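the routing step is small enough to sketch. the real version lives in the flutter app, so this python is only an illustration of the haversine lookup. the site coordinates are rough approximations of the three neighborhoods chosen for the example, not values from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0

# approximate neighborhood centers (assumed for illustration only)
MODEL_SITES = {
    "matunga_east": (19.027, 72.855),
    "byculla": (18.976, 72.833),
    "dahisar": (19.250, 72.859),
}


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two lat/lng points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_model(lat, lng):
    """Pick the site whose model should serve a user at the given gps position."""
    return min(MODEL_SITES, key=lambda name: haversine_km(lat, lng, *MODEL_SITES[name]))


site = nearest_model(19.24, 72.86)  # a user near the northern edge of the city
```

with three sites a linear scan is plenty. the lookup needs no network access, which fits the offline constraint.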

credit card fraud detection

2023

publication

compared xgboost, random forest, svm, and logistic regression on highly imbalanced transaction datasets. the result that actually mattered was not which algorithm won. it was that the class imbalance handling strategy created larger performance swings across method families than the model choice itself.
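one common imbalance-handling strategy is inverse-frequency class weighting. the sketch below shows that single strategy as an example. the paper's actual set of strategies is not listed here, so treat this as an assumption about what one of them might look like.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# 2% fraud rate: each fraud row counts as much as ~49 legitimate rows
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
```

the same weights can be handed to any of the model families compared above, which is part of why the handling strategy can be varied independently of the algorithm.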

paper

education

ms data science and ai

sf state

4.0 · may 2026

b.tech electronics and communication

pune university

2024

contact

let's talk.

looking for ai research, applied ai, and mle roles starting may 2026. also applying to ai safety research fellowships.

if you want to talk about rl, why sparse rewards beat dense ones, or why one piece is actually a masterpiece, i am up for it.

atharvawal27@gmail.com