█████╗ ████████╗██╗  ██╗ █████╗ ██████╗ ██╗   ██╗ █████╗     ██╗    ██╗ █████╗ ██╗      █████╗ ██╗    ██╗ █████╗ ██╗     ██╗  ██╗ █████╗ ██████╗
██╔══██╗╚══██╔══╝██║  ██║██╔══██╗██╔══██╗██║   ██║██╔══██╗    ██║    ██║██╔══██╗██║     ██╔══██╗██║    ██║██╔══██╗██║     ██║ ██╔╝██╔══██╗██╔══██╗
███████║   ██║   ███████║███████║██████╔╝██║   ██║███████║    ██║ █╗ ██║███████║██║     ███████║██║ █╗ ██║███████║██║     █████╔╝ ███████║██████╔╝
██╔══██║   ██║   ██╔══██║██╔══██║██╔══██╗╚██╗ ██╔╝██╔══██║    ██║███╗██║██╔══██║██║     ██╔══██║██║███╗██║██╔══██║██║     ██╔═██╗ ██╔══██║██╔══██╗
██║  ██║   ██║   ██║  ██║██║  ██║██║  ██║ ╚████╔╝ ██║  ██║    ╚███╔███╔╝██║  ██║███████╗██║  ██║╚███╔███╔╝██║  ██║███████╗██║  ██╗██║  ██║██║  ██║
╚═╝  ╚═╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝  ╚═══╝  ╚═╝  ╚═╝     ╚══╝╚══╝ ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝ ╚══╝╚══╝ ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝

ms ai · sf state · may 2026

rl, agents, systems.

i like making models do things they probably should not be able to do.

outside of that i take photos and am currently catching up on one piece.

applied ai intern
chewy
  • built a neo4j graph memory layer over 4.5 billion clickstream events so downstream agents could retrieve user context in under 100ms instead of re-deriving it from raw logs every time.
  • built an autonomous diagnostic agent that detected ad campaign anomalies and pushed root causes to marketing workflows.
  • shipped a hybrid personalization engine for the homepage that reranked content in real time with sub-50ms latency.
wolfeclick
rl for competitive pokemon
  • i wanted to train llms for long-horizon reasoning under hidden information. pokemon works: you do not see the full opponent team, and a decision on turn two still matters at turn fifteen.
  • i wrapped smogon's api for real-time rollouts and tried grpo. the model immediately spat out invalid json, which taught me why labs sft before rl. after stabilizing qwen3-4b, i shaped rewards: an attack-heavy reward made it hyper-aggressive, a defense-heavy one made it stall forever. same model, completely different emergent behavior from one number.
  • it still loses often, but watching one scalar determine whether it rushes or cowers was my clearest lesson in how objectives shape behavior.
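a minimal sketch of the "one scalar" idea above. the actual reward terms are not listed here, so `shaped_reward`, its inputs (damage as a fraction of total hp), and `attack_weight` are all hypothetical names chosen for illustration, not the project's real reward function.

```python
def shaped_reward(damage_dealt, damage_taken, attack_weight):
    """blend offense and defense into one scalar reward (illustrative only).

    damage_dealt / damage_taken: fractions of total hp in [0, 1] (assumed inputs).
    attack_weight: in [0, 1]. near 1 mostly rewards dealing damage,
    near 0 mostly penalizes taking it.
    """
    if not 0.0 <= attack_weight <= 1.0:
        raise ValueError("attack_weight must be in [0, 1]")
    return attack_weight * damage_dealt - (1.0 - attack_weight) * damage_taken


# the same risky trade, scored under two settings of the single knob
turn = {"damage_dealt": 0.5, "damage_taken": 0.4}
aggressive = shaped_reward(**turn, attack_weight=0.9)  # trade is rewarded
defensive = shaped_reward(**turn, attack_weight=0.1)   # same trade is penalized
```

the sign of the reward for an identical turn flips with the weight, which is one plausible reading of why the policy swings between rushing and stalling.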
github · hf space · model
commentator
real-time ai game caster · google gemini hackathon
3rd place · $20k
  • most ai commentary tools just caption everything. the insight was that commentary is not about describing what happened — it is about knowing what is worth saying. built an event detector that only fires when something meaningful changes, which cut redundant output by 60% and made the whole thing feel live instead of laggy. concept to working demo in 12 hours.
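a rough sketch of an event detector that only fires on meaningful change. the state keys (`score`, `health`) and the per-key thresholds are assumptions for illustration. the hackathon build's actual signals are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class EventDetector:
    # per-key minimum change that counts as "worth saying" (hypothetical values)
    thresholds: dict
    # last value that was announced (or first seen) for each key
    last: dict = field(default_factory=dict, init=False)

    def update(self, state: dict) -> list:
        """return (key, old, new) tuples for keys that moved past their threshold."""
        events = []
        for key, min_delta in self.thresholds.items():
            if key not in state:
                continue
            value = state[key]
            if key not in self.last:
                self.last[key] = value  # first observation sets the baseline
                continue
            prev = self.last[key]
            if abs(value - prev) >= min_delta:
                events.append((key, prev, value))
                self.last[key] = value  # reset the baseline only after firing
        return events


detector = EventDetector({"score": 1, "health": 25})
frames = [
    {"score": 0, "health": 100},
    {"score": 0, "health": 95},   # small drift, stays silent
    {"score": 1, "health": 90},   # score changed, fires
    {"score": 1, "health": 70},   # health drift has accumulated, fires
]
fired = [detector.update(frame) for frame in frames]
```

comparing against the last announced value rather than the previous frame means slow drift still triggers a line once it adds up, while frame-to-frame noise stays quiet.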
wanderlust
ai trip planner
  • built for people who do not want to spend a week in a city and come back having only seen the same ten things that show up on every travel blog. feed it a destination and a budget, and it surfaces neighborhood-level spots, clusters them geographically so you are not commuting across the city between stops, and estimates real costs from a live index.
  • llms default to the obvious. getting genuinely local recommendations required building explicit novelty and geographic pressure into the prompt stack — alongside hard constraints on real lat/lng coordinates and no duplicate venues. runs serverless on cloudflare workers with d1 sqlite at the edge.
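a small sketch of the hard constraints mentioned above (valid lat/lng, no duplicate venues), applied as a post-filter on model output. the venue dict shape (`name`, `lat`, `lng`) and the function name are assumptions. the production version runs on cloudflare workers and may look quite different.

```python
def validate_venues(venues):
    """keep venues with in-range coordinates, dropping repeats by normalized name."""
    seen = set()
    kept = []
    for venue in venues:
        lat, lng = venue.get("lat"), venue.get("lng")
        # reject missing or non-numeric coordinates
        if not isinstance(lat, (int, float)) or not isinstance(lng, (int, float)):
            continue
        # reject coordinates that cannot exist on earth
        if not (-90 <= lat <= 90 and -180 <= lng <= 180):
            continue
        # normalize case and whitespace so "Cafe  X" and "cafe x" collide
        key = " ".join(str(venue.get("name", "")).lower().split())
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(venue)
    return kept


raw = [
    {"name": "Cafe X", "lat": 37.76, "lng": -122.42},
    {"name": "cafe  x", "lat": 37.76, "lng": -122.42},   # duplicate
    {"name": "Nowhere", "lat": 123.0, "lng": 10.0},      # impossible latitude
    {"name": "No Coords"},                               # missing lat/lng
    {"name": "Park Y", "lat": 37.77, "lng": -122.43},
]
clean = validate_venues(raw)
```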
adaptive rl for dynamic roi selection in rppg
ms thesis · sf state
  • rppg extracts heart rate from video by detecting subtle color changes in facial skin. most methods fix the region of interest to the full face or a static patch, but signal quality varies heavily by lighting, motion, and skin tone across different regions.
  • i trained a ppo agent to dynamically select and weight facial regions per frame using a 64-dimensional state space from mediapipe. the policy improved hr_mae from 33.58 to 21.76 beats per minute. the honest catch is that the gains came at a cost to ppg waveform correlation, and dense rewards underperformed sparse ones. that suggests the waveform-level signal is harder to translate into useful gradients than expected.
pivotrl
personal research
  • not all tokens in a reasoning trace are equally worth learning from. i detect high-loss pivot points in kimi k2.5 traces and up-weight them when distilling into gemma 4.
  • uniform sft on reasoning traces dropped gsm8k from 85.5% to 74.5%. pivot-weighted sft partially recovered it to 80%. the negative result is informative: naive distillation on reasoning traces actively hurts, which is not obvious from the literature.
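a toy sketch of pivot up-weighting on per-token losses. it assumes the losses come from scoring the student on a teacher trace, and that pivots are simply the top-k highest-loss tokens. `top_k` and `pivot_weight` are hypothetical knobs, not the values used in the experiments.

```python
def pivot_weights(token_losses, top_k=1, pivot_weight=3.0):
    """weight 1.0 for ordinary tokens, pivot_weight for the top_k highest-loss ones."""
    n = len(token_losses)
    k = min(max(top_k, 0), n)
    ranked = sorted(range(n), key=lambda i: token_losses[i], reverse=True)
    pivots = set(ranked[:k])
    return [pivot_weight if i in pivots else 1.0 for i in range(n)]


def weighted_loss(token_losses, weights):
    """weighted mean of per-token losses, normalized by the total weight."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(l * w for l, w in zip(token_losses, weights)) / total


# one trace where a single token (index 2) is much harder than the rest
losses = [0.1, 0.2, 5.0, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.1]
weights = pivot_weights(losses, top_k=1, pivot_weight=3.0)
uniform = weighted_loss(losses, [1.0] * len(losses))
pivoted = weighted_loss(losses, weights)
```

normalizing by the total weight keeps the loss on a comparable scale to uniform sft, so the pivot weight shifts where gradient goes rather than how large it is overall.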
edge-ai flood alert system
indian meteorological department · provisional patent
  • flood-prone areas in india often do not have reliable internet infrastructure, so a cloud-dependent alert system is useless to the people who need it most. the whole thing had to run offline.
  • trained an encoder-decoder lstm on 45 years of hourly rainfall data (1979-2024) across three mumbai neighborhoods (matunga east, byculla, and dahisar), each with ~400k records. models hit mae around 0.17 across all three locations.
  • deployed as tflite on a raspberry pi 4 serving a fastapi backend, with a flutter app using the haversine formula to route users to the nearest model based on their gps coordinates. alerts go out over local wlan with no internet required.
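a minimal sketch of the haversine routing step, written in python for readability (the app itself is flutter). the neighborhood coordinates are approximate and for illustration only, and `MODEL_SITES` and `nearest_model` are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius

# approximate neighborhood centers (illustrative, not the deployed values)
MODEL_SITES = {
    "matunga_east": (19.027, 72.855),
    "byculla": (18.976, 72.833),
    "dahisar": (19.250, 72.859),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_model(lat, lon, sites=MODEL_SITES):
    """name of the site whose center is closest to the user's gps fix."""
    return min(sites, key=lambda name: haversine_km(lat, lon, *sites[name]))
```

for example, `nearest_model(19.24, 72.86)` picks `"dahisar"`, since the other two centers are more than 20 km further south.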
credit card fraud detection
publication
  • compared xgboost, random forest, svm, and logistic regression on highly imbalanced transaction datasets. the result that actually mattered was not which algorithm won. it was that the class imbalance handling strategy created larger performance swings across method families than the model choice itself.
paper
ms data science and ai
sf state
4.0 · may 2026
b.tech electronics and communication
pune university
2024
let's talk.

looking for ai research, applied ai, and mle roles starting may 2026. also applying to ai safety research fellowships.

if you want to talk about rl, why sparse rewards beat dense ones, or why one piece is actually a masterpiece, i am up for it.

atharvawal27@gmail.com