Hi! I’m Anshuman. I consider myself a wild-eyed hacker, full-stack AI leader, machine learning systems engineer, deep learning researcher, quant trader, and entrepreneur, roughly in that order.
A little bit more about me #
Deep Learning and LLMs #
I am currently running the Eng team at Brainchain AI, where we strategically use LLMs at scale to help navigate supply chain disruptions, identify business opportunities, and spin up tactical “businesses in a box”.
In my previous work in this space, I focused on developing novel techniques to make large language models:
- faster at inference time
- faster to train
- smaller in size
while keeping the accuracy-versus-speed/size tradeoffs favorable and reducing model running costs in real-world settings. How? By exploiting neural network training and inference dynamics, and by combining algorithmic speedup techniques from multiple research areas (linear algebra, optimization, compact data structures, randomized algorithms, etc.) with classic model optimization (matrix factorizations, knowledge distillation, pruning, etc.).
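To make the "matrix factorizations" part concrete, here is a minimal sketch of one such classic technique: replacing a dense linear layer with a truncated-SVD low-rank factorization. The layer sizes and rank below are illustrative, not taken from any production model.

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate a dense linear layer with two smaller layers via truncated SVD."""
    W = layer.weight.data                      # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = (U[:, :rank] * S[:rank]).contiguous()  # fold singular values into the left factor
    V_r = Vh[:rank, :].contiguous()

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = V_r
    second.weight.data = U_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data
    return nn.Sequential(first, second)

# Example: a 4096x4096 projection (~16.8M params) becomes ~2.1M params at rank 256.
dense = nn.Linear(4096, 4096)
low_rank = factorize_linear(dense, rank=256)
```

Whether a given rank holds up accuracy-wise depends on the spectrum of the weight matrix; that is exactly the kind of accuracy-speed/size tradeoff analysis I mean above.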
I like reading research papers.
AI Engineering #
I specialize in conceptualizing, designing, and implementing gen AI/ML systems and production pipelines from scratch.
I have extensive experience optimizing across the data-model-inference lifecycle, finding throughput improvements through architectural optimizations that range from ETL and data pipeline tuning to algorithmic speedups on the modeling side.
My current tech stack is Python-focused, with the following tool chains:
- LLMs - Claude 3.5 Sonnet, GPT-4o, Llama 3 8B
- LLM providers - Together, Groq, Anthropic, OpenAI (in that order)
- Distributed training - ~~Ray~~ Modal
- Workflow orchestration - ~~Airflow~~ Modal
- LLM Ops - Portkey!, instructor, llamaindex, llama-cpp, and a bunch of other stuff I can’t talk about yet
- Classic deep learning - pytorch, huggingface, backpack
- Tabular ML - numpy, pandas, scikit-learn, xgboost
- Performance - numba, polars, RAPIDS (Dask, CuPy, CuML)
- Feature engineering - featuretools, tsfresh
- Model Serving - FastAPI, Ray Serve
My preferred data stack depends on the problem being solved, but eventually settles down into a combination of Redis, Postgres, and Parquet files on S3.
For MVP purposes, I have discovered that Postgres and Redis will cover most operating modes.
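As a sketch of what that usually looks like in practice (connection strings, table names, and the S3 bucket below are all placeholders): Redis acts as a read-through cache in front of Postgres for hot-path lookups, while analytical data sits as Parquet on S3 and gets scanned lazily.

```python
import json
import polars as pl
import psycopg2
import redis

# Hypothetical connections; hosts, credentials, and schema are placeholders.
cache = redis.Redis(host="localhost", port=6379)
pg = psycopg2.connect("dbname=app user=app")

def get_user(user_id: int) -> dict:
    """Read-through cache: check Redis first, fall back to Postgres, then backfill."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached:
        return json.loads(cached)
    with pg.cursor() as cur:
        cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()  # assumes the row exists, for brevity
    result = {"id": row[0], "name": row[1]}
    cache.set(key, json.dumps(result), ex=300)  # expire after 5 minutes
    return result

# Bulk/analytical data lives as Parquet on S3 and is scanned lazily with polars.
events = pl.scan_parquet("s3://my-bucket/events/*.parquet")
daily = events.group_by("day").agg(pl.len()).collect()
```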
My go-to big data stack is Spark. I spoke at the Spark Summit about scaling topological data analysis (a classically compute-heavy task) to terabytes of data on a Spark cluster by adapting locality-sensitive hashing (LSH) to tame the compute: https://www.databricks.com/session/enterprise-scale-topological-data-analysis-using-spark
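The core idea, sketched below with Spark's built-in LSH rather than the talk's actual pipeline: hash points into buckets so that only points landing in the same bucket get compared, instead of materializing the full O(n²) pairwise distance computation. The toy data and parameters are made up.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import BucketedRandomProjectionLSH
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("lsh-sketch").getOrCreate()

# Toy point cloud; in the real pipeline this would be billions of feature vectors.
points = spark.createDataFrame(
    [(0, Vectors.dense([0.0, 0.1])),
     (1, Vectors.dense([0.2, 0.1])),
     (2, Vectors.dense([9.0, 9.1]))],
    ["id", "features"],
)

# Hash nearby points into the same buckets so we only compare within buckets.
lsh = BucketedRandomProjectionLSH(
    inputCol="features", outputCol="hashes", bucketLength=1.0, numHashTables=4
)
model = lsh.fit(points)

# Approximate near-neighbor pairs within a distance threshold.
pairs = model.approxSimilarityJoin(points, points, threshold=1.0, distCol="dist")
pairs.select("datasetA.id", "datasetB.id", "dist").show()
```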
Startups #
I have started and failed to grow 3 tech companies so far; perhaps the fourth time’s the charm. (Insert appropriate Jedi mind trick for funding here.) I have learnt a lot about coming up with ideas and validating them, then matching them with business models, and shaping products and offerings (preferably even in the absence of a pandemic).
Trading #
My trading past includes a fairly sordid time trading FX forwards and swaps on the sell-side at an infamous bulge-bracket investment bank in NYC, as well as some exciting times trading equity index derivatives during the Great Financial Crisis. In fact, the Financial Times wrote a pretty cool article[^1] about my life as a prop trader.
Education #
I have a BTech and MTech in CS from IIT Kharagpur, and an MBA from UNC Chapel Hill.
Working with me #
I love solving problems, and I love doing so for startups (or startup teams in large orgs)!
If you have an interesting problem (LLMs? time series modeling? anomaly detection? model speedups? accuracy boosts? setting up a gen AI/data science/applied ML group?) that you’d like me to take a look at, or if you just want to buy me coffee (I live in SF), feel free to hmu on LinkedIn or email.
[^1]: Here's an Archive link.