Binfeng (Bill) Xu
I'm an AI research engineer. Currently, I work at NVIDIA on autonomous agents.
Formerly, I was a researcher at Samsung Research America (SRA) where I built LLM post-training infra and efficient agentic models.
During my M.S. at NYU, I briefly worked with Alfredo Canziani
and Yann LeCun on autonomous driving.
Before that, at WFU, I was advised by Grey Ballard on efficient tensor
decomposition and Paúl Pauca on object detection for drone and satellite images. In my spare time, I enjoy ML competitions on Kaggle and
have ranked in the top 1% globally.
Over the past decade, I have been hands-on with a broad range of AI/ML research and
applications. I enjoy training large models on massive data and compute, and building real-world products out of ideas.
Recent Work & Interests
LLM Post-training. Lately I've been working on and researching the whole spectrum of post-training algorithms and infra: from data synthesis to Knowledge Distillation (e.g. SFT with KL), Preference RLHF (e.g. DPO),
RLVR (e.g. GRPO on hard math), Memory-efficient Fine-tuning (e.g. LoRA), Inference Acceleration (e.g. SSM, Flash Attention, Speculative Decoding),
Quantization and Compilation (e.g. AWQ, llama.cpp), Model Merging (e.g. SLERP), etc.
I've also spent tons of time on product and architecture work,
and have delivered several core projects for Samsung in on-device RAG, Knowledge Graph QA, and agentic chatbots.
LLM Reasoning and Machine Intelligence. This broadly splits into two directions:
- Conceptual learning: Modeling multimodal world patterns with (very) long-context autoregressive transformers, e.g. LWM.
Recently, DeepSeek blazed a successful trail (GRPO) for self-supervising LLMs on verifiable rewards, at the cost of increased test-time compute.
- Generative reasoning: Symbolic learning models (e.g. AlphaGo) show their strength in (relatively) small action spaces, which has inspired recent LLM research to
annotate data with tree-search sampling (e.g. MCTS) and process supervision. I'm currently more interested in
GFlowNets for stochastic prediction in the open-ended real world.
Language Agents: Training large models for autonomous actions with optimized reasoning and planning chains.
- ReWOO: Eliminates stacking redundancy in augmented language model (ALM) systems by decoupling LLM reasoning from observations.
- Gentopia: A collaborative agentic framework to build hierarchical agents through config, integrating model specialization, evaluation, sharing and inheritance. [demo]
Papers
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu, Xukun Liu, Hua Shen, Zeyu H, Yuhan L, Murong Y, Zhiyuan P, Yuchen L, Ziyu Y, Dongkuan Xu
ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu
Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data
Haoyan Yang, Ting Hua, Shangqian Gao, Binfeng Xu, Zheng Tang, Jie Xu, Hongxia Jin, Vijay Srinivasan
Efficient Computation of Tucker Decomposition of Correlation-Based Tensors
Binfeng Xu, Grey Ballard, Robert Lyday, Paul Laurienti
Iterative Constringency Optimization: Preclustering Approach to Agent Interactive Data
Binfeng Xu, Nicole Dalzell
Misc
Petting Kobu, the Norwegian Forest 🐱; Cyberpunk; Digital nomad (someday);
Fan of all games by Hidetaka Miyazaki, who once motivated me to try indie game dev; Good at Dota 2 (once).
Photography @500px; Minimalist.