Binfeng (Bill) Xu

I'm an AI research engineer. Currently, I work at NVIDIA on autonomous agents.

Formerly, I was a researcher at Samsung Research America (SRA), where I built LLM post-training infrastructure and efficient agentic models. During my M.S. at NYU, I briefly worked with Alfredo Canziani and Yann LeCun on autonomous driving. Before that, at WFU, I was advised by Grey Ballard on efficient tensor decomposition and by Paúl Pauca on object detection for drone and satellite imagery. In my spare time, I enjoy ML competitions on Kaggle and have ranked in the top 1% globally.

Over the past decade, I have been hands-on with a broad range of AI/ML research and applications. I enjoy training large models on massive data and compute, and building real-world products out of ideas.

GitHub  |  LinkedIn  |  Scholar

Recent Work & Interests

LLM Post-training. Lately I've been working on and researching the full spectrum of post-training algorithms and infrastructure: from data synthesis to knowledge distillation (e.g., SFT with a KL term), preference-based RLHF (e.g., DPO), RLVR (e.g., GRPO on hard math), memory-efficient fine-tuning (e.g., LoRA), inference acceleration (e.g., SSMs, FlashAttention, speculative decoding), quantization and compilation (e.g., AWQ, llama.cpp), model merging (e.g., SLERP), etc. I've also spent plenty of time on product and architecture work, and have delivered several core projects for Samsung: on-device RAG, knowledge-graph QA, and agentic chatbots.
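As a flavor of what preference-based post-training looks like, here is a minimal sketch of the DPO objective for a single (chosen, rejected) pair. The function and argument names are illustrative, not from any specific library; inputs are the summed log-probabilities of each response under the policy and the frozen reference model.

```python
import math

def dpo_loss(pol_logp_w, pol_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    # Margin: how much more the policy prefers the chosen response over
    # the rejected one, relative to the reference model.
    margin = beta * ((pol_logp_w - ref_logp_w) - (pol_logp_l - ref_logp_l))
    # -log(sigmoid(margin)); small when the policy's preference for the
    # chosen response exceeds the reference's.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log(2) ≈ 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

In practice this is computed in a batched, token-summed form (e.g., as in RLHF training libraries), with `beta` controlling how far the policy may drift from the reference.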

LLM Reasoning and Machine Intelligence. This generally splits into two threads:

  • Conceptual learning: Modeling multimodal world patterns with (very) long-context autoregressive transformers, e.g., LWM. Recently, DeepSeek blazed a successful trail (GRPO) for self-supervising LLMs on verifiable rewards, at the cost of increased test-time compute.
  • Generative reasoning: Symbolic learning systems (e.g., AlphaGo) show their strength in (relatively) small action spaces, which has inspired recent LLM research on annotating data with tree-search sampling (e.g., MCTS) and process supervision. I'm currently more interested in GFlowNets for stochastic prediction over open-ended, real-world action spaces.
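The tree-search sampling mentioned above hinges on a selection rule that trades off exploiting high-value branches against exploring under-visited ones. A minimal sketch of the UCB1 rule used in MCTS selection (the data layout here is illustrative, not from any particular framework):

```python
import math

def ucb1_select(parent_visits, children, c=1.41):
    """Pick a child index by the UCB1 rule used in MCTS selection.

    `children` is a list of (total_value, visit_count) pairs.
    """
    def score(value, visits):
        if visits == 0:
            return float("inf")  # always try unvisited actions first
        exploit = value / visits                                  # mean value
        explore = c * math.sqrt(math.log(parent_visits) / visits) # uncertainty bonus
        return exploit + explore

    scores = [score(v, n) for v, n in children]
    return scores.index(max(scores))

# The unvisited third child is selected immediately.
children = [(3.0, 10), (2.5, 3), (0.0, 0)]
print(ucb1_select(13, children))
```

In LLM reasoning work, the "children" are candidate next steps in a solution trace, and the values come from a verifier or process reward model rather than game outcomes.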

Language Agents: Training large models for autonomous actions with optimized reasoning and planning chains.

  • ReWOO: Eliminates prompt-stacking redundancy in augmented language model (ALM) systems by decoupling LLM reasoning from observations.
  • Gentopia: A collaborative agentic framework for building hierarchical agents through configuration, integrating model specialization, evaluation, sharing, and inheritance. [demo].

Papers

Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu, Xukun Liu, Hua Shen, Zeyu H, Yuhan L, Murong Y, Zhiyuan Peng, Yuchen Liu, Ziyu Y, Dongkuan Xu

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu

Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data
Haoyan Yang, Ting Hua, Shangqian Gao, Binfeng Xu, Zheng Tang, Jie Xu, Hongxia Jin, Vijay Srinivasan

Efficient Computation of Tucker Decomposition of Correlation-Based Tensors
Binfeng Xu, Grey Ballard, Robert Lyday, Paul Laurienti

Iterative Constringency Optimization: Preclustering Approach to Agent Interactive Data
Binfeng Xu, Nicole Dalzell

Misc

Petting Kobu, my Norwegian Forest cat 🐱; Cyberpunk; Digital nomad (someday); Fan of all games by Hidetaka Miyazaki, who once motivated me to try indie game dev; Good at Dota 2 (once); Photography @500px; Minimalist.