Binfeng (Bill) Xu
I am a Senior Research Engineer at Samsung Research America (SRA), specializing in post-training large language models and building on-device agentic systems.
I completed my M.S. at NYU in 2022, majoring in Computer Science and Data Science, during which I briefly worked with Alfredo Canziani and Yann LeCun on policy learning for autonomous driving.
Prior to this, I earned a B.S. from WFU in 2020, double-majoring in Computer Science and Statistics, advised by Grey Ballard on efficient machine learning (Tucker decomposition) for neuro fMRI and by Paúl Pauca on multiple computer vision projects. In my spare time, I compete in ML competitions on Kaggle and rank in the top 1% globally with the 'Competition Master' title.
For the last 10 years, I have been hands-on with a broad range of AI and ML research and applications. I enjoy training large models on massive data and compute, and building products out of ideas end-to-end.
Recent Work & Interests
LLM Post-training. This year I've been working on and researching the whole spectrum of post-training algorithms and infrastructure: from data generation and labeling to SFT, preference-based RLHF (e.g., PPO, DPO), RLVR (e.g., GRPO), efficient fine-tuning (e.g., LoRA), inference acceleration (e.g., SSMs, FlashAttention, speculative decoding), quantization and compilation (e.g., AWQ, llama.cpp), knowledge distillation, model merging, etc.
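As one concrete point on that spectrum, here is a minimal sketch of the per-pair DPO objective, assuming sequence log-probabilities under the policy and a frozen reference model are already computed (function and argument names are illustrative, not from any particular library):

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Sketch of the DPO loss for one (chosen, rejected) response pair.

    DPO maximizes the log-sigmoid of the beta-scaled difference between the
    policy-vs-reference log-ratios of the chosen and rejected responses.
    """
    logits = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# When policy and reference agree, the loss sits at log(2); it drops as the
# policy favors the chosen response more than the reference does.
baseline = dpo_loss(0.0, 0.0, 0.0, 0.0)
improved = dpo_loss(-1.0, -2.0, -1.5, -1.5)
```

In practice this is computed over batches of token-level log-probs with masking, but the scalar form above captures the objective.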
I've also spent a great deal of time on product and architecture work, and have delivered several core projects for Samsung's on-device RAG, KBQA, and agentic systems.
LLM Reasoning and Machine Intelligence. This generally divides into:
- Conceptual learning: Modeling multimodal world patterns with (very) long-context autoregressive transformers, e.g., LWM. Recently, DeepSeek blazed a successful trail (GRPO) in self-supervising LLMs on verified rewards, at the cost of increased test-time compute.
- Generative reasoning: Symbolic learning models (e.g., AlphaGo) show strength in (relatively) small action spaces, which has inspired recent LLM research to annotate data with tree-search sampling (e.g., MCTS) and process supervision. I'm currently more interested in GFlowNets for stochastic prediction over open-world spaces.
Tool-augmented LLM Agents: Training large models for autonomous action with optimized reasoning and planning chains.
- ReWOO: Eliminates stacking redundancy in ALM systems by decoupling LLM reasoning from observations.
- Gentopia: A collaborative agentic framework for building hierarchical agents through configuration, integrating model specialization, evaluation, sharing, and inheritance. [demo].
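The ReWOO-style decoupling above can be sketched in a few lines. This is a toy illustration with stub tools and hypothetical helper names (not the paper's actual code): a Planner emits a full plan with evidence placeholders (#E1, #E2, ...) in one shot, a Worker fills them in by calling tools, and a Solver composes the answer, with no per-step re-prompting of the LLM.

```python
def planner(question: str) -> list[tuple[str, str, str]]:
    """Stub Planner: returns (evidence_id, tool, tool_input) steps.
    A real system would obtain this plan from a single LLM call."""
    return [
        ("#E1", "Search", question),
        ("#E2", "Calculator", "2 * #E1"),
    ]

TOOLS = {
    "Search": lambda q: "21",                    # stub tool: canned lookup
    "Calculator": lambda expr: str(eval(expr)),  # stub tool: arithmetic (demo only)
}

def worker(plan: list[tuple[str, str, str]]) -> dict[str, str]:
    """Execute each step, substituting earlier evidence into later tool inputs."""
    evidence: dict[str, str] = {}
    for eid, tool, arg in plan:
        for placeholder, value in evidence.items():
            arg = arg.replace(placeholder, value)
        evidence[eid] = TOOLS[tool](arg)
    return evidence

def solver(question: str, evidence: dict[str, str]) -> str:
    """Stub Solver: a real system would prompt an LLM with all evidence."""
    return evidence[max(evidence)]  # return the last evidence value

evidence = worker(planner("What is half of the answer?"))
answer = solver("What is half of the answer?", evidence)
```

The point of the structure is that observations flow through the evidence dictionary rather than back into repeated reasoning calls, which is where the token savings come from.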
Papers
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Binfeng Xu, Xukun Liu, Hua Shen, Zeyu H, Yuhan L, Murong Y, Zhiyuan P, Yuchen L, Ziyu Y, Dongkuan Xu
ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, Dongkuan Xu
Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data
Haoyan Yang, Ting Hua, Shangqian Gao, Binfeng Xu, Zheng Tang, Jie Xu, Hongxia Jin, Vijay Srinivasan
Efficient Computation of Tucker Decomposition of Correlation-Based Tensors
Binfeng Xu, Grey Ballard, Robert Lyday, Paul Laurienti
Iterative Constringency Optimization: Preclustering Approach to Agent Interactive Data
Binfeng Xu, Nicole Dalzell
Misc
Petting Kobu, the Norwegian Forest cat 🐱; Cyberpunk; Digital nomad (someday); Fan of all games by Hidetaka Miyazaki, who once motivated me toward indie game dev; Good at Dota 2 (once); Photography @500px; Minimalist.