arxiv:2505.18125

TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations

Published on May 23

· Submitted by

EilamSha on May 26

#1 Paper of the day

Upvote

109

Authors:

Alan Arazi ,

Eilam Shapira ,

Abstract

TabSTAR, a tabular foundation model with semantically target-aware representations, achieves state-of-the-art performance in classification tasks with text features through transfer learning without dataset-specific parameters.

AI-generated summary

While deep learning has achieved remarkable success across many domains, it has historically underperformed on tabular learning tasks, which remain dominated by gradient boosting decision trees (GBDTs). However, recent advancements are paving the way for Tabular Foundation Models, which can leverage real-world knowledge and generalize across diverse datasets, particularly when the data contains free-text. Although incorporating language model capabilities into tabular tasks has been explored, most existing methods utilize static, target-agnostic textual representations, limiting their effectiveness. We introduce TabSTAR: a Foundation Tabular Model with Semantically Target-Aware Representations. TabSTAR is designed to enable transfer learning on tabular data with textual features, with an architecture free of dataset-specific parameters. It unfreezes a pretrained text encoder and takes as input target tokens, which provide the model with the context needed to learn task-specific embeddings. TabSTAR achieves state-of-the-art performance for both medium- and large-sized datasets across known benchmarks of classification tasks with text features, and its pretraining phase exhibits scaling laws in the number of datasets, offering a pathway for further performance improvements.

View arXiv page View PDF Project page Add to collection

Community

EilamSha

Paper author Paper submitter 11 days ago

Project Website: https://eilamshapira.com/TabSTAR/

yjh415

11 days ago

•

edited 11 days ago

Audio overview 😀
Ep 84: TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
https://youtu.be/AvYZjJmve50

fdaudens

11 days ago

Listen to the audio brief for this paper on Spotify: https://open.spotify.com/episode/6UmDEVeOvXl1xMsXvdH71d?si=pNBMvlKvTHSGORyOi_BU1g

librarian-bot

10 days ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Representation Learning for Tabular Data: A Comprehensive Survey (2025)
Griffin: Towards a Graph-Centric Relational Database Foundation Model (2025)
[Latte: Transfering LLMs` Latent-level Knowledge for Few-shot Tabular Learning](https://huggingface.co/papers/2505.05237) (2025)
Integrating Structural and Semantic Signals in Text-Attributed Graphs with BiGTex (2025)
Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard (2025)
Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs (2025)
TabKAN: Advancing Tabular Data Analysis using Kolmograv-Arnold Network (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

ModelContext

6 days ago

47m params? nice, ETA on release? "few days ago" a few days ago

alana89

Paper author 6 days ago

Thanks! We released the research / pretraining code, and the model is released as well, but we intend to add easy support for finetuning. This requires a somewhat bigger code adaptation than expected, and we want the experience to be as smooth as possible. I don't have an exact ETA, but we would love to first release the model to a selected group of testers that should provide feedback. If you feel like it, please feel free to sign in here: https://eilamshapira.com/TabSTAR/