Seq2Seq Models for Symbolic Expression Recovery from Taylor Series
Neural Symbolic Regression from Truncated Taylor Series
This project explores the task of recovering symbolic mathematical expressions from truncated Taylor series (for example, recovering sin(x) from x - x³/6 + x⁵/120) using a sequence-to-sequence deep learning model.
Motivation
Taylor series provide a way to approximate functions locally, but the reverse problem — reconstructing the original symbolic function given only a truncated expansion — is far from trivial. This inverse task has important applications in areas such as symbolic regression, mathematical physics, and automated reasoning.
I became interested in this challenge because I wanted to see whether the exact function can be recovered from its Taylor expansion alone. The scenario occurs more often than one might think: in many scientific and engineering domains, only a truncated series approximation of a function, or some other indirect representation of its local behavior, is available, yet deeper analysis or reasoning requires the original closed-form expression. Reconstructing it from such limited information is therefore both a difficult and a highly relevant problem.
Approach
- **Synthetic dataset generation:** Using SymPy, a diverse set of functions was randomly generated from a small grammar of base functions (`x`, `x²`, `sin(x)`, `cos(x)`, `exp(x)`) combined with algebraic operators. Each function was expanded around 0 up to 6th order, producing (function, Taylor expansion) pairs; a generation sketch is given after this list.
- **Tokenization and vocabulary:** Expressions were tokenized and converted into integer sequences, with special tokens for `<PAD>`, `<SOS>`, `<EOS>`, and `<UNK>`; see the tokenization sketch below.
- **Model:** A seq2seq model with an LSTM encoder-decoder architecture, implemented in PyTorch; a minimal sketch follows the list.
  - Encoder: processes the expansion sequence.
  - Decoder: generates the corresponding symbolic function.
- **Training:**
  - Dataset: ~130k examples.
  - Device: trained on Apple Silicon (MPS backend).
  - Optimizer: Adam with learning rate scheduling.
- **Evaluation:** Achieved 92.2% exact match and 95.1% token-level accuracy on a held-out evaluation set.
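The data generation step can be pictured with a short SymPy sketch along these lines; the grammar depth, the operator set (addition and multiplication), and the helper names are illustrative assumptions rather than the exact generator used in the notebook:

```python
# Minimal sketch of the dataset generation step (not the exact generator used
# in the notebook): sample a random expression from a small grammar of base
# functions and algebraic operators, then expand it around 0 up to 6th order.
import json
import random

import sympy as sp

x = sp.symbols("x")
BASE_FUNCS = [x, x**2, sp.sin(x), sp.cos(x), sp.exp(x)]
OPERATORS = [sp.Add, sp.Mul]  # assumed operator subset

def random_function(depth: int = 2):
    """Randomly combine base functions with binary operators."""
    if depth == 0:
        return random.choice(BASE_FUNCS)
    op = random.choice(OPERATORS)
    return op(random_function(depth - 1), random_function(depth - 1))

def make_pair(order: int = 6):
    """Return (function, truncated Taylor expansion around 0) as strings."""
    f = sp.simplify(random_function())
    taylor = sp.series(f, x, 0, order + 1).removeO()  # drop the O(x**7) term
    return str(f), str(sp.expand(taylor))

if __name__ == "__main__":
    pairs = [make_pair() for _ in range(5)]
    print(json.dumps([{"function": f, "taylor": t} for f, t in pairs], indent=2))
```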
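For the tokenization step, a minimal sketch might look like the following; the regex-based token set and the id ordering of the special tokens are assumptions, since the exact scheme lives in the notebook:

```python
# Sketch of tokenization and vocabulary construction over SymPy's string output.
import re

SPECIALS = ["<PAD>", "<SOS>", "<EOS>", "<UNK>"]
TOKEN_RE = re.compile(r"sin|cos|exp|x|\d+|\*\*|[+\-*/()]")

def tokenize(expr: str):
    """Split an expression string into function names, numbers, and operators."""
    return TOKEN_RE.findall(expr.replace(" ", ""))

def build_vocab(expressions):
    """Map every token seen in the corpus to an integer id; specials come first."""
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for expr in expressions:
        for tok in tokenize(expr):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(expr: str, vocab):
    """Convert an expression into <SOS> ... <EOS> integer ids."""
    ids = [vocab["<SOS>"]]
    ids += [vocab.get(tok, vocab["<UNK>"]) for tok in tokenize(expr)]
    ids.append(vocab["<EOS>"])
    return ids

# Example:
# vocab = build_vocab(["x**2 + sin(x)", "exp(x)*cos(x)"])
# encode("sin(x) + x", vocab)   # -> list of ids framed by <SOS>/<EOS>
```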
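The encoder-decoder itself can be sketched in PyTorch roughly as below; embedding and hidden sizes, the teacher-forcing ratio, and the training-setup comments at the end (MPS device, Adam with a plateau scheduler) are illustrative choices, not the notebook's exact configuration:

```python
# Minimal LSTM encoder-decoder sketch; hyperparameters are assumptions.
import random

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len)
        _, (h, c) = self.lstm(self.embedding(src))
        return h, c                               # final state summarizes the expansion

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token, h, c):               # token: (batch, 1), one step at a time
        output, (h, c) = self.lstm(self.embedding(token), (h, c))
        return self.out(output.squeeze(1)), h, c  # logits over the target vocabulary

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    def forward(self, src, tgt, teacher_forcing=0.5):
        batch, tgt_len = tgt.shape
        vocab_size = self.decoder.out.out_features
        logits = torch.zeros(batch, tgt_len, vocab_size, device=src.device)
        h, c = self.encoder(src)
        token = tgt[:, :1]                        # <SOS>
        for t in range(1, tgt_len):
            step_logits, h, c = self.decoder(token, h, c)
            logits[:, t] = step_logits
            best = step_logits.argmax(dim=-1, keepdim=True)
            use_truth = random.random() < teacher_forcing
            token = tgt[:, t:t + 1] if use_truth else best
        return logits

# Typical training setup in the spirit of the bullets above (illustrative):
# device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
# model = Seq2Seq(Encoder(len(vocab)), Decoder(len(vocab))).to(device)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
# loss_fn = nn.CrossEntropyLoss(ignore_index=vocab["<PAD>"])
```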
Results
The trained model shows that it is indeed possible to reconstruct non-trivial symbolic functions starting solely from their truncated Taylor expansions. Despite the apparent loss of information that occurs when moving from a full closed-form expression to a local approximation, the model is able to recover the original structure with a surprisingly high degree of accuracy.
This suggests that neural networks, when trained on carefully constructed datasets, can capture and generalize aspects of symbolic regression, a task more commonly tackled with search-based methods such as genetic programming than with purely data-driven sequence models.
The results obtained are promising, but they also open several directions for further improvement. In particular, integrating mechanisms such as self-attention could provide the model with a more flexible way of handling long-range dependencies within symbolic sequences, and thus potentially enhance both robustness and accuracy in the reconstruction process.
Repository structure:
- `se2seq_model.ipynb`: end-to-end notebook with dataset creation, model training, and evaluation.
- `seq2seq_model.pth`: trained model weights.
- `dataset.json`: the generated (function, Taylor expansion) pairs.