
What Models Learn Is What We Give Them

  • Nidhi Agrawal
  • 7 days ago
  • 3 min read

Over time, while building and iterating on machine learning systems, one lesson keeps repeating itself: model performance is often constrained less by the algorithm and more by how the data is represented.

Feature engineering is the work of turning raw data into signals that reflect real-world behavior. When done well, it allows models to learn patterns that would otherwise remain hidden. In practice, we’ve consistently seen that thoughtful feature design improves accuracy more reliably than switching algorithms or fine-tuning hyperparameters.

In this post, I’ll walk through two feature engineering techniques, applied to a single real-world example, to show how small changes in representation can significantly improve model performance:

  1. Cyclic date features using sine and cosine

  2. Moving average features

The Running Example: Daily Sales Forecasting

Consider a common problem: predicting daily sales for a retail business.

A typical raw dataset might include:

  • date

  • daily_sales

At first glance, this seems sufficient. But very quickly, limitations emerge:

  • Sales follow seasonal cycles

  • Daily values are noisy

  • Short-term patterns matter as much as individual data points

Raw fields alone don’t capture these behaviors. Feature engineering helps make them explicit.

1. Encoding Cyclic Date Features (Sin & Cos)

Why raw date values fall short: Time-based variables such as months, days, or hours are cyclic by nature. When encoded as plain integers, however, models treat them as points on a line.

For example:

  • January = 1

  • December = 12

Numerically, they appear far apart, even though they are adjacent in time. This introduces artificial discontinuities that models struggle to learn from.

Making seasonality learnable: To preserve cyclic structure, we encode time features using sine and cosine transformations:

df["month_sin"] = np.sin(2 np.pi df["month"] / 12)

df["month_cos"] = np.cos(2 np.pi df["month"] / 12)

This maps the months onto a circular space, allowing the model to understand continuity across year boundaries.
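To see why this helps, here is a small sanity check (illustrative only; the helper month_to_circle is not part of the sales pipeline) comparing distances in the encoded space:

import numpy as np

def month_to_circle(month):
    # Encode a month number (1-12) as a point on the unit circle
    angle = 2 * np.pi * month / 12
    return np.array([np.sin(angle), np.cos(angle)])

# December and January are neighbors on the circle...
print(np.linalg.norm(month_to_circle(12) - month_to_circle(1)))  # ~0.52
# ...while June sits on the opposite side of the year
print(np.linalg.norm(month_to_circle(12) - month_to_circle(6)))  # 2.0

With integer encoding, the same comparison would put December 11 units from January and only 6 from June, which is the discontinuity the transformation removes.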


Impact on the sales model

After applying cyclic encoding:

  • Seasonal sales peaks became easier to learn

  • Predictions around year-end improved

  • Seasonal errors decreased without changing the model itself

By aligning the data representation with real-world time cycles, seasonality became a signal instead of noise.


2. Moving Average Features

Capturing short-term trends

Daily sales data often contains sharp fluctuations caused by promotions, holidays, or random variation. Relying on raw values forces the model to learn from noisy signals.

To provide short-term context, we introduced a moving average feature:

df["sales_7day_avg"] = df["sales"].rolling(window=7).mean()

This feature summarizes recent behavior and highlights trends.
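As a quick self-contained illustration (the sales figures below are made up), note that the first six rows of the rolling feature come back as NaN because the window is not yet full; pandas’ min_periods argument relaxes that if partial windows are acceptable:

import pandas as pd

df = pd.DataFrame({"daily_sales": [120, 135, 128, 150, 142, 138, 160, 155]})

# Strict 7-day window: rows 0-5 are NaN until seven observations exist
df["sales_7day_avg"] = df["daily_sales"].rolling(window=7).mean()

# Partial windows allowed: the average uses whatever history is available
df["sales_7day_avg_partial"] = df["daily_sales"].rolling(window=7, min_periods=1).mean()

print(df)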

Why it improved accuracy

Including a moving average allowed the model to:

  • Understand momentum in sales

  • Reduce sensitivity to one-day anomalies

  • Produce smoother and more stable forecasts

Tree-based models such as Random Forest and XGBoost benefited particularly from this added context, as it allowed them to split on trend-aware signals rather than individual noisy observations.
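To make the combined effect concrete, here is a minimal end-to-end sketch. The synthetic data, column names, and RandomForestRegressor settings are illustrative assumptions, not the exact setup behind the results described above:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the running example: one year of daily sales
dates = pd.date_range("2024-01-01", periods=365, freq="D")
rng = np.random.default_rng(42)
sales = 100 + 20 * np.sin(2 * np.pi * dates.dayofyear / 365) + rng.normal(0, 5, 365)
df = pd.DataFrame({"date": dates, "daily_sales": sales})

# Feature 1: cyclic month encoding
df["month"] = df["date"].dt.month
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)

# Feature 2: 7-day moving average of sales
df["sales_7day_avg"] = df["daily_sales"].rolling(window=7).mean()

# Drop the first six rows, where the rolling window is not yet full
df = df.dropna()

X = df[["month_sin", "month_cos", "sales_7day_avg"]]
y = df["daily_sales"]

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X, y)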

Why These Features Make a Difference

Each of these features addresses a specific weakness in raw time-series data:

Feature Type    | Captures          | What the Model Learns
----------------|-------------------|------------------------------
Cyclic Encoding | Seasonality       | Repeating time patterns
Moving Average  | Short-term trends | Momentum and recent behavior

When used together:

  • The model converged faster

  • Seasonal boundary errors decreased

  • Overall prediction accuracy improved without increasing model complexity

This reinforces a recurring insight from real-world modeling work: better features often outperform more complex models.

 

Final Thoughts

Feature engineering is not about adding more columns; it’s about making real-world behavior visible to the model. By

  • encoding time correctly, and

  • smoothing short-term noise,

we help models learn patterns that raw data alone cannot express.

The question I consistently return to is:

What does this data represent in the real world - and how can I help the model see that?

Because in the end, what models learn is exactly what we give them.
