
What Models Learn Is What We Give Them

  • Nidhi Agrawal
  • 7 days ago
  • 3 min read

Over time, while building and iterating on machine learning systems, one lesson keeps repeating itself: model performance is often constrained less by the algorithm and more by how the data is represented.

Feature engineering is the work of turning raw data into signals that reflect real-world behavior. When done well, it allows models to learn patterns that would otherwise remain hidden. In practice, we’ve consistently seen that thoughtful feature design improves accuracy more reliably than switching algorithms or fine-tuning hyperparameters.

In this post, I’ll walk through two feature engineering techniques, applied to a single real-world example, to show how small changes in representation can significantly improve model performance:

  1. Cyclic date features using sine and cosine

  2. Moving average features

The Running Example: Daily Sales Forecasting

Consider a common problem: predicting daily sales for a retail business.

A typical raw dataset might include:

  • date

  • daily_sales

At first glance, this seems sufficient. But very quickly, limitations emerge:

  • Sales follow seasonal cycles

  • Daily values are noisy

  • Short-term patterns matter as much as individual data points

Raw fields alone don’t capture these behaviors. Feature engineering helps make them explicit.

1. Encoding Cyclic Date Features (Sin & Cos)

Why raw date values fall short: Time-based variables such as months, days, or hours are cyclic by nature. When encoded as plain integers, however, models treat them as points on a line.

For example:

  • January = 1

  • December = 12

Numerically, they appear far apart, even though they are adjacent in time. This introduces artificial discontinuities that models struggle to learn from.

Making seasonality learnable: To preserve cyclic structure, we encode time features using sine and cosine transformations:

df["month_sin"] = np.sin(2 np.pi df["month"] / 12)

df["month_cos"] = np.cos(2 np.pi df["month"] / 12)

This maps the months onto a circular space, allowing the model to understand continuity across year boundaries.
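To see why this helps, here is a small sanity check (illustrative only; the helper month_to_circle is not part of the sales pipeline) comparing distances in the encoded space:

import numpy as np

def month_to_circle(month):
    # Encode a month number (1-12) as a point on the unit circle
    angle = 2 * np.pi * month / 12
    return np.array([np.sin(angle), np.cos(angle)])

# December and January are neighbors on the circle...
print(np.linalg.norm(month_to_circle(12) - month_to_circle(1)))  # ~0.52
# ...while June sits on the opposite side of the year
print(np.linalg.norm(month_to_circle(12) - month_to_circle(6)))  # 2.0

With integer encoding, the same comparison would put December 11 units from January and only 6 from June, which is the discontinuity the transformation removes.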


Impact on the sales model

After applying cyclic encoding:

  • Seasonal sales peaks became easier to learn

  • Predictions around year-end improved

  • Seasonal errors decreased without changing the model itself

By aligning the data representation with real-world time cycles, seasonality became a signal instead of noise.


2. Moving Average Features

Capturing short-term trends

Daily sales data often contains sharp fluctuations caused by promotions, holidays, or random variation. Relying on raw values forces the model to learn from noisy signals.

To provide short-term context, we introduced a moving average feature:

df["sales_7day_avg"] = df["sales"].rolling(window=7).mean()

This feature summarizes recent behavior and highlights trends.
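As a quick self-contained illustration (the sales figures below are made up), note that the first six rows of the rolling feature come back as NaN because the window is not yet full; pandas’ min_periods argument relaxes that if partial windows are acceptable:

import pandas as pd

df = pd.DataFrame({"daily_sales": [120, 135, 128, 150, 142, 138, 160, 155]})

# Strict 7-day window: rows 0-5 are NaN until seven observations exist
df["sales_7day_avg"] = df["daily_sales"].rolling(window=7).mean()

# Partial windows allowed: the average uses whatever history is available
df["sales_7day_avg_partial"] = df["daily_sales"].rolling(window=7, min_periods=1).mean()

print(df)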

Why it improved accuracy

Including a moving average allowed the model to:

  • Understand momentum in sales

  • Reduce sensitivity to one-day anomalies

  • Produce smoother and more stable forecasts

Tree-based models such as Random Forest and XGBoost benefited particularly from this added context, as it allowed them to split on trend-aware signals rather than individual noisy observations.
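To make the combined effect concrete, here is a minimal end-to-end sketch. The synthetic data, column names, and RandomForestRegressor settings are illustrative assumptions, not the exact setup behind the results described above:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the running example: one year of daily sales
dates = pd.date_range("2024-01-01", periods=365, freq="D")
rng = np.random.default_rng(42)
sales = 100 + 20 * np.sin(2 * np.pi * dates.dayofyear / 365) + rng.normal(0, 5, 365)
df = pd.DataFrame({"date": dates, "daily_sales": sales})

# Feature 1: cyclic month encoding
df["month"] = df["date"].dt.month
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)

# Feature 2: 7-day moving average of sales
df["sales_7day_avg"] = df["daily_sales"].rolling(window=7).mean()

# Drop the first six rows, where the rolling window is not yet full
df = df.dropna()

X = df[["month_sin", "month_cos", "sales_7day_avg"]]
y = df["daily_sales"]

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X, y)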

Why These Features Make a Difference

Each of these features addresses a specific weakness in raw time-series data:

Feature Type    | Captures          | What the Model Learns
----------------|-------------------|------------------------------
Cyclic Encoding | Seasonality       | Repeating time patterns
Moving Average  | Short-term trends | Momentum and recent behavior

When used together:

  • The model converged faster

  • Seasonal boundary errors decreased

  • Overall prediction accuracy improved without increasing model complexity

This reinforces a recurring insight from real-world modeling work: better features often outperform more complex models.

 

Final Thoughts

Feature engineering is not about adding more columns; it’s about making real-world behavior visible to the model. By

  • encoding time correctly, and

  • smoothing short-term noise,

we help models learn patterns that raw data alone cannot express.

The question I consistently return to is:

What does this data represent in the real world - and how can I help the model see that?

Because in the end, what models learn is exactly what we give them.
