Reinforcement Learning in R

 Machine learning algorithms are typically divided into three main categories:

  1. Supervised Learning
    • Classification
    • Regression
  2. Unsupervised Learning
    • Clustering
  3. Reinforcement Learning (RL)

In this article, you’ll learn the fundamentals of Reinforcement Learning, how it works in real life, and how to implement it in R using practical examples.


Table of Contents

  1. Reinforcement learning – real-life example
  2. Typical reinforcement learning process
  3. Core RL concepts (States, Actions, Rewards, Policy)
  4. Divide and Rule – breaking down the RL process
  5. Implementing Reinforcement Learning in R
  6. Using the MDPtoolbox package
  7. Using the ReinforcementLearning GitHub package
  8. Handling changing environments
  9. Complete R code
  10. Conclusion

Reinforcement Learning – A Real-Life Example

Think about how students learn:

  • A teacher explains a concept
  • Students practice similar problems
  • They receive feedback (right/wrong)
  • Over time, performance improves

This is exactly how Reinforcement Learning works.

Instead of learning from labeled datasets, the model learns by:

✅ Interacting with the environment
✅ Making decisions
✅ Receiving rewards or penalties
✅ Improving decisions over time

This approach is ideal for:

  • Game playing
  • Robotics
  • Navigation tasks
  • Adaptive systems

Typical Reinforcement Learning Process

The learning agent:

  1. Observes the current state
  2. Chooses an action
  3. Receives a reward or penalty
  4. Moves to a new state
  5. Updates its strategy to maximize total reward

This trial-and-error learning style mimics human behavior.
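
To make this loop concrete, here is a minimal tabular Q-learning sketch in R that walks through the same observe, act, reward, update cycle. The environment function, state and action names, and learning parameters are illustrative assumptions, not part of the examples later in this article.

# Minimal tabular Q-learning sketch (illustrative only)
states  <- c("s1", "s2", "s3", "s4")
actions <- c("up", "down", "left", "right")

# Q-table: estimated long-term reward for every state-action pair
Q <- matrix(0, nrow = length(states), ncol = length(actions),
            dimnames = list(states, actions))

alpha <- 0.1  # learning rate
gamma <- 0.9  # discount factor

# Hypothetical environment: returns a next state and a reward for a move
step_env <- function(state, action) {
  list(next_state = sample(states, 1),
       reward = ifelse(state == "s4", 10, -1))
}

state <- "s1"
for (i in 1:1000) {
  action <- sample(actions, 1)           # observe the state, choose an action
  out    <- step_env(state, action)      # receive a reward, reach a new state
  target <- out$reward + gamma * max(Q[out$next_state, ])
  Q[state, action] <- Q[state, action] + alpha * (target - Q[state, action])  # update strategy
  state  <- out$next_state
}

Q  # greedy policy per state: apply(Q, 1, function(q) actions[which.max(q)])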


Core Elements of Reinforcement Learning

Every RL system consists of:

  • States (S) – the different positions or conditions of the environment
  • Actions (A) – the possible decisions available in each state
  • Rewards (R) – the feedback received for an action
  • Policy (π) – the strategy guiding which action to take
  • Value (V) – the expected long-term reward

Goal:

Find the optimal policy π* that maximizes value V.
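
As a quick illustration, these elements can be written down as plain R objects. The names and values below are made up for demonstration and are not the grid example used later.

# Illustrative encoding of the core RL elements for a toy 4-state problem
S <- c("s1", "s2", "s3", "s4")          # states
A <- c("up", "down", "left", "right")   # actions

# Rewards: feedback associated with each state (toy values)
R <- c(s1 = -1, s2 = -1, s3 = -1, s4 = 10)

# Policy: a mapping from each state to an action
policy <- c(s1 = "down", s2 = "right", s3 = "up", s4 = "up")

# Value: expected long-term reward per state (placeholders here)
V <- c(s1 = 0, s2 = 0, s3 = 0, s4 = 0)

policy["s2"]  # the action this policy recommends in state s2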


Divide and Rule – Breaking Down the RL Process

Before implementation, define:

✅ Allowed actions
✅ State transitions
✅ Rewards and penalties
✅ Stopping conditions


Toy Example – Grid Navigation

The agent has to move from Start to Exit in a grid; a minimal encoding of this setup is sketched after the rules below.

Actions:

  • UP
  • DOWN
  • LEFT
  • RIGHT

Rules:

  • Every step → small penalty
  • Pit → large penalty
  • Exit → big reward
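
A minimal way to encode this setup before handing it to a solver is to flatten the grid cells into numbered states and attach a reward to each. The 2x2 layout and the exact reward values below are illustrative assumptions:

# Hypothetical 2x2 grid flattened into four states (s1 = Start, s4 = Exit)
grid_states <- c("s1", "s2", "s3", "s4")

# Small penalty per step, large reward at the Exit; a pit cell would get a
# large negative value (all numbers here are illustrative)
step_reward <- c(s1 = -1, s2 = -1, s3 = -1, s4 = 10)

grid_states
step_reward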

Reinforcement Learning in R – Using MDPtoolbox

Step 1 – Install and Load the Package

# install.packages("MDPtoolbox")
library(MDPtoolbox)


Step 2 – Define Action Matrices

# Transition probability matrices: one per action.
# Rows = current state (s1..s4), columns = next state; each row sums to 1.

up <- matrix(c(
  1,   0,   0,   0,
  0.7, 0.2, 0.1, 0,
  0,   0.1, 0.2, 0.7,
  0,   0,   0,   1
), nrow = 4, byrow = TRUE)

down <- matrix(c(
  0.3, 0.7, 0,   0,
  0,   0.9, 0.1, 0,
  0,   0.1, 0.9, 0,
  0,   0,   0.7, 0.3
), nrow = 4, byrow = TRUE)

left <- matrix(c(
  0.9, 0.1, 0,   0,
  0.1, 0.9, 0,   0,
  0,   0.7, 0.2, 0.1,
  0,   0,   0.1, 0.9
), nrow = 4, byrow = TRUE)

right <- matrix(c(
  0.9, 0.1, 0,   0,
  0.1, 0.2, 0.7, 0,
  0,   0,   0.9, 0.1,
  0,   0,   0.1, 0.9
), nrow = 4, byrow = TRUE)

# Bundle the four actions into a named list for MDPtoolbox
Actions <- list(up = up, down = down, left = left, right = right)
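
Before solving, it is worth sanity-checking that each action's transition matrix is a valid probability matrix, i.e. every row sums to 1. A quick check on the Actions list defined above:

# Every row of every transition matrix should sum to 1
sapply(Actions, rowSums)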


Step 3 – Define Rewards

# Reward matrix: rows = states, columns = actions (up, down, left, right).
# Every move costs -1; state s4 (the Exit) yields +10.
Rewards <- matrix(c(
  -1, -1, -1, -1,
  -1, -1, -1, -1,
  -1, -1, -1, -1,
  10, 10, 10, 10
), nrow = 4, byrow = TRUE)


Step 4 – Solve Using Policy Iteration

# Solve the MDP with policy iteration (discount factor = 0.1)
solver <- mdp_policy_iteration(P = Actions, R = Rewards, discount = 0.1)
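
Policy iteration is not the only solver in MDPtoolbox; value iteration can be run on the same inputs and should typically recover the same policy here. A minimal sketch using the same Actions, Rewards, and discount:

# Alternative solver: value iteration on the same MDP
solver_vi <- mdp_value_iteration(P = Actions, R = Rewards, discount = 0.1)
solver_vi$policy  # compare with the policy-iteration result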


View Results

solver$policy                  # optimal action index for each state
names(Actions)[solver$policy]  # optimal action names

solver$V     # value of each state under the optimal policy
solver$iter  # number of iterations until convergence
solver$time  # time taken to solve

Expected Output:

The optimal policy, i.e. the best action for each of the four states, for example:
down → right → up → up


Using the GitHub ReinforcementLearning Package

Install and Load

# install.packages("devtools")
library(devtools)
install_github("nproellochs/ReinforcementLearning")
library(ReinforcementLearning)


Use the Pre-Built Gridworld Environment

# Define the state and action spaces of the built-in 2x2 gridworld
states <- c("s1", "s2", "s3", "s4")
actions <- c("up", "down", "left", "right")

# Sample 1000 random state transitions from the gridworld environment
sequences <- sampleExperience(
  N = 1000,
  env = gridworldEnvironment,
  states = states,
  actions = actions
)

# Learn a policy from the sampled experience
solver_rl <- ReinforcementLearning(
  sequences,
  s = "State",
  a = "Action",
  r = "Reward",
  s_new = "NextState"
)

solver_rl$Policy  # learned best action per state
solver_rl$Reward  # total reward collected in the training data
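
The learner's behaviour can also be tuned through a control list of parameters: alpha (learning rate), gamma (discount factor), and epsilon (exploration rate). The specific values below are arbitrary examples:

# Tune the learning parameters (values are illustrative)
control <- list(alpha = 0.1, gamma = 0.5, epsilon = 0.1)

solver_rl_tuned <- ReinforcementLearning(
  sequences,
  s = "State",
  a = "Action",
  r = "Reward",
  s_new = "NextState",
  control = control
)

solver_rl_tuned$Policy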


Adapting to Changing Environments (Tic-Tac-Toe Example)

data("tictactoe")

model_tic_tac <- ReinforcementLearning(
tictactoe,
s="State",
a="Action",
r="Reward",
s_new="NextState",
iter=1
)

model_tic_tac$Policy
model_tic_tac$Reward
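
When the environment keeps changing, the package also lets an existing model be updated with newly observed data by passing it through the model argument. The sketch below reuses a slice of the bundled tictactoe data as a stand-in for fresh games, purely for illustration:

# Hypothetical update step: continue learning from new observations
new_games <- tictactoe[1:1000, ]  # stand-in for newly collected game data

model_tic_tac_updated <- ReinforcementLearning(
  new_games,
  s = "State",
  a = "Action",
  r = "Reward",
  s_new = "NextState",
  iter = 1,
  model = model_tic_tac  # start from the previously learned model
)

model_tic_tac_updated$Policy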


Complete Code

All of the snippets above can be combined into a single script and run end to end; the object names (solver, solver_rl, model_tic_tac) are consistent across the steps.


Why Reinforcement Learning Matters

Reinforcement Learning is behind breakthroughs like:

  • Google AlphaGo
  • Robotics locomotion
  • Autonomous driving
  • Game AI

Unlike traditional ML, RL allows machines to learn behavior, not just patterns.


Conclusion

Reinforcement Learning:

✅ Enables machines to learn by experience
✅ Mimics human learning
✅ Works even when labeled data is unavailable
✅ Powers modern AI systems

Though still evolving, RL is becoming a core pillar of AI consulting, automation, and adaptive systems.

At Perceptive Analytics, our mission is “to enable businesses to unlock value in data.” For two decades, we’ve supported 100+ organizations worldwide in building high-impact analytics systems. Our offerings span Power BI consulting and advanced analytics consulting, helping organizations turn raw data into meaningful, decision-ready insights. We would love to talk to you. Do reach out to us.
