<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="http://crittersik.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="http://crittersik.github.io/" rel="alternate" type="text/html" /><updated>2025-02-17T12:05:21+00:00</updated><id>http://crittersik.github.io/feed.xml</id><title type="html">Ola</title><subtitle>Machine Learning Engineer</subtitle><entry><title type="html">Contextual Bandits</title><link href="http://crittersik.github.io/Contextual-Bandits/" rel="alternate" type="text/html" title="Contextual Bandits" /><published>2025-01-12T00:00:00+00:00</published><updated>2025-01-12T00:00:00+00:00</updated><id>http://crittersik.github.io/Contextual-Bandits</id><content type="html" xml:base="http://crittersik.github.io/Contextual-Bandits/"><![CDATA[<p>Contextual bandits are a type of machine learning problem that sits between multi-armed bandits and reinforcement learning. They are used in online decision-making scenarios, like recommending products. Why are they cool?</p>

<h2 id="multi-armed-bandits-mab">Multi-Armed Bandits (MAB)</h2>

<p>Imagine you have three products (A, B, and C) to suggest to users, but you don’t know which one they prefer. You need to balance:</p>
<ul>
  <li><strong>Exploration</strong>: showing different products to learn user preferences,</li>
  <li><strong>Exploitation</strong>: recommending the product that has performed best so far.</li>
</ul>

<p><em>For example, if Product A gets the most clicks from users after some trials, you’ll show it more often, but occasionally test B and C to ensure you’re not missing out on a better option.</em></p>

<p>In a multi-armed bandit problem, you have no extra information (context) about the arms, only the reward signals you observe.</p>
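<p><em>As a toy illustration of the explore/exploit trade-off, here is a minimal epsilon-greedy sketch. The click rates and epsilon value are made up for the example:</em></p>

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_CLICK_RATES = [0.10, 0.05, 0.20]  # hypothetical rates for products A, B, C
EPSILON = 0.1  # explore on 10% of the rounds

clicks = np.zeros(3)
shows = np.zeros(3)

for _ in range(10_000):
    if rng.random() < EPSILON:
        product = int(rng.integers(3))  # explore: pick a random product
    else:
        # exploit: pick the best observed click rate so far
        product = int(np.argmax(clicks / np.maximum(shows, 1)))
    shows[product] += 1
    clicks[product] += rng.random() < TRUE_CLICK_RATES[product]

print(clicks / shows)  # empirical click-rate estimates per product
```

<p>With enough rounds the estimate for product C approaches its true rate of 0.20, so the exploit branch settles on it.</p>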

<h2 id="contextual-bandits">Contextual Bandits</h2>
<p>Now imagine you’re still recommending products, but you receive some context or information. For example:</p>

<ul>
  <li>A user visits your website, and you know their age, location, and past behavior.</li>
  <li>Before showing them a product, you know the time of day and the device they’re using.</li>
</ul>

<p>In a contextual bandit problem, you use this context, together with the observed rewards, to improve future decisions in similar contexts (for example, if younger users tend to click on Product A and older users on Product B, the system learns to recommend products based on the user’s profile).</p>

<h2 id="whats-the-difference-between-full-reinforcement-learning-rl-and-contextual-bandits">What’s the difference between Full Reinforcement Learning (RL) and Contextual Bandits?</h2>

<ul>
  <li><strong>Contextual Bandits</strong>: only the immediate reward matters; there’s no consideration of long-term consequences.</li>
  <li><strong>Full RL</strong>: actions affect future states and rewards, so a long-term strategy is involved.</li>
</ul>

<h2 id="action">Action!</h2>
<p>Let’s see a comparison between regular and contextual Thompson Sampling (TS) in action!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import matplotlib.pyplot as plt

# Simulation Parameters
N_PRODUCTS = 5
N_USERS = 10000
AGE_GROUPS = [20, 30, 40, 50]  # User age groups

# Ground truth for the simulation: the true conversion rate
# of each product within each age group
true_conversion_rates = {
    20: [0.1, 0.3, 0.2, 0.05, 0.1],
    30: [0.2, 0.1, 0.4, 0.15, 0.05],
    40: [0.05, 0.2, 0.1, 0.4, 0.25],
    50: [0.1, 0.15, 0.3, 0.25, 0.2],
}

# Generate users with random ages
np.random.seed(42)
user_ages = np.random.choice(AGE_GROUPS, size=N_USERS)

# Function to simulate user interaction and generate rewards
def get_reward(age, product):
    return np.random.rand() &lt; true_conversion_rates[age][product]

def thompson_sampling_non_contextual(n_users):
    rewards = []
    # One shared Beta(alpha, beta) posterior per product, ignoring user context
    alpha = np.ones(N_PRODUCTS)
    beta_params = np.ones(N_PRODUCTS)
    
    for _ in range(n_users):
        sampled_theta = np.random.beta(alpha, beta_params)
        selected_product = np.argmax(sampled_theta)
        reward = get_reward(np.random.choice(AGE_GROUPS), selected_product)
        rewards.append(reward)
        if reward:
            alpha[selected_product] += 1
        else:
            beta_params[selected_product] += 1

    return np.cumsum(rewards)
</code></pre></div></div>

<p>Making the sampling contextual simply means keeping a separate set of alpha and beta parameters per context (here, per age group), while the rest of the logic stays the same.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def thompson_sampling_contextual(n_users, user_ages):
    rewards = []
    # A separate Beta posterior for each (age group, product) pair
    alpha = {age: np.ones(N_PRODUCTS) for age in AGE_GROUPS}
    beta_params = {age: np.ones(N_PRODUCTS) for age in AGE_GROUPS}
    
    for user_idx in range(n_users):
        user_age = user_ages[user_idx]
        sampled_theta = np.random.beta(alpha[user_age], beta_params[user_age])
        selected_product = np.argmax(sampled_theta)
        reward = get_reward(user_age, selected_product)
        rewards.append(reward)
        if reward:
            alpha[user_age][selected_product] += 1
        else:
            beta_params[user_age][selected_product] += 1

    return np.cumsum(rewards)
</code></pre></div></div>
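<p><em>Once trained, the learned posteriors can be read back as estimated conversion rates: the posterior mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta). A minimal sketch, with made-up counts standing in for the learned alpha[20] and beta_params[20] arrays:</em></p>

```python
import numpy as np

# Hypothetical learned counts for the age-20 group
# (stand-ins for alpha[20] and beta_params[20] after 100 trials per product)
alpha_20 = np.array([11.0, 31.0, 21.0, 6.0, 11.0])
beta_20 = np.array([91.0, 71.0, 81.0, 96.0, 91.0])

posterior_mean = alpha_20 / (alpha_20 + beta_20)  # estimated conversion rate per product
print(posterior_mean.round(3))  # close to the true rates [0.1, 0.3, 0.2, 0.05, 0.1]
```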

<p>Now let’s simulate both strategies and compare:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>non_contextual_rewards = thompson_sampling_non_contextual(N_USERS)
contextual_rewards = thompson_sampling_contextual(N_USERS, user_ages)
</code></pre></div></div>
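<p><em>The comparison plot can be produced with something like the following. Matplotlib’s Agg backend is used so the script also runs headless, and the two arrays below are hypothetical stand-ins for the cumulative-reward curves returned by the samplers above:</em></p>

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe to run without a display
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the outputs of the two samplers above
rng = np.random.default_rng(0)
non_contextual_rewards = np.cumsum(rng.random(10_000) < 0.20)
contextual_rewards = np.cumsum(rng.random(10_000) < 0.28)

plt.plot(non_contextual_rewards, label="Non-contextual TS")
plt.plot(contextual_rewards, label="Contextual TS")
plt.xlabel("Users seen")
plt.ylabel("Cumulative reward")
plt.legend()
plt.savefig("ts_comparison.png")
```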

<p>And this illustrates how much you can gain in cumulative reward:</p>

<p><img src="_posts/images/ts_comparison.png" alt="Cumulative Rewards Comparison" /></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Contextual bandits are a type of machine learning problem that sits between multi-armed bandits and reinforcement learning. They are used in online decision-making scenarios, like recommending products. Why are they cool?]]></summary></entry><entry><title type="html">Data Science Glossary</title><link href="http://crittersik.github.io/DS-glossary/" rel="alternate" type="text/html" title="Data Science Glossary" /><published>2019-08-01T00:00:00+00:00</published><updated>2019-08-01T00:00:00+00:00</updated><id>http://crittersik.github.io/DS-glossary</id><content type="html" xml:base="http://crittersik.github.io/DS-glossary/"><![CDATA[<p>[EDIT] Take a look at this data science glossary instead:
https://www.datacamp.com/blog/data-science-glossary</p>]]></content><author><name></name></author><summary type="html"><![CDATA[[EDIT] Take a look at this data science glossary instead: https://www.datacamp.com/blog/data-science-glossary]]></summary></entry><entry><title type="html">Test post</title><link href="http://crittersik.github.io/Hello-World/" rel="alternate" type="text/html" title="Test post" /><published>2019-07-31T00:00:00+00:00</published><updated>2019-07-31T00:00:00+00:00</updated><id>http://crittersik.github.io/Hello-World</id><content type="html" xml:base="http://crittersik.github.io/Hello-World/"><![CDATA[<p>This is a test post.</p>

<p>Link to <a href="https://github.com/crittersik">my github</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[This is a test post.]]></summary></entry></feed>