Cache Vs Race Condition

We need to rewrite the API that race condition can't occur.

  • Race Condition : it occurs when 2 or threads can access shared data and they try to change it at the same time. Because of *thread scheduling algorithm can swap between threads at any time, you don't know the order in which the threads will attempt to access the shared data.
public class CachedObject<TValue>
{
    public TValue Value { get; private set; }
    public DateTime DateCached { get; private set; }

    public CachedObject(TValue value, DateTime dateCached)
    {
        Value = value;
        DateCached = dateCached;
    }
}
public class Cache<TKey, TValue>
{
    private TimeSpan _expiry;
    private Dictionary<TKey, CachedObject<TValue>> _store =
        new Dictionary<TKey, CachedObject<TValue>>();

    public Cache(TimeSpan expiry)
    {
        _expiry = expiry;
    }

    public TValue Get(TKey key)
    {
        CachedObject<TValue> found;

        // If the object is in the cache, check the date.
        if (_store.TryGetValue(key, out found) &&
            found.DateCached + _expiry < DateTime.Now)
        {
            return found.Value;
        }
        else
        {
            return default(TValue);
        }
    }

    public void Put(TKey key, TValue value)
    {
        _store.Add(key, new CachedObject<TValue>(value, DateTime.Now));
    }
}

The user need to check first by using Get before Acting Put (prerequisite), and they have to synchronize around the 2 calls. Here, they run the risk of creating a race condition.

for this, there are many trials may occur to show race condition issue one of them when 2 users can use Get at same time, if finding none => both race to fetch the value and Put in the cash.

To prove that a race condition can't occur, a callback (Func w/ lock)

public class Cache<TKey, TValue>
{
    private TimeSpan _expiry;
    private Dictionary<TKey, CachedObject<TValue>> _store =
        new Dictionary<TKey, CachedObject<TValue>>();

    public Cache(TimeSpan expiry)
    {
        _expiry = expiry;
    }

    public TValue Get(TKey key, Func<TKey, TValue> fetch)
    {

        // bottle neck design
        lock (_store)
        {
              //locking on the cache while fetching the value
            CachedObject<TValue> found;

            // If the object is not in the cache, or it is expired,
            // fetch it and add it to the cache.
            DateTime now = DateTime.Now;
            if (!_store.TryGetValue(key, out found) ||
                found.DateCached + _expiry >= now)
            {
                found = new CachedObject<TValue>(fetch(key), now);
               _store.Add(key, found);
            }
            return found.Value;
        }
    }
}

This version take a call backback to fetch the value if it is not found. Synchronize is now built into the Get method. no need to synchronize for the caller. We can prove the correctness of the code w/o seeing the caller.

There is an issue w/ above code, it's Lock. in preventing a race condition on 2 thread getting the same value, we have inadvertently blocked 2 threads getting different values and tightly locked. All user will line up behind one another, whether they are after the same data or not.

If we lock the Key, when 2 clients access the same key will line up while other clients access different keys will not.

We need to lock something identity, in case lock the key a client can create new instance of key having the same value and that should qualify as the same key.

  • the value of the key is used to find an object in dictionary, that object doesn't have identity. So we move the lock there.
public class CachedObject<TValue>
{
    private TValue _value;
    private bool _fetched = false;
    public DateTime DateCached { get; private set; }

    public CachedObject(DateTime dateCached)
    {
        DateCached = dateCached;
    }

    public TValue Value
    {
        get { return _value; }
        set { _value = value; _fetched = true; }
    }

    public bool Fetched
    {
        get { return _fetched; }
    }
}
public class Cache<TKey, TValue>
{
    private TimeSpan _expiry;
    private Dictionary<TKey, CachedObject<TValue>> _store =
        new Dictionary<TKey, CachedObject<TValue>>();

    public Cache(TimeSpan expiry)
    {
        _expiry = expiry;
    }

    public TValue Get(TKey key, Func<TKey, TValue> fetch)
    {
        CachedObject<TValue> found;

       // Inside the lock
      //it checks to see whether the value has already fetched. 
      //If not, this thread is the first in line, so it fetches the value. 
     //But if it has already been fetched, then the thread   
    //was not the first in line and 
    //can just use the fetch object.
        lock (_store)
        {
            // If the object is not in the cache, or it is expired,
            // fetch it and add it to the cache.
            DateTime now = DateTime.Now;
            if (_store.TryGetValue(key, out found) &&
                found.DateCached + _expiry > now)
            {
                return found.Value;
            }

            found = new CachedObject<TValue>(now);
            _store.Add(key, found);
        }

        lock (found)
        {
            if (!found.Fetched)
                found.Value = fetch(key);
            return found.Value;
        }
    }
}

The Get method locks first on the cash. This protects the data structure, and ensures that we get just one instance of cachedObject. Once it has that, it locks on the found cashedObject so that clients after the same key line up.

References: