Evaluating Agent Performance | Boolean & Beyond