- Publication . Article . Preprint . 2013 . Embargo End Date: 01 Jan 2013 . Open Access
  Authors: Zoghi, Masrour; Whiteson, Shimon; Munos, Remi; de Rijke, Maarten
  Publisher: arXiv
  Country: Netherlands
  Project: NWO | Modeling and Learning fro... (8686), EC | COMPLACS (270327), NWO | Building Rich Links to En... (2300153702), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | SPuDisc: Searching Public... (2300176811), NWO | Semantic Search in E-Disc... (7999)
This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a finite-time regret bound of order O(log t). In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art. Comment: 13 pages, 6 figures
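The selection rule described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' reference implementation. The function name `relative_ucb_sketch`, the preference matrix `pref`, the exploration parameter `alpha`, and the tie-breaking choices are all assumptions made for illustration. Each round computes optimistic estimates of the pairwise win probabilities, picks a "champion" whose optimistic estimates against every other arm are at least 1/2, and then picks the challenger with the highest optimistic chance of beating that champion.

```python
import math
import random


def relative_ucb_sketch(pref, horizon, alpha=0.51, seed=0):
    """Simulate a UCB-style dueling bandit (illustrative sketch only).

    pref[i][j] is the assumed true probability that arm i beats arm j;
    it is used only to simulate duel outcomes. Requires at least 2 arms.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]  # wins[i][j]: times i beat j
    champion_counts = [0] * k

    for t in range(1, horizon + 1):
        # Optimistic estimate of P(i beats j); never-compared pairs get 1.0.
        ucb = [[0.5] * k for _ in range(k)]
        for i in range(k):
            for j in range(k):
                if i == j:
                    continue
                n = wins[i][j] + wins[j][i]
                if n == 0:
                    ucb[i][j] = 1.0
                else:
                    bonus = math.sqrt(alpha * math.log(t) / n)
                    ucb[i][j] = wins[i][j] / n + bonus

        # Champion: an arm that optimistically beats every other arm.
        candidates = [i for i in range(k) if all(u >= 0.5 for u in ucb[i])]
        champion = rng.choice(candidates) if candidates else rng.randrange(k)
        champion_counts[champion] += 1

        # Challenger: the arm most likely (optimistically) to beat the champion.
        # Excluding the champion itself is a simplification of this sketch.
        challenger = max(
            (j for j in range(k) if j != champion),
            key=lambda j: ucb[j][champion],
        )

        # Simulate the duel and record only the relative outcome.
        if rng.random() < pref[champion][challenger]:
            wins[champion][challenger] += 1
        else:
            wins[challenger][champion] += 1

    return wins, champion_counts
```

Calling `relative_ucb_sketch(pref, horizon=3000)` with a matrix in which one arm beats all others with probability above 1/2 returns the pairwise win counts and how often each arm was chosen as champion. In such a setting the dominant arm would be expected to be chosen as champion most often, which matches the intuition behind a logarithmic regret bound.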