An exploration of champion win rates by tier. Hypothesis: certain high-skill champions like Nidalee win more when played by Platinum players than by Silver players. Conversely, certain cheese champions like Amumu win more in Silver, where they mostly face lower-skill opponents.
Data comes from http://na.op.gg/statistics/champion/ and represents approximately 15 million games played in the month ending September 12 2016.
By Nelson Minar
In[1]:
```python
import collections

import numpy
import pandas
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML

%matplotlib inline
sns.set_palette('deep')
```
In[2]:
```python
tier_names = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond', 'Master', 'Challenger']
right_align = [{'selector': 'td', 'props': [('text-align', 'right')]}]  # for DataFrame.style
```
Load and prepare the data
In[3]:
```python
# Create a bunch of DataFrames, one per TSV file
tiers = {}
for tier in tier_names:
    tiers[tier] = pandas.read_csv('data/Champion win rates by tier - %s.tsv' % tier,
                                  sep='\t', header=0,
                                  names=['N', 'X', 'champion', 'winrate', 'games', 'kda', 'cs', 'gold'],
                                  thousands=',', index_col=2)
    # Remove unneeded columns
    del tiers[tier]['X']
    del tiers[tier]['N']
    # Parse a couple of columns down to simple numbers
    tiers[tier]['winrate'] = tiers[tier]['winrate'].apply(lambda s: float(s[:-1]))  # drop trailing '%'
    tiers[tier]['kda'] = tiers[tier]['kda'].apply(lambda s: float(s[:-2]))  # drop two-char suffix

# Smoosh all the DataFrames into a single Panel
# (pandas.Panel was removed in pandas 1.0, so this needs an older pandas)
data = pandas.Panel(tiers, items=tier_names)
```
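Because `pandas.Panel` was deprecated in pandas 0.20 and removed in 1.0, the loading cell above only runs on old pandas versions. A minimal sketch of one alternative, assuming the TSV layout implied by the `names=` list: stack the per-tier frames into a single DataFrame keyed by (tier, champion). The inline TSV strings are illustrative stand-ins for the real files, filled with the Nidalee figures from the sample table in this notebook; their header text and the `2.38:1` KDA format are assumptions.

```python
import io

import pandas as pd

COLUMNS = ['N', 'X', 'champion', 'winrate', 'games', 'kda', 'cs', 'gold']

# Illustrative stand-ins for the per-tier TSV files (one row each, Nidalee's
# figures). Header wording and the "2.38:1" KDA format are assumptions.
SAMPLE_TSV = {
    'Silver': 'N\tX\tChampion\tWin rate\tGames\tKDA\tCS\tGold\n'
              '1\t\tNidalee\t44.59%\t58,051\t2.38:1\t122.93\t12,232\n',
    'Diamond': 'N\tX\tChampion\tWin rate\tGames\tKDA\tCS\tGold\n'
               '1\t\tNidalee\t53.96%\t12,505\t3.13:1\t142.22\t12,767\n',
}


def load_tier(text):
    """Parse one tier's TSV text into a DataFrame indexed by champion."""
    df = pd.read_csv(io.StringIO(text), sep='\t', header=0, names=COLUMNS,
                     thousands=',', index_col='champion')
    df = df.drop(columns=['N', 'X'])
    df['winrate'] = df['winrate'].str.rstrip('%').astype(float)  # "44.59%" -> 44.59
    df['kda'] = df['kda'].str[:-2].astype(float)                 # "2.38:1" -> 2.38
    return df


# One frame with a (tier, champion) MultiIndex instead of a Panel.
data = pd.concat({tier: load_tier(text) for tier, text in SAMPLE_TSV.items()},
                 names=['tier', 'champion'])

# Rough equivalents of the Panel calls used in this notebook:
nidalee = data.xs('Nidalee', level='champion').T  # like data.xs('Nidalee')
games = data['games'].unstack('tier')              # like data.minor_xs('games')
print(nidalee)
print(games)
```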
Sample data for Nidalee
In[4]:
```python
data.xs('Nidalee')
```
Out[4]:
| | Bronze | Silver | Gold | Platinum | Diamond | Master | Challenger |
|---|---|---|---|---|---|---|---|
| winrate | 41.45 | 44.59 | 48.14 | 51.00 | 53.96 | 61.45 | 57.21 |
| games | 17466.00 | 58051.00 | 52953.00 | 39436.00 | 12505.00 | 664.00 | 208.00 |
| kda | 2.04 | 2.38 | 2.70 | 2.92 | 3.13 | 3.59 | 3.97 |
| cs | 107.19 | 122.93 | 133.71 | 138.80 | 142.22 | 144.76 | 146.22 |
| gold | 11625.00 | 12232.00 | 12708.00 | 12806.00 | 12767.00 | 12992.00 | 12950.00 |
Average statistics by tier
In[5]:
```python
# Unweighted mean of each statistic over all champions in the tier
d = []
for t, df in data.iteritems():
    d.append((df['winrate'].mean(), df['games'].mean(), df['kda'].mean(),
              df['cs'].mean(), df['gold'].mean()))
averages = pandas.DataFrame(d, index=data.items,
                            columns=('winrate', 'games', 'kda', 'cs', 'gold'))
(averages.style
    .format({'cs': '{:.0f}', 'games': '{:,.0f}', 'gold': '{:,.0f}',
             'kda': '{:.2f}', 'winrate': '{:.2f}%'})
    .set_table_styles(right_align))
```
Out[5]:
| | winrate | games | kda | cs | gold |
|---|---|---|---|---|---|
| Bronze | 45.93% | 31,248 | 2.27 | 126 | 11,774 |
| Silver | 48.21% | 89,143 | 2.44 | 142 | 12,058 |
| Gold | 49.42% | 61,990 | 2.53 | 151 | 12,205 |
| Platinum | 49.90% | 33,342 | 2.56 | 156 | 12,116 |
| Diamond | 50.05% | 7,429 | 2.55 | 156 | 11,714 |
| Master | 54.45% | 321 | 2.70 | 156 | 11,494 |
| Challenger | 55.38% | 94 | 3.11 | 152 | 11,361 |
Win rates by tier, alternate calculation
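The averages above are unweighted means over champions, so a rarely played champion counts as much as a popular one. The calculation below weights each champion by its number of games instead. It seems odd that the total win rate across all the data is below 50%. Perhaps they are including games that didn't complete?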
In[6]:
```python
# Games-weighted win rate per tier: total wins / total games
d = []
for t, df in data.iteritems():
    winsPerChamp = df.games * df.winrate / 100
    d.append((100 * winsPerChamp.sum() / df.games.sum(), winsPerChamp.sum(), df.games.sum()))
tier_stats = pandas.DataFrame(d, index=data.items,
                              columns=('win rate for tier', 'wins in tier', 'games in tier'))
(tier_stats.style
    .format({'games in tier': '{:,.0f}', 'win rate for tier': '{:.2f}%', 'wins in tier': '{:,.0f}'})
    .set_table_styles(right_align))
```
Out[6]:
| | win rate for tier | wins in tier | games in tier |
|---|---|---|---|
| Bronze | 46.39% | 1,913,643 | 4,124,704 |
| Silver | 48.50% | 5,707,026 | 11,766,902 |
| Gold | 49.53% | 4,052,918 | 8,182,712 |
| Platinum | 50.01% | 2,200,874 | 4,401,080 |
| Diamond | 50.37% | 493,929 | 980,689 |
| Master | 53.46% | 20,766 | 38,843 |
| Challenger | 54.48% | 4,101 | 7,528 |
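The two tier tables differ only in weighting. The averages table takes a plain mean of the per-champion win rates, while the alternate calculation computes sum(games × winrate / 100) / sum(games), so popular champions count for more. A small plain-Python sketch of both, using a hypothetical two-champion tier whose numbers are made up purely to make the difference visible:

```python
# Hypothetical toy tier: (win rate %, games) per champion. Not real data.
TOY_TIER = [(40.0, 100), (60.0, 300)]


def mean_winrate(rows):
    """Unweighted mean of per-champion win rates (each champion counts once)."""
    return sum(winrate for winrate, _ in rows) / len(rows)


def pooled_winrate(rows):
    """Total wins divided by total games, i.e. a games-weighted mean."""
    wins = sum(winrate * games / 100 for winrate, games in rows)
    total_games = sum(games for _, games in rows)
    return 100 * wins / total_games


print(mean_winrate(TOY_TIER))    # 50.0
print(pooled_winrate(TOY_TIER))  # 55.0
```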
Champion win rates from Silver to Diamond
Which champions improve the most in the hands of skilled players? Which champions' win rates fall off in higher tiers?
It turns out Nidalee benefits most from player skill: her win rate goes from 44.59% for Silver players to 53.96% in Diamond, a gain of 9.37 percentage points. Conversely, Amumu loses 4.10 points, falling from 52.85% to 48.75%.
The column "Silver to Diamond" is simply the difference between the win rates in those two tiers. "Max Spread" is the difference between the maximum and minimum win rate across the four tiers from Silver through Diamond. It's useful for champs like Blitzcrank that are strongest in Gold (+2.09%), not Platinum (+0.87%).
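The two spread columns can be checked by hand. This plain-Python sketch recomputes them from the Silver, Gold, Platinum and Diamond win rates of Amumu and Aatrox as listed in this section's table; the helper `spread_stats` is hypothetical, not part of the notebook. Aatrox shows why Max Spread is worth keeping: he peaks in Platinum, so his Max Spread (4.21) is larger than the size of his Silver-to-Diamond change (-2.23).

```python
# Win rates (%) in Silver, Gold, Platinum, Diamond order,
# copied from the Silver-to-Diamond table in this section.
WIN_RATES = {
    'Amumu': [52.85, 52.52, 51.91, 48.75],
    'Aatrox': [46.78, 47.98, 48.76, 44.55],
}


def spread_stats(rates):
    """Return (Silver to Diamond, Max Spread), rounded to 2 decimals."""
    silver_to_diamond = rates[-1] - rates[0]  # Diamond minus Silver
    max_spread = max(rates) - min(rates)      # best tier minus worst tier
    return round(silver_to_diamond, 2), round(max_spread, 2)


for champ, rates in WIN_RATES.items():
    print(champ, spread_stats(rates))
```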
In[7]:
```python
spreads = {}
for name in data.major_axis:
    champ_data = data.major_xs(name)
    # Consider only Silver -> Diamond data (drop Bronze, Master, Challenger)
    reduced = champ_data.transpose()[1:-2]
    spreads[name] = (
        reduced.winrate.iloc[-1] - reduced.winrate.iloc[0],
        max(reduced.winrate) - min(reduced.winrate),
        data.Silver.loc[name].winrate,
        data.Gold.loc[name].winrate,
        data.Platinum.loc[name].winrate,
        data.Diamond.loc[name].winrate,
    )
win_rates = pandas.DataFrame.from_records(
    spreads, index=('Silver to Diamond', 'Max Spread', 'Silver', 'Gold', 'Platinum', 'Diamond')
).transpose()
win_rates.sort_values('Silver to Diamond', ascending=False, inplace=True)

# Show the 10 biggest gainers and the 10 biggest losers
df_disp = pandas.concat([win_rates.head(10), win_rates.tail(10)])
display(df_disp.style
        .format({'Silver to Diamond': '{:+.2f}%', 'Max Spread': '{:.2f}%',
                 'Silver': '{:.2f}%', 'Gold': '{:.2f}%', 'Platinum': '{:.2f}%', 'Diamond': '{:.2f}%'})
        .set_table_styles(right_align)
        .background_gradient(cmap='coolwarm', low=0.5, high=0.5,
                             subset=['Silver', 'Gold', 'Platinum', 'Diamond']))
```
| | Silver to Diamond | Max Spread | Silver | Gold | Platinum | Diamond |
|---|---|---|---|---|---|---|
| Nidalee | +9.37% | 9.37% | 44.59% | 48.14% | 51.00% | 53.96% |
| Pantheon | +7.50% | 7.50% | 48.01% | 51.33% | 52.84% | 55.51% |
| Riven | +6.07% | 6.07% | 47.05% | 49.26% | 50.63% | 53.12% |
| Twisted Fate | +5.99% | 5.99% | 46.75% | 49.52% | 51.70% | 52.74% |
| Aurelion Sol | +5.77% | 5.77% | 49.09% | 50.19% | 52.31% | 54.86% |
| Rengar | +5.57% | 5.57% | 45.68% | 47.25% | 49.28% | 51.25% |
| Ryze | +5.53% | 5.53% | 41.48% | 42.92% | 43.95% | 47.01% |
| Kindred | +5.22% | 5.22% | 45.64% | 48.64% | 50.41% | 50.86% |
| Urgot | +4.98% | 4.98% | 45.08% | 48.46% | 49.00% | 50.06% |
| Evelynn | +4.79% | 4.79% | 46.65% | 49.29% | 50.62% | 51.44% |
| Nasus | -1.13% | 1.94% | 47.69% | 48.50% | 46.91% | 46.56% |
| Ziggs | -1.18% | 2.72% | 49.37% | 50.91% | 50.36% | 48.19% |
| Brand | -1.48% | 1.48% | 51.66% | 51.64% | 51.31% | 50.18% |
| Kalista | -1.95% | 3.31% | 42.63% | 43.99% | 43.57% | 40.68% |
| Sion | -1.96% | 1.96% | 52.06% | 51.47% | 52.03% | 50.10% |
| Aatrox | -2.23% | 4.21% | 46.78% | 47.98% | 48.76% | 44.55% |
| Dr. Mundo | -2.36% | 2.97% | 46.82% | 47.43% | 47.15% | 44.46% |
| Garen | -2.56% | 2.83% | 49.27% | 49.54% | 49.20% | 46.71% |
| Yorick | -3.88% | 3.88% | 46.24% | 45.16% | 43.84% | 42.36% |
| Amumu | -4.10% | 4.10% | 52.85% | 52.52% | 51.91% | 48.75% |
In[8]:
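```python
# sns.distplot is deprecated as of seaborn 0.11; sns.histplot(..., kde=True) replaces it
g = sns.distplot(win_rates['Silver to Diamond'], bins=15)
g.set(title='Win rate differences from Silver to Diamond')
```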
[Figure: histogram titled "Win rate differences from Silver to Diamond"]
Champion popularity by tier
How popular are champions at various tiers? Which champions get more popular at higher tiers?
It turns out Janna has the biggest increase in usage at higher tiers. She appears in only 0.79% of the Silver data (93,510 games out of 11.8M) but in 2.92% of the Diamond data (28,644 games out of about 1M). Conversely, Leona has the biggest drop in usage, from 1.63% to 0.63%.
Note that the numbers reported in the Silver/Gold/Platinum/Diamond columns are not strictly pick rates, although the two are mostly correlated: each is a champion's game count divided by the game counts summed over all champions in that tier. For example, Janna represents 0.79% of all the Silver data we have. The report is sorted by the column "Silver to Diamond", the difference in these rates from Silver to Diamond.
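Janna's numbers make the share calculation concrete. The share is a champion's game count divided by the sum of game counts over every champion in the tier; a true pick rate would instead divide by the number of matches, which the per-champion tables here do not list. A plain-Python sketch, where `tier_share` is a hypothetical helper and the inputs come from the Janna figures in the text and the per-tier totals table above:

```python
def tier_share(champ_games, tier_games):
    """Champion's share (%) of all champion-games recorded in one tier."""
    return 100 * champ_games / tier_games


# Janna's games and the Silver / Diamond "games in tier" totals.
silver = tier_share(93510, 11766902)
diamond = tier_share(28644, 980689)
print(round(silver, 2), round(diamond, 2), round(diamond - silver, 2))
```

The printed difference matches Janna's "Silver to Diamond" value of +2.13 in the table.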
In[9]:
```python
# Each champion's share (%) of all champion-games in each tier
pick_rates = 100 * data.minor_xs('games') / data.minor_xs('games').sum()
del pick_rates['Bronze']
del pick_rates['Master']
del pick_rates['Challenger']
pick_rates.insert(0, 'Silver to Diamond', pick_rates.Diamond - pick_rates.Silver)
pick_rates.sort_values('Silver to Diamond', ascending=False, inplace=True)
df_disp = pandas.concat([pick_rates.head(10), pick_rates.tail(10)])
(df_disp.style
    .format({'Silver': '{:,.2f}', 'Gold': '{:,.2f}', 'Platinum': '{:,.2f}', 'Diamond': '{:,.2f}',
             'Silver to Diamond': '{:+.2f}'})
    .set_table_styles(right_align)
    .background_gradient(cmap='coolwarm', low=0.5, high=0.5,
                         subset=['Silver', 'Gold', 'Platinum', 'Diamond']))
```
Out[9]:
| | Silver to Diamond | Silver | Gold | Platinum | Diamond |
|---|---|---|---|---|---|
| Janna | +2.13 | 0.79 | 1.28 | 1.79 | 2.92 |
| Lucian | +1.38 | 2.29 | 2.81 | 3.27 | 3.67 |
| Jhin | +1.27 | 2.01 | 2.46 | 2.96 | 3.28 |
| Ezreal | +0.99 | 1.93 | 2.50 | 2.72 | 2.93 |
| Bard | +0.95 | 0.77 | 0.97 | 1.20 | 1.72 |
| Graves | +0.93 | 0.97 | 1.36 | 1.73 | 1.90 |
| Nidalee | +0.78 | 0.49 | 0.65 | 0.90 | 1.28 |
| Karma | +0.71 | 0.79 | 0.91 | 1.05 | 1.50 |
| Rek'Sai | +0.71 | 0.44 | 0.56 | 0.74 | 1.15 |
| Elise | +0.56 | 0.45 | 0.56 | 0.70 | 1.01 |
| Annie | -0.67 | 1.31 | 1.09 | 0.82 | 0.64 |
| Miss Fortune | -0.69 | 1.15 | 0.81 | 0.64 | 0.47 |
| Garen | -0.69 | 0.88 | 0.51 | 0.28 | 0.19 |
| Xin Zhao | -0.71 | 0.89 | 0.54 | 0.34 | 0.17 |
| Master Yi | -0.73 | 1.09 | 0.83 | 0.66 | 0.37 |
| Vayne | -0.78 | 1.87 | 1.89 | 1.58 | 1.09 |
| Amumu | -0.91 | 1.25 | 0.98 | 0.68 | 0.34 |
| Lux | -0.95 | 1.51 | 1.16 | 0.89 | 0.56 |
| Jinx | -0.97 | 2.25 | 2.04 | 1.87 | 1.28 |
| Leona | -1.00 | 1.63 | 1.11 | 0.79 | 0.63 |
In[10]:
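```python
# sns.distplot is deprecated as of seaborn 0.11; sns.histplot(..., kde=True) replaces it
g = sns.distplot(pick_rates['Silver to Diamond'], bins=15)
g.set(title='Pick rate differences from Silver to Diamond')
```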
[Figure: histogram titled "Pick rate differences from Silver to Diamond"]
Scatterplot of Platinum win rate vs pick rate
Are champions with high win rates more popular in Platinum? Not particularly...
In[11]:
```python
wr_vs_pick_platinum = pandas.concat((win_rates['Platinum'], pick_rates['Platinum']), axis=1)
wr_vs_pick_platinum.columns = ('Win Rate', 'Pick Rate')
g = sns.jointplot(x='Win Rate', y='Pick Rate', ylim=(0, 3.5), xlim=(40, 60),
                  data=wr_vs_pick_platinum, kind="scatter")
```
Scatterplot of Win Rate improvement vs Pick Rate change
Are champions with a bigger Silver-to-Diamond win rate change also likely to have a bigger Silver-to-Diamond pick rate change? If the two were tightly linked, you'd expect the dots below to fall near a straight line such as x = y. They don't really, but there is a correlation.
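Correlation and the x = y line are different things: points can be perfectly correlated while lying on a much flatter line, especially here where the x axis is in win-rate points and the y axis in pick-share points. A plain-Python sketch with hypothetical toy numbers (not taken from the data) shows this. In the notebook itself, `wr_vs_pick_sd.corr()` (pandas `DataFrame.corr`, Pearson by default) would give the actual coefficient for the plotted points.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ls_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sum((x - mean_x) ** 2 for x in xs)


# Hypothetical points: pick-rate change is always 0.1 x the win-rate change.
win_change = [-4.0, -2.0, 0.0, 2.0, 4.0, 6.0, 8.0]
pick_change = [0.1 * w for w in win_change]
print(pearson_r(win_change, pick_change))  # close to 1.0: perfectly correlated
print(ls_slope(win_change, pick_change))   # close to 0.1: far from the x = y slope of 1
```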
In[12]:
```python
wr_vs_pick_sd = pandas.concat((win_rates['Silver to Diamond'], pick_rates['Silver to Diamond']), axis=1)
wr_vs_pick_sd.columns = ('Win Rate', 'Pick Rate')
g = sns.jointplot(x='Win Rate', y='Pick Rate', data=wr_vs_pick_sd, kind="scatter")
```