Rafael Nadal is perhaps the most famous left-handed player on the courts right now. But the competition form right-hands is fierce: Roger Federer and Novak Djokovic, with no doubt ones of the most frightening opponents, are right-handed.
In this little exercice we will use Google Sheets and the add-on function IMPORTFROMWEB() to see who is more willing to win a Grand Chelem final…
First, we’ll have to search the web and see how we can find the pieces of information we need.
- ESPN displays a list of the Men’s Grand Slam Title Winners and runners-ups
- For the rest, there is Wikipedia:
Let’s open a new Spreadsheet. For once, we will not dread the blank sheet!
Let’s start by importing the ESPN table.
Obviously we could use copy/paste, but what if we want our data being updated for future tournaments?
The solution is to monitor the page by using the IMPORTFROMWEB() function:
In A1 we will put the URL of the ESPN page
In A2, B2, B3, B4, the XPaths of each column of the table
If you’re not comfortable with XPaths, don’t worry. You will learn it quickly as long as you understand basic HTML. Have a look at this article:
[not(@class="stathead")] has been added in order to remove the first row of the table that was misaligning the cells (try and see!)
Now in A4, we will run our function
…And let the magic happen!
If you’re all good the table from ESPN should appear in your Google Sheets
Link it to Wikipedia
We want to retrieve the hand style for each player. Which means that we will scrape the wikipedia page of each player. That’s almost 200 pages
For this purpose, let’s open a new sheet in the same spreadsheet. This sheet will be a list of all the players we have in the previous table. To do that use the following formula
In Column B, we will generate the Wikipedia URL for each player.
Urls in Wikipedia are not rocket science. In our case, it is just
In B2 type:
|B2||=”https://en.wikipedia.org/wiki/”&SUBSTITUTE(A1:A,” “, “_”)|
And extend the formula for each player
Now that we have the urls, we just need the XPath corresponding to the hand information.
Geese, that one looks more complicated.
It looks for the element that has the text “Plays” and returns its neighbour.
These relative XPaths may look cumbersome, but they are also the most robust ones: Even if the source code changes slightly, it’s very probable that the information we are looking for will keep being near “Plays”
Take a look at this XPaths cheatsheet for a better understanding
Next, let IMPORTFROMWEB do its work
Let’s create a function for the last tournament
Then extend for each player and wait! That will scrape about 200 pages
You’ll see that Wikipedia provides something like
“Right-handed (two-handed backhand)”, so we will have to do a bit of text manipulation in column D so it displays only the first part of the text (“Right-handed”)
Now extend the formula to all players.
Joining the two tables
Let’s go back to our first sheet. We’ll add the hand style to our main table.
To join tables let’s use the VLOOKUP function which we will apply to each final, in the E column for the winners and in the F column for the runner-up.
For example in row 5
|E5||=VLOOKUP(D5,Sheet2!$A$2:$C, 3, false)|
|F5||=VLOOKUP(D5,Sheet2!$A$2:$C, 3, false)|
You should now be able to see for each final if a left handed player usually beat a right-handed or the contrary
Extend to all finals
So what? Who win? Left-handed or right-handed?
In the last step, we will show in column G:
- TRUE if a final has been won by a right-handed player against a left-handed
- FALSE if a final has been won by a left-handed player against a right-handed
Then we just have to count the TRUE value vs. the FALSE values
Conclusion: Right-handed players have won more Grand-Chelem finals against left-handed players than the opposite.
On a side note, beware that when you look carefully the data, you’ll find errors like #URL_RETURNS_404 or #XPATH_RETURNS_NULL. That’s because in some cases Wikipedia didn’t find a page with this URL or that the page didn’t provide the information about the hand style
Find the whole tutorial in this shared Spreadsheet