Python for Beginners: Extracting Phone Numbers and Email Addresses From Websites

Photo by Pakata Goh on Unsplash

Have you ever wondered if there is a quicker way to look up specific information on a website? The average human attention span is often said to be around eight seconds, so most people quickly lose interest when reading paragraph after paragraph.

I decided to stay productive by learning to code in Python, which can make my life just a tiny bit easier. The first step on my path to becoming a better programmer was to build a simple data extractor program.

I started my coding journey by reading “Automate the Boring Stuff with Python” by Al Sweigart. Personally, I found that the book makes it very easy to get started on creating your first program to automate simple tasks. It first goes over the fundamentals of coding and has instructions on how to install the Python interpreter on your computer. One of the projects in the book is a data extractor program that grabs only the phone numbers and email addresses from text you have copied from a website to the clipboard. The project comes from Chapter 7 and is called Phone Number and Email Address Extractor. You can find the book and the project linked here: “Automate the Boring Stuff with Python” by Al Sweigart.

A few fundamentals that you should look into before tackling the project:

Simple Workflow:

First, you will need to figure out which modules to import. In this case, we need the “re” and “pyperclip” modules. The “re” module provides regular expression matching operations, which I use to write the patterns for phone numbers and email addresses (Link to more information on re). The “pyperclip” module copies and pastes plain text to and from the clipboard (Link to more information on pyperclip). Note that “pyperclip” is a third-party module, so it has to be installed first (for example with pip install pyperclip).

Next, we need a few lines of code that recognize phone numbers and email addresses in the text. To do so, we write regular expressions (regex) that describe what a valid phone number or email address looks like. Creating regular expressions can get a little complex, so I suggest learning the basics first. Luckily for us, the book provides the regular expression for phone numbers. Still, there are many different ways to write an expression for the same pattern, and because regular expression syntax is largely shared across programming languages, what you learn here carries over. Here is a link to a site that teaches you the basics of regular expressions.

Brief Summary: See below for the breakdown of the regex for validating phone numbers

1. (\d{3}|\(\d{3}\))? validates the area code (a three-digit number); the trailing “?” makes it optional

“\d{3}” matches exactly three digits

“|” means “or”, so the area code can be written either as “\d{3}” (plain digits) or as “\(\d{3}\)” (digits wrapped in parentheses)

2. (\s|-|\.)? matches the optional delimiter after the area code

“\s” matches a single whitespace character, “-” matches a dash, and “\.” matches a literal period

Whichever delimiter appears in the text, the program later joins the pieces with “-” so every number comes out in the same format

3. (\d{3}) matches the first three digits

“\d{3}” matches exactly three digits

4. (\s|-|\.) matches the delimiter before the last four digits

5. (\d{4}) matches the last four digits

“\d{4}” matches exactly four digits

6. (\s*(ext|x|ext.)\s*(\d{2,5}))? matches an optional extension (e.g. x95, x956, x9556, or x95556)

The expression checks for “ext”, “x”, or “ext.”, and then “\d{2,5}” must match an extension that is 2 to 5 digits long.

7. re.VERBOSE is a flag that comes from the re module

The VERBOSE flag is passed as the second argument to re.compile(). It tells Python to ignore whitespace and “#” comments inside the pattern, which is what allows us to write the regular expression line by line.
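To see how the pieces fit together, here is a minimal sketch that assembles them into one pattern. The variable names and the sample text are made up for illustration. The first separator is optional and the second is required, so a number without an area code can still match.

```python
import re

# Phone number pattern assembled from the pieces described above.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # 1. optional area code, with or without parentheses
    (\s|-|\.)?                      # 2. optional separator
    (\d{3})                         # 3. first three digits
    (\s|-|\.)                       # 4. separator
    (\d{4})                         # 5. last four digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # 6. optional extension
    )''', re.VERBOSE)

# Made-up sample text, not taken from a real page.
sample = "Call 415-555-1234 or (800) 420-7240 x99 for details."

# findall() returns one tuple of groups per match; index 0 is the whole number.
found = [groups[0] for groups in phone_regex.findall(sample)]
print(found)  # ['415-555-1234', '(800) 420-7240 x99']
```

Because the pattern contains several groups, findall() returns tuples rather than plain strings, which is why the example picks out index 0.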

Brief Summary: The regex for validating email addresses breaks down as follows:

1. [a-zA-Z0-9_\-\.]+ looks for the username

“a-zA-Z0-9” matches any lowercase letter (a to z), any capital letter (A to Z), and any digit (0 to 9).

“[…]” accepts any one character listed within the square brackets, and the “+” after the brackets repeats that one or more times.

“_”, “\-”, and “\.” mean the username can also contain underscores, dashes, or periods

2. @ looks for the literal @ sign

3. [a-zA-Z0-9_\-\.]+ is the same concept as step 1, but looks for the domain name

4. (\.[a-zA-Z]{2,5}) looks for the domain extension (e.g. .gov, .edu, .org, .com, .net, .biz, .info)

“a-zA-Z” matches any lowercase letter (a to z) or capital letter (A to Z); digits are not accepted here.

“{2,5}” looks for 2 to 5 characters, just in case. For the extensions listed above, “{2,4}” would be enough.
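Put together, the four email pieces can be sketched as a single pattern like the one below. This is one possible assembly rather than the only correct email regex, and the addresses in the sample text are placeholders.

```python
import re

# Email pattern assembled from the four pieces described above.
email_regex = re.compile(r'''(
    [a-zA-Z0-9_\-\.]+    # 1. username: letters, digits, underscores, dashes, periods
    @                    # 2. the @ sign
    [a-zA-Z0-9_\-\.]+    # 3. domain name
    (\.[a-zA-Z]{2,5})    # 4. dot-something, 2 to 5 letters
    )''', re.VERBOSE)

# Placeholder addresses on the reserved example.com / example.org domains.
sample = "Write to info@example.com or sales.team@example.org today."

# Index 0 of each tuple is the whole address; index 1 is only the extension.
found = [groups[0] for groups in email_regex.findall(sample)]
print(found)  # ['info@example.com', 'sales.team@example.org']
```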

The code above defines the patterns; the next step is to find the matches in the text copied to the clipboard. We need two loops, one for phone numbers and one for email addresses. They combine “for” loops with an “if” statement so the program knows what to do with each match.

1. Start by declaring a variable called “text” that holds the clipboard contents returned by pyperclip.paste()
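The two loops could look roughly like the sketch below. The helper name collect_matches is hypothetical, and the sample text stands in for the clipboard so the example runs without pyperclip installed. Empty pieces are skipped when joining, which is a small assumption of this sketch so that a number without an area code does not start with a stray dash.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first three digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last four digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

email_regex = re.compile(r'''(
    [a-zA-Z0-9_\-\.]+               # username
    @                               # @ sign
    [a-zA-Z0-9_\-\.]+               # domain name
    (\.[a-zA-Z]{2,5})               # dot-something
    )''', re.VERBOSE)


def collect_matches(text):
    """Return every phone number and email address found in text."""
    matches = []
    for groups in phone_regex.findall(text):
        # Indexes 1, 3, and 5 hold the area code, first three, and last four digits.
        parts = [part for part in (groups[1], groups[3], groups[5]) if part]
        phone = '-'.join(parts)
        if groups[8] != '':        # index 8 holds the extension digits, if any
            phone += ' x' + groups[8]
        matches.append(phone)
    for groups in email_regex.findall(text):
        matches.append(groups[0])  # index 0 is the whole email address
    return matches


# In the actual program, text would come from pyperclip.paste() instead.
text = "Sales: 800.420.7240 ext 99, support: 415 555 1234, mail: info@example.com"
matches = collect_matches(text)
print(matches)  # ['800-420-7240 x99', '415-555-1234', 'info@example.com']
```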

Brief Summary: If your program does not yet handle the case where nothing is found, make sure you add that at the end. To confirm your code is working, it is a good idea to print a friendly message. In this case, I added one for when the text I copy does not contain any phone numbers or email addresses.

1. if len(matches) > 0:

    print('Copied to clipboard:')

2. else:

    print('No phone numbers or emails found.')

Note: Always include a fallback branch like this so you can tell what happened when testing different scenarios.
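As a small sketch of that final step, the check can be wrapped in a helper that builds the message. The function name build_report is hypothetical, and it returns the text instead of printing it directly so the behavior is easy to test.

```python
def build_report(matches):
    """Return the message to show for a list of extracted strings."""
    if len(matches) > 0:
        # In the actual program, the joined text would also be passed to pyperclip.copy().
        return 'Copied to clipboard:\n' + '\n'.join(matches)
    else:
        return 'No phone numbers or emails found.'


print(build_report(['415-555-1234', 'info@example.com']))
print(build_report([]))
```

Writing `if matches:` would behave the same way for a list; the longer `len(matches) > 0` form is kept here to mirror the check shown above.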

Final Result:

As the book suggests, we will use the No Starch Press website to test the program. Select the whole webpage (Ctrl+A), copy the text to the clipboard (Ctrl+C), and then run the program. See the final output below.


The regular expression for phone numbers provided by the book only works for the United States phone format. I tested it with a random UK number, and it did not output the correct phone number.
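The limitation is easy to reproduce. The sketch below runs a US-style pattern (assembled from the pieces in the breakdown earlier) against a fictional UK-style number; the number and variable names are only for illustration.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first three digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last four digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

# Fictional UK-style number grouped as 3 + 4 + 4 digits.
uk_number = "020 7946 0958"

found = [groups[0] for groups in phone_regex.findall(uk_number)]
print(found)  # ['020 7946'] -- only part of the number is captured
```

The pattern expects groups of 3, 3, and 4 digits, so it treats the first seven digits as a US number without an area code and drops the rest. Supporting other countries would need a different pattern.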



It was a short but fun project overall, and I learned a lot, even compared to the four years of college that I spent in a classroom. The biggest struggle that I faced when writing the code was figuring out the regular expression for validating email addresses. I was able to figure it out by testing and by researching regular expressions with examples. In the end, this simple code can be reused and will make your life easier when looking for specific information on a website.

Screenshot of the No Starch Press Contact Page:
This is the final output when running the code.


