Basic PHP Web Scraping Script Tutorial

Now that we have a basic idea of what web scraping is, let's get into some very simple scripts to do it. As mentioned before, I'm going to assume you have PHP and cURL enabled on your server or desktop. Assuming you have those things installed, let's get into the basics of PHP web scraping. I'll be posting a small program and then walking through what we've done.
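If you're not sure whether your setup has what these tutorials assume, a small check like the one below can tell you. It's a minimal sketch: scrapingPrerequisites() is just an illustrative name, and it only checks two things. The first is the allow_url_fopen setting, which file_get_contents() needs in order to read URLs. The second is whether the cURL extension is loaded.

```php
<?php
// scrapingPrerequisites() is a hypothetical helper name used for illustration.
// It reports whether the two features these tutorials rely on are available.
function scrapingPrerequisites(): array
{
    return [
        // allow_url_fopen must be on for file_get_contents('http://...') to work
        'allow_url_fopen' => (bool) ini_get('allow_url_fopen'),
        // the cURL extension provides the curl_* functions
        'curl' => extension_loaded('curl'),
    ];
}

foreach (scrapingPrerequisites() as $name => $enabled) {
    echo $name . ': ' . ($enabled ? 'enabled' : 'NOT enabled') . "\n";
}
```

Run it from the same place you run your other PHP files. If either line says "NOT enabled", that setting or extension needs turning on in your PHP configuration first.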

 

Let's start with the most basic of scraping scripts.

Whole script -

So this is exactly what the code inside our .php file is going to look like on the server, minus the line numbers, of course.

1. <?php
2. $url = 'http://www.oooff.com';
3. $output = file_get_contents($url);
4. echo $output;
5. ?>

Script Explanation -

So what have we done here?

I'm not going to cover lines 1 and 5, as we already know they just tell the PHP interpreter that the code between them is to be processed as PHP.

Line 2.
$url = 'http://www.oooff.com';
As you know, in PHP the $ symbol marks a variable, a "holder of things". Here we're assigning the root URL of Oooff.com to a variable so that we don't have to type the whole URL string each time. We can just use $url in its place.

Line 3.
$output = file_get_contents($url);
Here again we're declaring the variable $output on the fly and then calling PHP's built-in function file_get_contents(). This function reads a file or URL and returns all the data held in it as a string. For example, if you had a file called 'domains.txt' in the same directory as this script, you'd write $output = file_get_contents('domains.txt'); this would read all the data from that file and load it into the variable $output to be used in your script. We can do the same thing with URLs (as long as PHP's allow_url_fopen setting is enabled, which it is by default), so after this line we'll have all the HTML from the homepage of Oooff.com held in the variable $output.
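One thing the basic script glosses over is failure: file_get_contents() returns false (and raises a warning) when it can't read the file or URL. Here's a minimal sketch of a wrapper that checks for that. fetchContents() is a hypothetical helper name, and 'domains.txt' is the example file mentioned in the paragraph above.

```php
<?php
// fetchContents() is a hypothetical helper name used for illustration.
// It works for local files and, with allow_url_fopen enabled, for URLs.
function fetchContents(string $source): ?string
{
    // @ suppresses the warning; the false return value is handled below
    $data = @file_get_contents($source);
    if ($data === false) {
        return null; // could not read the file or URL
    }
    return $data;
}

$output = fetchContents('domains.txt');
if ($output === null) {
    echo "Could not read domains.txt\n";
} else {
    echo $output;
}
```

The same call works for a web page, e.g. fetchContents('http://www.oooff.com'), and you get null back instead of a PHP warning if the site can't be reached.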

Line 4.
echo $output;
This simply prints whatever is held in our variable to the screen. Since $output holds HTML, your browser will render it as a web page rather than show the raw markup. Pretty basic.
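If you'd rather look at the raw HTML you scraped than the rendered page, one option is to escape it with htmlspecialchars() before echoing. This is a minimal sketch: showSource() is a hypothetical helper name, and the sample string stands in for a real scraped page.

```php
<?php
// showSource() is a hypothetical helper: it escapes HTML so the browser
// displays the markup as text instead of rendering it.
function showSource(string $html): string
{
    // htmlspecialchars() converts &, <, > and quotes into HTML entities;
    // <pre> preserves the page's original line breaks and spacing.
    return '<pre>' . htmlspecialchars($html) . '</pre>';
}

// Sample string standing in for the scraped page held in $output.
$output = '<h1>Hello</h1>';

echo $output;             // rendered by the browser as a heading
echo showSource($output); // shown as the literal text <h1>Hello</h1>
```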

Trying things out -

OK, so now that we have a solid understanding of what these lines of code are doing, let's copy or type them into a file on your server where you run PHP files. Once you've created your file, navigate in your browser to wherever that file is located. For example, if you named your file phpfile.php and put it in a directory/folder called phpfiles in your web server's root directory on your local machine, you'd go to http://localhost/phpfiles/phpfile.php.


Other things to try -

Now, let's try a couple of things to make sure you have it down.

1. Try to get the data from http://endhousepayments.com

2. Try calling and echoing the page twice and see what happens.
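Once you've tried the exercises yourself, here's one way to fetch several pages in one go: loop over a list of sources. This is a sketch under the same assumptions as before (allow_url_fopen enabled), and scrapeAll() is a hypothetical helper name. Passing the same URL twice, as in exercise 2, just gives you that page's HTML twice in a row.

```php
<?php
// scrapeAll() is a hypothetical helper: it reads each source in turn with
// file_get_contents() and joins the results into one string.
function scrapeAll(array $sources): string
{
    $combined = '';
    foreach ($sources as $source) {
        // @ suppresses the warning for sources that can't be read
        $data = @file_get_contents($source);
        if ($data === false) {
            // Skip unreadable sources rather than aborting the whole loop
            continue;
        }
        $combined .= $data;
    }
    return $combined;
}
```

For example, echo scrapeAll(['http://endhousepayments.com', 'http://endhousepayments.com']); should print that homepage's HTML two times, one copy right after the other.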

Conclusion -

And there you have it: we've made the most basic scraping script there is. Now you have the idea of how we get data from the internet to work on in a basic PHP program.

In the next tutorial I'll show you how to take that data and do some basic processing on it.
