Abhijeet Srivastav

Extracting data from Wikipedia using Python?

Python is a beautiful and powerful language, and we can use it for almost any task you can imagine.


In this blog post we are going to extract data about a keyword from Wikipedia.


One more thing to mention: we are not going to use web scraping for this, and this is going to be a command-line script.


I will soon upload a GUI app for the same purpose, built with the tkinter module.


Enough talk! Let's get started.


First of all, we are going to install a module that is itself named wikipedia. Interesting, what do you say?


pip install wikipedia

Make sure that your Python interpreter is added to the PATH on your system.

On a Linux system you may need to run the command with sudo.

If pip doesn't work, use pip3.


Now open IDLE, since this is just a small script, and type the following code.

import wikipedia
print(wikipedia.summary('Python programming language'))

This is going to extract the summary, that is, the introductory section at the top of the Wikipedia page.


We can even specify the number of sentences to extract:

print(wikipedia.summary('Python programming language', sentences=2))
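
Some keywords match several pages, or no page at all, and in that case the call raises an exception instead of returning text. Here is a minimal sketch of one way to handle this. The names `safe_summary` and `wiki` are hypothetical and introduced only for illustration; the sketch assumes the package's `DisambiguationError` and `PageError` exceptions.

```python
try:
    import wikipedia  # third-party: pip install wikipedia
except ImportError:   # the helper can still be defined without the package
    wikipedia = None


def safe_summary(keyword, sentences=2, wiki=wikipedia):
    """Return a short summary, or a readable message when the lookup fails.

    `wiki` is the imported wikipedia module (hypothetical parameter, so that
    a stand-in object can be passed instead).
    """
    try:
        return wiki.summary(keyword, sentences=sentences)
    except wiki.exceptions.DisambiguationError as err:
        # err.options holds the candidate page titles for an ambiguous keyword
        return "Ambiguous keyword, try one of: " + ", ".join(err.options[:5])
    except wiki.exceptions.PageError:
        # raised when no page matches the keyword
        return "No page found for: " + keyword
```

With the package installed, `print(safe_summary('Python programming language'))` should behave like the call above, but it prints a hint instead of a traceback when the keyword is ambiguous or unknown.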

We can also get pages related to a keyword:

result = wikipedia.search('Artificial Intelligence')

This is going to return a list of page titles associated with the keyword.

In [3]: result = wikipedia.search("Artificial Intelligence")
In [4]: print(result)

['Artificial Intelligence', 'Artificial neural network', ..., 'Types of artificial intelligence']
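
For a command-line script it can help to limit the number of matches and show them as a numbered list. The sketch below assumes the `results` parameter of `wikipedia.search`; `numbered_results`, `limit` and `wiki` are hypothetical names used only for illustration.

```python
try:
    import wikipedia  # third-party: pip install wikipedia
except ImportError:   # the helper can still be defined without the package
    wikipedia = None


def numbered_results(keyword, limit=5, wiki=wikipedia):
    """Search Wikipedia and return the matching titles as numbered lines."""
    # results=limit caps how many titles the search returns
    titles = wiki.search(keyword, results=limit)
    return [f"{i}. {title}" for i, title in enumerate(titles, start=1)]
```

With the package installed, `print("\n".join(numbered_results("Artificial Intelligence", limit=3)))` prints at most three titles, one per line.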

Now suppose we want to see the page associated with the first result. We can do that with this code:


page = wikipedia.page(result[0])

We can extract the title of the page using its title attribute:

extracted_title = page.title

We can even get the categories of that Wikipedia page:

categories = page.categories

We can get the references in the page:

refer = page.references

We can get the plain-text content of the page, with all the links removed:

content = page.content
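
The categories, references and content can be long, so printing them directly floods the terminal. As a rough sketch, the attributes above can be condensed into a short overview; `page_overview` and `preview_chars` are hypothetical names, and the function only relies on the `title`, `categories`, `references` and `content` attributes used in this post.

```python
def page_overview(page, preview_chars=200):
    """Condense a page object into a few short, printable facts."""
    return {
        "title": page.title,
        "category_count": len(page.categories),    # categories is a list of names
        "reference_count": len(page.references),   # references is a list of URLs
        "preview": page.content[:preview_chars],   # first characters of the plain text
    }
```

With the `page` object from earlier, `print(page_overview(page))` shows the title, the two counts and the first 200 characters of the text.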

Summarized code:
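
A minimal sketch that pulls the steps from this post into one command-line script. The names `describe` and `main` and the file name `wiki_cli.py` are hypothetical. The sketch also passes `auto_suggest=False`, on the assumption that a title taken straight from the search results should be looked up exactly as returned.

```python
import sys


def describe(keyword, wiki):
    """Run the steps from this post for one keyword and collect the results.

    `wiki` is the imported wikipedia module.
    """
    results = wiki.search(keyword)  # list of related page titles
    if not results:
        return None
    # auto_suggest=False looks up the exact title returned by the search
    page = wiki.page(results[0], auto_suggest=False)
    return {
        "related": results,
        "title": page.title,
        "summary": wiki.summary(page.title, sentences=2, auto_suggest=False),
        "categories": page.categories,
        "references": page.references,
        "content": page.content,
    }


def main(keyword):
    import wikipedia  # third-party: pip install wikipedia (needs network access)

    info = describe(keyword, wikipedia)
    if info is None:
        print("No results for:", keyword)
        return
    print("Title:", info["title"])
    print("Summary:", info["summary"])
    print("Related pages:", ", ".join(info["related"]))
    print("Categories:", len(info["categories"]))
    print("References:", len(info["references"]))


# Runs only when a keyword is passed on the command line, for example:
#   python wiki_cli.py Artificial Intelligence
if __name__ == "__main__" and len(sys.argv) > 1:
    main(" ".join(sys.argv[1:]))
```

The lookup logic lives in `describe` and the printing in `main`, so the same function could later feed the tkinter GUI.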





