Python Wikipedia API


Python Wikipedia Module.

Python has a package named wikipedia that lets you fetch data from Wikipedia and use it in your Python projects.

You can do a lot with this package. In this article we discuss some common functions and attributes of the wikipedia module.

The wikipedia module is a third-party package, which means you have to install it before importing it.

pip install wikipedia








So let's get started:



1. Extracting Summary of an Article :

Using the wikipedia.summary() method, you can get the summary of any Wikipedia article.

Code to Illustrate :
       			 
import wikipedia

# Extracting the summary of an article

summ = wikipedia.summary("Laptop")
print(summ)
 

  Output  

A laptop (also laptop computer), often called a notebook, is a small, portable personal computer (PC) with a "clamshell" form factor, typically having a thin LCD or LED computer screen mounted on the inside of the upper lid of the clamshell and an alphanumeric keyboard on the inside of the lower lid. The clamshell is opened up to use the computer. Laptops are folded shut for transportation, and thus are suitable for mobile use. Its name comes from lap, as it was deemed to be placed on a person's lap when being used. Although originally there was a distinction between laptops and notebooks (the former being bigger and heavier than the latter), as of 2014, there is often no longer any difference. Today, laptops are commonly used in a variety of settings, such as at work, in education, for playing games, web browsing, for personal multimedia, and general home computer use. Laptops combine all the input/output components and capabilities of a desktop computer, including the display screen, speakers, a keyboard, data storage device, sometimes an optical disc drive, pointing devices (such as a touchpad or trackpad), with a operating system, a processor and memory into a single unit. Most modern laptops feature integrated webcams and built-in microphones, while many also have touchscreens. Laptops can be powered either from an internal battery or by an external power supply from an AC adapter. Hardware specifications, such as the processor speed and memory capacity, significantly vary between different types, makes, models and price points......


This prints the whole summary. If you want only the first few sentences, pass the sentences argument (sentences=number) to the summary method.
With sentences set, the summary method returns only that many sentences instead of the whole summary.


Code to Illustrate :
       			 
import wikipedia

# Extracting the first sentence of an article's summary

summ = wikipedia.summary("Laptop",sentences=1)
print(summ)
 

  Output  

A laptop (also laptop computer), often called a notebook, is a small, portable personal computer (PC) with a "clamshell" form factor, typically having a thin LCD or LED computer screen mounted on the inside of the upper lid of the clamshell and an alphanumeric keyboard on the inside of the lower lid.
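As a small sketch of how the sentences argument can be reused, the helper below collects one-line summaries for several titles. The name short_summaries and its fetch parameter are hypothetical, added only so the helper can be tried with a stand-in function; by default it calls wikipedia.summary.

```python
def short_summaries(titles, sentences=1, fetch=None):
    """Return a dict mapping each title to its shortened summary.

    `fetch` is a hypothetical hook: any callable that accepts
    (title, sentences=...) like wikipedia.summary does.
    """
    if fetch is None:
        # Imported lazily so the helper also works with a stand-in fetch.
        import wikipedia
        fetch = wikipedia.summary
    return {title: fetch(title, sentences=sentences) for title in titles}
```

For example, short_summaries(["Laptop", "Computer"]) would make one request per title and return the first sentence of each summary.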




2. Searching for Article Titles :

The wikipedia.search() method returns a list of article titles related to the query you pass to it.
You can also pass results=number to limit how many titles are returned, and suggestion=True or False to additionally get a suggested query.


Code to Illustrate :
       			 
import wikipedia

summ = wikipedia.search("Apple")
print(summ)
 

  Output  

['Apple', 'Apple Inc.', 'Apple Silicon', 'Apple (disambiguation)', 'Apple Park', 'Apple A13', 'IPhone', 'Apple Music', 'Apple TV', 'Apple A12']
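The output above includes an 'Apple (disambiguation)' entry, which is usually not an article you want to load. A minimal sketch, assuming you want to limit the search with results and drop such entries; search_titles and its search parameter are hypothetical names used for illustration.

```python
def search_titles(query, limit=5, search=None):
    """Search Wikipedia and skip titles ending in '(disambiguation)'.

    `search` is a hypothetical hook: any callable that accepts
    (query, results=...) like wikipedia.search does.
    """
    if search is None:
        # Imported lazily so the helper also works with a stand-in search.
        import wikipedia
        search = wikipedia.search
    titles = search(query, results=limit)
    return [t for t in titles if not t.endswith("(disambiguation)")]
```

Because the filter runs after the search, the returned list can be shorter than limit.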





3.  Changing Language :

By using the set_lang() method you can change the language of the data you fetch, and then fetch it using the summary method.


Code to Illustrate :
       			 
import wikipedia

wikipedia.set_lang('hi')
# hi for Hindi
data = wikipedia.summary("Laptop",sentences=1)
print(data)
 

  Output  

साक्षात भारत में डिजाइन किया गया एक ऍण्ड्रॉइड प्लेटफॉर्म आधारित टैबलेट संगणन यन्त्र है। यह तकनीकी विभाजन को पाटने हेतु एक सस्ते डिवाइस के तौर पर डिजाइन किया गया है। यह उपकरण सूचना तथा संचार तकनालॉजी के द्वारा शिक्षा के राष्ट्रीय मिशन के तहत विकसित किया गया है जिसका उद्देश्य उपमहाद्वीप के २५,००० कॉलेज तथा ४०० विश्वविद्यालयों को एक मौजूदा साक्षात पोर्टल के द्वारा एक ई-लर्निंग प्रोग्राम से जोड़ना है। यह १५०० रुपये ($३५ अमेरिकी) के मूल्य के लक्ष्य के साथ घोषित किया गया है। इससे पहले २००९ में भी $ १० के इस तरह का एक उपकरण बनाने की बात हुयी थी जो कि अन्ततः एक पैन ड्राइव जैसा स्टोरेज डिवाइस निकला।

Languages supported: wikipedia.languages() returns the language codes that you can pass to set_lang().


       			 
print(wikipedia.languages() )
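Since wikipedia.languages() returns a dictionary keyed by language code, it can be used to check a code before calling set_lang(). The helper below is a sketch under that assumption; pick_language is a hypothetical name, and falling back to English is just one possible choice.

```python
def pick_language(code, languages, default="en"):
    """Return `code` if it is a known language code, otherwise `default`.

    `languages` is expected to be the mapping returned by
    wikipedia.languages() (language code -> language name).
    """
    return code if code in languages else default
```

It could be used as wikipedia.set_lang(pick_language("hi", wikipedia.languages())).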
 



4.  Getting a Search Suggestion :

The suggest() method makes an intelligent guess at what you are searching for and returns the suggested query.


Code to Illustrate :
       			 
import wikipedia

# using a wrong spelling of laptop
data = wikipedia.suggest('lapteps')
print(data)
 

  Output  

laptop

It returns a corrected search suggestion when the spelling is wrong.
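When there is nothing to suggest, suggest() returns None, so it helps to fall back to the original query. A minimal sketch of that pattern; best_query and its suggest parameter are hypothetical names used for illustration.

```python
def best_query(query, suggest=None):
    """Return Wikipedia's suggestion for `query`, or `query` itself
    when no suggestion is available.

    `suggest` is a hypothetical hook: any callable that takes a query
    string and returns a string or None, like wikipedia.suggest.
    """
    if suggest is None:
        # Imported lazily so the helper also works with a stand-in suggest.
        import wikipedia
        suggest = wikipedia.suggest
    suggestion = suggest(query)
    return suggestion if suggestion else query
```

The result can then be passed to summary() or page() in place of the raw user input.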



5.  Access full page Data :

The wikipedia API also gives us full access to the Wikipedia page, with the help of which we can access the title, URL, content, images, links of the complete page. In order to access the page you need to load the page first as shown below:


       			 
import wikipedia

data = wikipedia.page("Computer")
# let's print the variable data
print(data)
 

  Output  

<WikipediaPage 'Computer'>


5.1 Access Title of article :

You can fetch the title of any article by using the title attribute of the WikipediaPage class.

Code to Illustrate :
       			 
import wikipedia

data = wikipedia.page("Computer")

print('Title is ',data.title)
 

  Output  

Title is Computer




5.2 Access URL of article :

You can fetch the URL of any article by using the url attribute of the WikipediaPage class.

Code to Illustrate :
       			 
import wikipedia

data = wikipedia.page("Computer")

print('URL is ',data.url)
 

  Output  

URL is https://en.wikipedia.org/wiki/Computer
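The examples above load the page again for every attribute, but one loaded page object can serve several lookups. A small sketch of that idea; page_info is a hypothetical helper that works on any object with title and url attributes, such as the result of wikipedia.page("Computer").

```python
def page_info(page):
    """Collect the title and URL of an already loaded page into a dict.

    `page` is any object exposing `title` and `url`, for example the
    object returned by wikipedia.page(...).
    """
    return {"title": page.title, "url": page.url}
```

Loading the page once and passing it around avoids repeating the same request.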



5.3 Access LINKS of article :

Similarly, the links attribute gives us the titles of the other Wikipedia articles that the page links to.

Code to Illustrate :
       			 
import wikipedia

data = wikipedia.page("Computer")

print('used links are',data.links)
 
It returns a lot of links; in our case it returned 921.
We show only a few of them here as an example.

  Output  

['16-bit', '2-in-1 PC', '32-bit', '3D computer graphics', '3D computer graphics software', '4-bit', '64-bit', '8-bit', '86-DOS', 'ARM architecture', 'ARMv7', 'ARMv8-A', 'ARPANET', 'Abacus', 'Aberdeen Proving Ground', 'Abstract machine', 'Abū Rayhān al-Bīrūnī', 'Accounting software', 'Ada (programming language)', 'Advanced Micro Devices', 'Air conditioner', 'Alan Turing', 'Algorithm', 'All-in-one PC', 'American Chemical Society', 'American National Standards Institute', 'Amoeba (operating system)', 'Amsterdam' ,.......................]
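With hundreds of titles in the list, a small filter makes the result easier to work with. The helper below is a sketch, not part of the wikipedia package; links_matching is a hypothetical name, and it simply does a case-insensitive substring match on a list of titles such as data.links.

```python
def links_matching(links, keyword):
    """Return the link titles that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return [title for title in links if needle in title.lower()]
```

For example, links_matching(data.links, "bit") would keep titles such as '16-bit' and '32-bit'.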


5.4 Access Images used in article :

The wikipedia API only gives us the URLs of the images used in an article.
You can fetch them using the images attribute. If you want a specific image, index the list, e.g. images[0], which returns the URL of the first image used in the article.

Code to Illustrate :
       			 
import wikipedia

data = wikipedia.page("Computer")
# getting only first image URL
print(data.images[0])

# for all image URLs use
# print(data.images)
 

  Output  

https://upload.wikimedia.org/wikipedia/commons/a/aa/099-tpm3-sk.jpg
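The images list can mix photos with icons and other file types, so you may want to keep only certain extensions. A minimal sketch using only the standard library; photo_urls is a hypothetical helper, the default extensions are an assumption about what counts as a photo, and the example.org URLs in the usage are placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def photo_urls(image_urls, extensions=(".jpg", ".jpeg", ".png")):
    """Keep only the URLs whose path ends in one of `extensions`.

    The comparison ignores case and any query string in the URL.
    """
    wanted = tuple(ext.lower() for ext in extensions)
    return [
        url
        for url in image_urls
        if PurePosixPath(urlparse(url).path).suffix.lower() in wanted
    ]
```

It could be called as photo_urls(data.images) to get only the JPEG and PNG links of an article.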





6.  Download Images from Wikipedia's Article :

Code to Illustrate :
       			 
import wikipedia
import urllib.request

# first load the page
load_page = wikipedia.page('computer')
# get the link of first image of article
image_1 = load_page.images[0]

urllib.request.urlretrieve(image_1,r'C:\Users\alex\Desktop\image.jpg')
 
OUTPUT : the image is saved as image.jpg at the path passed to urlretrieve.
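The example above saves to a hard-coded Windows path. As an alternative sketch, the file name can be derived from the image URL itself and joined to a folder of your choice; local_path_for is a hypothetical helper, and "image" is an assumed fallback name for URLs without a file name.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def local_path_for(image_url, folder="."):
    """Build a local file path from the last segment of `image_url`."""
    # Decode percent-escapes such as %20 before taking the file name.
    name = PurePosixPath(unquote(urlparse(image_url).path)).name
    return Path(folder) / (name or "image")
```

The result can be passed as the second argument to urllib.request.urlretrieve, for example urlretrieve(image_1, local_path_for(image_1, "downloads")), provided the folder already exists.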






For more information on the wikipedia API, click here.





Thank you for visiting.
If you have any doubts or run into errors, please post a comment. We will reply as soon as possible.

