INTRODUCTION
You can use well-known search engines such as Google, Bing or Yahoo to perform a web search programmatically. To do so, you append additional parameters containing the search string to the provider's URL and send the result as an HTTP GET request. The request is evaluated by a web application, i.e. a program that runs on the web server, and the results are returned to you as an HTTP response. (The server program can be written in various programming languages; PHP is widely used.)
Moreover, some web search providers offer search services that can be used via an application programming interface (API). Although these services are mostly subject to a charge, there is sometimes a limited but free version for training and development purposes. For example, using the Bing Search API, you can create your own search engine with an individualized layout. Search APIs mostly return results in a special format, namely the JavaScript Object Notation (JSON). With the Python module json it is easy to convert this format into a Python dictionary. But to extract the data you are interested in, you first have to learn what a Python dictionary is.
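As a small illustration of the last step, the following sketch converts a JSON string into a Python dictionary with json.loads(). The sample text and its field names are made up for this example and are not the output of any real search API.

```python
import json

# Hypothetical JSON text, similar in shape to what a search API might return
json_text = '{"query": "python", "hits": 2, "urls": ["a.example", "b.example"]}'

data = json.loads(json_text)   # JSON object -> Python dictionary

print(type(data).__name__)     # name of the resulting Python type
print(data["hits"])            # access a value by its key
print(data["urls"][0])         # a JSON array becomes a Python list
```

A JSON object becomes a dictionary, a JSON array becomes a list, and both can be nested to any depth.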
UNDERSTANDING A DICTIONARY
As the name suggests, a dictionary is a data structure similar to a dictionary book. You can imagine word pairs, with the words on the left in a language you already know and the ones on the right in a foreign language (we disregard any ambiguities). The example below shows some names of colors translated from German to English:

    blau    blue
    rot     red
    grün    green
    gelb    yellow

(In a real-world dictionary, words are arranged alphabetically so that finding a specific word is simplified.) The word on the left is the key and the word on the right is the value. A dictionary thus consists of key-value pairs. (A key-value-structured data type is also called an associative array, map, hash, Hashtable or HashMap.) Values can have any data type. Keys must have an immutable data type: this excludes lists as keys, but allows numbers, strings and tuples.
Your program translates the above colors from German to English. If the input is not in the dictionary, this is detected with the in operator and a message is displayed instead of a translation.

    lexicon = {"blau": "blue", "rot": "red", "grün": "green", "gelb": "yellow"}

    print("All entries:")
    for key in lexicon:
        print(key + " -> " + lexicon[key])

    while True:
        farbe = input("Color (in German)? ")
        if farbe in lexicon:
            print(farbe + " -> " + lexicon[farbe])
        else:
            print(farbe + " -> " + "(not translatable)")
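The rule about key types can be tried out directly. The sketch below (the names are chosen only for illustration) uses a tuple as a key and shows that a list is rejected with a TypeError.

```python
# A tuple of numbers is immutable and can therefore serve as a key
rgb_names = {(255, 0, 0): "rot", (0, 0, 255): "blau"}
print(rgb_names[(255, 0, 0)])

# A list is mutable; using it as a key raises a TypeError
try:
    rgb_names[[0, 255, 0]] = "grün"
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

print("list accepted as key:", list_key_accepted)
```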
MEMO
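A dictionary consists of key-value pairs. In contrast to a list, the pairs are accessed by their key rather than by a position index. (Since Python 3.7 a dictionary preserves the order in which the pairs were inserted; older versions do not guarantee any order.) In the definition, you use a pair of curly brackets, separate the pairs with commas, and separate key and value with a colon. Important operations:

    dictionary[key]            returns the value for key
    dictionary[key] = value    adds a new pair or replaces the value of an existing key
    key in dictionary          returns True if key is present
    for key in dictionary:     iterates through all keys with a for loop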
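A few frequently used dictionary operations are sketched below with a shortened version of the color lexicon. The selection is ours and not exhaustive; all calls are standard Python.

```python
lexicon = {"blau": "blue", "rot": "red"}

lexicon["gelb"] = "yellow"          # add a new pair (or overwrite an existing key)
print(len(lexicon))                 # number of pairs
print("rot" in lexicon)             # membership test on the keys
print(lexicon.get("lila", "?"))     # value for a key, or a default if the key is missing
del lexicon["blau"]                 # remove a pair

for key, value in lexicon.items():  # iterate over key-value pairs
    print(key + " -> " + value)

keys = sorted(lexicon.keys())       # all keys as a sorted list
```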
DICTIONARIES ARE EFFICIENT DATA STRUCTURES
You may rightly object that paired information can also be stored in a list. It would be obvious to save each pair as a short list and to make all of these short lists elements of a parent list. Why then is there a dictionary as a separate data structure? The main advantage is that a dictionary gives very fast access to the value belonging to a key, even when there are many pairs. To try this out, use the text file chplz.txt, in which every line contains a town and its postal code separated by a colon, for example:

    Aarau:5000

Your first task is to convert this text file into a dictionary. To do this, first read it in as a string with read() and then split it into individual lines using split("\n"). (On Windows, the lines of a file are usually terminated with <CR><LF>.) To create the dictionary, split each line once again at the colon into key and value, and add the new pair to the (initially empty) dictionary using the bracket notation. Just as before in the colors example, you can now access the postal codes using the bracket notation.

    with open("chplz.txt") as f:
        plzStr = f.read()

    pairs = plzStr.split("\n")
    plz = {}
    for pair in pairs:
        element = pair.split(":")
        if len(element) == 2:  # skip empty or malformed lines
            plz[element[0]] = element[1]
    print(str(len(plz)) + " pairs loaded")

    while True:
        town = input("City? ")
        if town in plz:
            print("The postal code of " + town + " is " + str(plz[town]))
        else:
            print("The city " + town + " was not found.")
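The conversion step can also be sketched without a file, which makes it easy to test. The function name parse_pairs and the sample text are assumptions for this illustration. Here splitlines() is used instead of split("\n") so that both <LF> and <CR><LF> line endings are handled, and empty or malformed lines are skipped.

```python
def parse_pairs(text):
    """Convert lines of the form 'key:value' into a dictionary."""
    result = {}
    for line in text.splitlines():       # handles \n as well as \r\n
        if ":" not in line:
            continue                     # skip empty or malformed lines
        key, value = line.split(":", 1)  # split only at the first colon
        result[key.strip()] = value.strip()
    return result

# Sample data with Windows line endings and a trailing empty line.
# Only the Aarau entry is taken from the text; the second line is made up.
sample = "Aarau:5000\r\nExampletown:1234\r\n\r\n"
plz = parse_pairs(sample)
print(plz)
```

Keeping the parsing in a function of its own separates reading the file from interpreting its contents.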
MEMO
It is very easy and quick to access the value for a given key in a dictionary. (Internally, a hash algorithm is used, so the access time hardly depends on the number of pairs.)
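To make the difference concrete, the sketch below stores the same pairs once as a list of short lists and once as a dictionary. The helper find_in_list is a name introduced for this illustration and the data is artificial. Both lookups give the same result, but the list has to be scanned element by element, whereas the dictionary finds the key directly through its hash.

```python
def find_in_list(pairs, key):
    """Linear search: inspect the pairs one by one until the key matches."""
    for k, v in pairs:
        if k == key:
            return v
    return None

# Artificial data: 100000 pairs of the form "town<i>" -> i
pair_list = [["town" + str(i), i] for i in range(100000)]
pair_dict = dict(pair_list)    # same data as a dictionary

from_list = find_in_list(pair_list, "town99999")  # scans the whole list
from_dict = pair_dict["town99999"]                # direct access by key
print(from_list, from_dict)
```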
USING BING FOR YOUR OWN PURPOSES
Your program uses the Bing search engine to search for websites matching a search string entered by the user and writes out the information returned. To access the Bing search engine, you need a personal authentication key. To acquire it, proceed as follows: Visit the site https://www.microsoft.com/cognitive-services/en-us/apis and choose "Get started for free". You will be prompted to use your existing Microsoft account or to create a new one. On the page titled Microsoft Cognitive Services, choose "APIs" and "Bing Web Search" and click on "Request new trials". Scroll down and select "Search Bing-Free". After confirming with "Subscribe" you get two key values. Copy one of them and save it in a local text file for further use. You can retrieve the keys at any time under your Microsoft account.
In your program you send a GET request supplemented with the search string. The response from Bing is a string in which the information is structured by curly brackets. The formatting is consistent with the JavaScript Object Notation (JSON). Using the function json.load(), it can be converted into a nested Python dictionary, which is then much easier to evaluate. During a test phase, you can analyze the nesting by writing out the appropriate information to the console. You can comment out or remove these lines later.
What does Bing find for the search string "Hillary Clinton"?

    import json
    import urllib.parse
    import urllib.request  # Python 3; the Python 2 counterpart is urllib2

    def bing_search(query):
        key = 'xxxxxxxxxxxxxxxxxxxxx'  # use your personal key
        # escape spaces and special characters, but keep "+" as AND connector
        url = ('https://api.cognitive.microsoft.com/bing/v5.0/search?q='
               + urllib.parse.quote(query, safe='+'))
        request = urllib.request.Request(url)
        # the key is sent as an additional header of the GET request
        request.add_header('Ocp-Apim-Subscription-Key', key)
        response = urllib.request.urlopen(request)
        return json.load(response)  # JSON text -> nested dictionary

    query = input("Enter a search string (AND-connect with +): ")
    results = bing_search(query)
    # print("results:\n" + str(results))  # uncomment to inspect the nesting
    webPages = results['webPages']
    print("Number of hits:", webPages["totalEstimatedMatches"])
    print("Found URLs:")
    for item in webPages['value']:
        print(item["displayUrl"])
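The nesting can be explored without a network connection. The JSON text below is a hand-written stand-in that only mimics the keys used in the program above (webPages, totalEstimatedMatches, value, displayUrl); the values are invented and a real response contains many more fields. The helper extract_urls is hypothetical and uses get() so that a response without hits does not raise a KeyError.

```python
import json

# Invented sample with the same nesting as the keys used in the program
sample_text = """
{"webPages": {"totalEstimatedMatches": 2,
              "value": [{"name": "First", "displayUrl": "www.example.org/a"},
                        {"name": "Second", "displayUrl": "www.example.org/b"}]}}
"""

def extract_urls(results):
    """Return the displayUrl of every hit, or an empty list if there are none."""
    web_pages = results.get("webPages", {})
    return [item["displayUrl"] for item in web_pages.get("value", [])]

results = json.loads(sample_text)
urls = extract_urls(results)
print(json.dumps(results, indent=2))   # pretty-print to inspect the nesting
print(urls)
```

Pretty-printing with json.dumps(..., indent=2) is often the quickest way to see which keys are available at each level.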
MEMO
As you can see, a dictionary can in turn contain other dictionaries as values. In this way, hierarchical information structures can be created, similar to XML. The authentication key is sent in an additional header entry of your GET request. You can modify the Bing search with additional query parameters. For example, if you append "&count=20" to the URL, you get up to 20 results. For more information, consult the API reference.
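Query parameters are less error-prone to assemble with urllib.parse.urlencode() from the Python 3 standard library, which also escapes spaces and special characters. The sketch assumes the endpoint from the program above; the function name build_search_url is made up, and only the parameters q and count mentioned in the text are used.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.cognitive.microsoft.com/bing/v5.0/search"

def build_search_url(query, count=None):
    """Build the request URL from the search string and an optional result count."""
    params = {"q": query}
    if count is not None:
        params["count"] = count
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("Hillary Clinton", count=20)
print(url)
```

Note that urlencode() turns a space into "+", so the user can type the search string with ordinary spaces.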
EXERCISES
|