Python: search for a string inside a given URL and save the link it points to in a .txt file
I'm new to Python and am trying to write a script that looks for coupons for online courses. The flow is the following:
# read a .txt list of courses and create a vector containing all of them
# loop through the vector and, for each course, check if there's a discount online (e.g. udemycoupon.discountsglobal.com)
# if there is, print the link it points to
Creating the vector from the file is easy, but everything else seems to give me problems. Taking the website above as an example, I have to create, for each course, the string needed for the URL. I tried to search the web page with urllib (and urllib2) but couldn't (error 403: Forbidden). I looked at other answers, but none of them seem to work.
Could you please tell me how to write this particular part of the script (considering the example "complete python programming course 2016: code using python 3")?
# I have a string called "course" containing "complete python programming course 2016: code using python 3"
# substitute the spaces inside "course" with "+" and the symbols with their UTF-8 codes (":" should be "%3a")
# so that the string link = "http://udemycoupon.discountsglobal.com/?s=complete+python+programming+course+2016%3a+code+using+python+3" can be created
# if the text "100% off free complete python programming..."
# or "98% off complete python programming..."
# or "97% off complete python programming..."
# or any combination of upper/lower case (I think converting to lowercase might be convenient) is contained in the "link" URL
# save the link it points to in a variable "coupon_link"
# print "coupon_link" to a new .txt file
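The URL-building steps in the comments above can be sketched with the standard library instead of chained replacements. This is only a minimal sketch: `build_search_url` and `SEARCH_PREFIX` are hypothetical names introduced for illustration, and it assumes the site accepts the usual query-string encoding. `urllib.parse.quote_plus` emits uppercase hex digits (`%3A` rather than `%3a`); percent-encoding is case-insensitive, so both forms denote the same character.

```python
from urllib.parse import quote_plus

# Base of the search URL, taken from the question.
SEARCH_PREFIX = "http://udemycoupon.discountsglobal.com/?s="


def build_search_url(course):
    """Return the search URL for one course title (hypothetical helper)."""
    # strip() drops the trailing newline left by readlines();
    # quote_plus() turns spaces into "+" and percent-encodes
    # characters such as ":", "#", "!" and "/".
    return SEARCH_PREFIX + quote_plus(course.strip().lower())


course = "complete python programming course 2016: code using python 3"
link = build_search_url(course)
print(link)
```

Because the encoding is delegated to `quote_plus`, there is no need to list each special character by hand.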
This is what I tried:
# -*- coding: utf-8 -*-
import urllib.request

with open('courses_list.txt') as f:
    courseslist = f.readlines()  # going to be modified

courseslistunquoted = list(courseslist)  # copy, so the original strings are kept
length = len(courseslist)

# create the links for http://udemycoupon.discountsglobal.com/
for i in range(0, length):
    courseslist[i] = courseslist[i].replace("\n", "")
    courseslist[i] = courseslist[i].replace(" ", "+").lower()
    courseslist[i] = courseslist[i].replace(":", "%3a")  # was "%a3", which is not the code for ":"
    courseslist[i] = courseslist[i].replace("#", "%23")
    courseslist[i] = courseslist[i].replace("!", "%21")
    courseslist[i] = courseslist[i].replace("/", "%2f")

# scrape http://udemycoupon.discountsglobal.com/
addressbeginning = "http://udemycoupon.discountsglobal.com/?s="
for i in range(0, length):
    link = addressbeginning + courseslist[i]
    with urllib.request.urlopen(link) as response:
        htmlcode = response.read()
    ...
But it doesn't seem to work. The error is "No module named request". How can I access the text on the webpage?
Thank you for your help.
It might be possible that the website you are trying to access limits access to these resources to browsers or specific clients. Change the User-Agent header, setting it to that of a specific browser, and send the request again.
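The suggestion above could look roughly like the sketch below, which sends a browser-like User-Agent and then scans the returned HTML for coupon links. Several things here are assumptions rather than facts about the site: the `"Mozilla/5.0"` header value, the idea that each coupon appears as the text of an `<a href=...>` element containing "NN% off" and the course title, and the helper names `fetch_html`, `CouponLinkFinder` and `find_coupon_links`, which are hypothetical. Note also that `urllib.request` exists only in Python 3, so a "No module named request" error usually means the script is being run with Python 2.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Assumed browser-like User-Agent value; not confirmed to satisfy this site.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url):
    """Download a page while sending a browser-like User-Agent header."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


class CouponLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags whose text has 'NN% off' and the course title."""

    def __init__(self, course):
        super().__init__()
        self.course = course.lower()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Lowercase and collapse whitespace so the match ignores case.
            text = " ".join("".join(self._text).lower().split())
            if re.search(r"\d+% off", text) and self.course in text:
                self.links.append(self._href)
            self._href = None


def find_coupon_links(html, course):
    """Return the hrefs of all links that look like coupons for `course`."""
    finder = CouponLinkFinder(course)
    finder.feed(html)
    return finder.links
```

With a search URL in `link` and the title in `course`, calling `find_coupon_links(fetch_html(link), course)` would return the candidate coupon links, which can then be written to a new .txt file one per line. If the site shortens titles in its link text, matching on the full title will miss them, so a shorter fragment of the title may be a better search key.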