Parsing: how to stop getting two of each link (BeautifulSoup, Python)
I'm using BeautifulSoup4 to parse a webpage and collect the href values with this code:
    import requests
    from bs4 import BeautifulSoup

    # collect links from the 'new' page
    pagerequest = requests.get('http://www.supremenewyork.com/shop/all/shirts')
    soup = BeautifulSoup(pagerequest.content, "html.parser")
    links = soup.select("div.turbolink_scroller a")
    allproductinfo = soup.find_all("a", class_="name-link")
    print(allproductinfo)
    linkslist1 = []
    for href in allproductinfo:
        linkslist1.append(href.get('href'))
    print(linkslist1)
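`linkslist1` prints two of each link. I believe this happens because it picks up the link from both the item title and the item colour. I have tried a few things, but I cannot get BeautifulSoup to parse only the title link so that I end up with a list containing one of each link instead of two. I imagine it's something really simple that I'm missing. Thanks in advance.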
This code gives the result without duplicates (using set() may also be a good idea, as @tarum gupta suggested). I changed the way it crawls:
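If you would rather keep the original `find_all` call and just drop the repeats afterwards, a minimal sketch of the set()-style idea follows. One caveat: a plain `set()` does not preserve the order the links appeared in on the page, whereas `dict.fromkeys` does (dict keys are unique and keep insertion order in Python 3.7+). The helper name `unique_in_order` and the sample hrefs are hypothetical, chosen only to mimic the "two of each link" output described in the question.

```python
def unique_in_order(items):
    """Return items with repeats removed, keeping first-seen order.

    dict keys are unique and preserve insertion order (Python 3.7+),
    unlike set(), whose iteration order is arbitrary.
    """
    return list(dict.fromkeys(items))


# Hypothetical hrefs shaped like the duplicated output in the question:
# every product link appears twice (title link + colour link).
linkslist1 = [
    "/shop/shirts/abc/red",
    "/shop/shirts/abc/red",
    "/shop/shirts/def/blue",
    "/shop/shirts/def/blue",
]

deduped = unique_in_order(linkslist1)
print(deduped)  # ['/shop/shirts/abc/red', '/shop/shirts/def/blue']

# set() also removes the duplicates, but its order is not guaranteed,
# so sort it if you need a stable result.
print(sorted(set(linkslist1)))
```

Deduplicating after the fact works regardless of the page markup, but selecting only the title anchors (as in the answer below) avoids collecting the duplicates in the first place.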
    import requests
    from bs4 import BeautifulSoup

    # collect links from the 'new' page
    pagerequest = requests.get('http://www.supremenewyork.com/shop/all/shirts')
    soup = BeautifulSoup(pagerequest.content, "html.parser")
    links = soup.select("div.turbolink_scroller a")
    # get the divs with class inner-article, then the name-link anchor inside the h1 tag
    allproductinfo = soup.select("div.inner-article h1 a.name-link")
    # print(allproductinfo)
    linkslist1 = []
    for href in allproductinfo:
        linkslist1.append(href.get('href'))
    print(linkslist1)
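To see why the scoped selector removes the duplicates without hitting the live site, here is a small offline sketch that runs both approaches on the same snippet. The markup is a hypothetical, simplified stand-in: the answer only tells us that the title link is an `a.name-link` inside an `h1` within `div.inner-article`; placing the colour link in a `<p>` with the same href is an assumption made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (assumption): each product has one
# a.name-link inside an <h1> (the title) and a second a.name-link with
# the same href elsewhere in the block (the colour).
SAMPLE_HTML = """
<div class="turbolink_scroller">
  <div class="inner-article">
    <h1><a class="name-link" href="/shop/shirts/abc/red">Box Shirt</a></h1>
    <p><a class="name-link" href="/shop/shirts/abc/red">Red</a></p>
  </div>
  <div class="inner-article">
    <h1><a class="name-link" href="/shop/shirts/def/blue">Flannel</a></h1>
    <p><a class="name-link" href="/shop/shirts/def/blue">Blue</a></p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Question's approach: every <a class="name-link">, so title AND colour.
all_links = [a.get("href") for a in soup.find_all("a", class_="name-link")]

# Answer's approach: only name-link anchors nested inside the <h1> title.
title_links = [a.get("href") for a in soup.select("div.inner-article h1 a.name-link")]

print(all_links)    # each href appears twice
print(title_links)  # each href appears once
```

The CSS descendant selector does the filtering here: `h1 a.name-link` only matches anchors that have an `h1` ancestor, so the colour anchors are never returned.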