parsing - How to solve finding two of each link (BeautifulSoup, Python)

I'm using BeautifulSoup4 to parse a webpage and collect the href values, using this code:

    import requests
    from bs4 import BeautifulSoup

    # Collect links from the 'new' page
    pagerequest = requests.get('http://www.supremenewyork.com/shop/all/shirts')
    soup = BeautifulSoup(pagerequest.content, "html.parser")
    links = soup.select("div.turbolink_scroller a")

    # Every anchor with class name-link (this is what yields each href twice)
    allproductinfo = soup.find_all("a", class_="name-link")
    print(allproductinfo)

    linkslist1 = []
    for href in allproductinfo:
        linkslist1.append(href.get('href'))

    print(linkslist1)

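linkslist1 prints two of each link. I believe this is happening because it takes the link from both the item's title and its colour. I have tried a few things but cannot get BS to parse only the title link, so that I end up with a list containing one of each link instead of two. I imagine it's something really simple, but I'm missing it. Thanks in advance.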

This code will give you the result without duplicates (using set() may also be a good idea, as @tarum gupta suggested), but I changed the way you crawl:

    import requests
    from bs4 import BeautifulSoup

    # Collect links from the 'new' page
    pagerequest = requests.get('http://www.supremenewyork.com/shop/all/shirts')
    soup = BeautifulSoup(pagerequest.content, "html.parser")
    links = soup.select("div.turbolink_scroller a")

    # Get the divs with class inner-article, then search for the
    # name-link class inside the h1 tag (the title link only)
    allproductinfo = soup.select("div.inner-article h1 a.name-link")
    # print(allproductinfo)

    linkslist1 = []
    for href in allproductinfo:
        linkslist1.append(href.get('href'))

    print(linkslist1)
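For the set() idea mentioned above, here is a minimal sketch of removing duplicates from an already collected list. The helper name `unique_links` and the sample hrefs are made up for illustration and are not taken from the actual page. A plain set() also removes duplicates but does not keep the original order, so this sketch uses `dict.fromkeys` instead, which does keep it.

```python
def unique_links(hrefs):
    """Return hrefs with duplicates removed, keeping first-seen order."""
    # Dict keys are unique and keep insertion order (Python 3.7+),
    # so this drops repeats without shuffling the list.
    return list(dict.fromkeys(hrefs))


# Hypothetical hrefs: each product appears twice, as it would if both
# the title anchor and the colour anchor were collected.
sample = [
    "/shop/shirts/abc/red",
    "/shop/shirts/abc/red",
    "/shop/shirts/xyz/blue",
    "/shop/shirts/xyz/blue",
]

links = unique_links(sample)
print(links)
```

This could be applied to linkslist1 from the question's original code as a fallback, although narrowing the selector as shown above avoids collecting the duplicates in the first place.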
