How to Open a New Tab and Use Cookies in Selenium

I've been working on a project recently that needs Selenium. The target site is hosted overseas, so access is very slow, and using Selenium triggered its CAPTCHA mechanism, which made just logging in a real hassle. In the end, Selenium's cookie-loading feature finally solved the login problem.

The final code is attached below:

I. Saving Cookies

from selenium import webdriver
import time
import pickle

def save_cookies(driver, location):
    # Serialize the current session's cookies to disk
    with open(location, "wb") as f:
        pickle.dump(driver.get_cookies(), f)


print("Starting the browser and opening the SimilarWeb login page")
# Launch Chrome via webdriver (raw strings keep the Windows backslashes intact)
driver = webdriver.Chrome(executable_path=r"C:\Users\xxx\AppData\Local\Google\Chrome\Application\chromedriver.exe")
# Open the target page
driver.get('https://pro.similarweb.com/#/website/worldwide-overview/snailtoday.com/*/999/3m?webSource=Total')
author = "your_username"
password = "your_password"

# Fill in the user name
driver.find_element_by_xpath("./*//input[@name='UserName']").send_keys(author)
# Fill in the password
driver.find_element_by_xpath("./*//input[@name='Password']").send_keys(password)
# Click the login button
driver.find_element_by_xpath("./*//button[@class='form__submit']").click()
print("Login successful")
# Sleep for 150 seconds
time.sleep(150)
# Save the cookies
save_cookies(driver, r"H:\py_project\similiarweb\cookies.txt")
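Since `get_cookies()` returns the cookies as a plain list of dicts, pickle can round-trip them directly. A minimal sketch of that round trip, with a made-up cookie and file name standing in for the real browser session:

```python
import pickle

# Cookies from driver.get_cookies() are plain dicts shaped like this one
cookies = [
    {"name": "session", "value": "abc123", "domain": ".similarweb.com", "path": "/"},
]

# Save, as save_cookies() does
with open("cookies.pkl", "wb") as f:
    pickle.dump(cookies, f)

# Load them back; each dict can then be passed to driver.add_cookie()
with open("cookies.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored[0]["name"])  # -> session
```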

II. Loading Cookies and Switching Tabs

from selenium import webdriver
import time
import pickle
from bs4 import BeautifulSoup

def load_cookies(driver, location, url=None):
    # Read the saved cookies back from disk
    with open(location, "rb") as f:
        cookies = pickle.load(f)
    driver.delete_all_cookies()
    # add_cookie() only works while the browser is on a matching domain,
    # so open the site before adding the cookies
    url = "https://pro.similarweb.com/#/website/worldwide-overview/snailtoday.com/*/999/3m?webSource=Total" if url is None else url
    driver.get(url)
    for cookie in cookies:
        driver.add_cookie(cookie)

print("Starting the browser and opening the SimilarWeb login page")
# Launch Chrome via webdriver
driver = webdriver.Chrome(executable_path=r"C:\Users\xxx\AppData\Local\Google\Chrome\Application\chromedriver.exe")
load_cookies(driver, r"H:\py_project\similiarweb\cookies.txt")
# Open the target page
driver.get('https://pro.similarweb.com/#/website/worldwide-overview/snailtoday.com/*/999/3m?webSource=Total')
time.sleep(30)

html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
visitors = soup.find_all('div', class_='big-text u-blueMediumMedium')[0].text
print(visitors)

# Open a new tab via JavaScript
js = 'window.open("https://pro.similarweb.com/#/website/worldwide-overview/baidu.com/*/999/3m?webSource=Total");'
driver.execute_script(js)
time.sleep(30)
handles = driver.window_handles
# switch_to_window() is deprecated; switch_to.window() is the current API
driver.switch_to.window(handles[2])
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
visitors = soup.find_all('div', class_='big-text u-blueMediumMedium')[0].text
print(visitors)

print("Article published successfully")

III. Pitfalls
This was my first time using Beautiful Soup, and it came with plenty of pitfalls.

1. Taking one value from multiple results

visitors = soup.find_all('div', class_='big-text u-blueMediumMedium')

With the code above, find_all() returns a list of two results; to take the first one, you need to append "[0]".
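find_all() always returns a list-like result, even when there is only one match, which is why the index is required. A small self-contained illustration, using two made-up divs in the same style as the real page and the built-in html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Two divs mimicking the SimilarWeb markup (the numbers are invented)
html = """
<div class="big-text u-blueMediumMedium" title="16,100">16,103</div>
<div class="big-text u-blueMediumMedium" title="2,400">2,401</div>
"""
soup = BeautifulSoup(html, "html.parser")

results = soup.find_all("div", class_="big-text u-blueMediumMedium")
print(len(results))     # number of matching divs: 2
print(results[0].text)  # text of the first one: 16,103
```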

2. Extracting the text
With the "[0]" added, the code above returns:

<div class="big-text u-blueMediumMedium" title="16,100">16,103</div>

To extract just the text, you have to append ".text":

visitors = soup.find_all('div', class_='big-text u-blueMediumMedium')[0].text

This was another big trap. At first I hadn't added "[0]", so appending ".text" directly failed; then, after adding "[0]", the site's slow loading produced IndexError: list index out of range. All in all, this small issue gave me a thorough run-around.
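The IndexError happens when the page has not finished rendering, so find_all() returns an empty list. One defensive pattern is to poll until a result appears instead of relying on a single fixed sleep. A sketch of the idea; poll_for_first and the fake fetch function below are illustrative stand-ins, not part of the original script:

```python
import time

def poll_for_first(fetch, attempts=5, delay=0.1):
    """Call fetch() until it returns a non-empty list, then return the first item."""
    for _ in range(attempts):
        items = fetch()
        if items:
            return items[0]
        time.sleep(delay)
    raise TimeoutError("element never appeared")

# Stand-in for soup.find_all(...): empty at first, populated once the page "loads"
calls = {"n": 0}
def fake_find_all():
    calls["n"] += 1
    return ["16,103"] if calls["n"] >= 3 else []

print(poll_for_first(fake_find_all))  # -> 16,103
```

In the real script, fetch would be a function that re-reads driver.page_source and runs find_all on it.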

3. Switching tabs
Because my Chrome opens with two tabs by default, in

driver.switch_to.window(handles[2])

you have to pay attention to which index into the handles list you use.
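Hard-coding handles[2] breaks as soon as the browser starts with a different number of tabs. Since the ordering of window_handles is not something to rely on, a more robust pattern is to record the handles before opening the new tab and take the set difference afterwards. The handle values below are made up for illustration:

```python
# Handles as they might look before and after window.open(...)
handles_before = ["CDwindow-AAA", "CDwindow-BBB"]
handles_after = ["CDwindow-AAA", "CDwindow-BBB", "CDwindow-CCC"]

# The new tab is whichever handle appeared in the meantime
new_handle = (set(handles_after) - set(handles_before)).pop()
print(new_handle)  # -> CDwindow-CCC
# With Selenium this would be: driver.switch_to.window(new_handle)
```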

IV. The Final Code

from selenium import webdriver
import time
import pickle
from bs4 import BeautifulSoup

def load_cookies(driver, location, url=None):
    # Read the saved cookies back from disk
    with open(location, "rb") as f:
        cookies = pickle.load(f)
    driver.delete_all_cookies()
    # add_cookie() only works while the browser is on a matching domain
    url = "https://pro.similarweb.com/#/website/worldwide-overview/snailtoday.com/*/999/3m?webSource=Total" if url is None else url
    driver.get(url)
    for cookie in cookies:
        driver.add_cookie(cookie)

print("Starting the browser and opening the SimilarWeb login page")
# Launch Chrome via webdriver
driver = webdriver.Chrome(executable_path=r"C:\Users\xxx\AppData\Local\Google\Chrome\Application\chromedriver.exe")
load_cookies(driver, r"H:\py_project\similiarweb\cookies.txt")


for domain in open("domains.txt"):
    domain = domain.strip()  # drop the trailing newline, or the formatted URL breaks
    print(domain)
    url = 'https://pro.similarweb.com/#/website/worldwide-overview/{}/*/999/3m?webSource=Total'.format(domain)
    driver.get(url)
    time.sleep(30)
    html = driver.page_source
    soup = BeautifulSoup(html, 'lxml')
    visitors = soup.find_all('div', class_='big-text u-blueMediumMedium')[0].text
    print(visitors)
    time.sleep(5)
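The visitor counts come back as comma-grouped strings like "16,103". If they ever need to be compared or summed rather than just printed, a small helper (illustrative, not part of the original script) can turn them into integers:

```python
def parse_visitors(text):
    """Convert a comma-grouped count such as '16,103' to an int."""
    return int(text.replace(",", ""))

print(parse_visitors("16,103"))     # -> 16103
print(parse_visitors("1,234,567"))  # -> 1234567
```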

Originally published on: 蜗牛博客 (Snail Blog)
URL: http://www.snailtoday.com
Please respect copyright: when reposting, credit the author and the original source with a link, together with this notice.
