A Great Natural Language Processing (NLP) Course

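A course on natural language processing in Python, taught by a young Chinese woman. It is really well taught, and I strongly recommend it.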

1. Overview
Three skills a data scientist should master:

Two formats of data:

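2. English vocabulary
profanity n. blasphemy; irreverence toward the sacred; swearing, curse words

corpus n. a collection of written or spoken texts; a language corpus

Dreyfus model: the Dreyfus model (of skill acquisition)

potty train v. to toilet-train (a young child); potty n. a child's potty; adj. crazy; mad; very fond of, obsessed with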

3. Code

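# Scrape the text of the <p> tags
# Scrapes transcript data from scrapsfromtheloft.com
import requests
from bs4 import BeautifulSoup

def url_to_transcript(url):
    '''Returns transcript data specifically from scrapsfromtheloft.com.'''
    page = requests.get(url).text
    soup = BeautifulSoup(page, "lxml")
    # Collect the text of every <p> inside the element with class "post-content"
    text = [p.text for p in soup.find(class_="post-content").find_all('p')]
    print(url)  # Log which URL was scraped
    return text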

# Build the dictionary
data_combined = {key: [combine_text(value)] for (key, value) in data.items()}
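
The line above relies on a `combine_text` helper and a `data` dictionary that these notes do not show. A minimal sketch of how they might look, assuming `data` maps each key to a list of paragraph strings (as returned by `url_to_transcript`) and that `combine_text` simply joins them with spaces; the sample keys and text are hypothetical:

```python
def combine_text(list_of_text):
    """Join a list of text chunks into one string, separated by spaces.

    Assumed implementation: the notes only show the call, not the definition.
    """
    return ' '.join(list_of_text)

# Hypothetical sample data: each key maps to a list of paragraph strings.
data = {
    'comedian_a': ['First paragraph.', 'Second paragraph.'],
    'comedian_b': ['Only paragraph.'],
}

# Each value becomes a one-element list holding the combined text.
data_combined = {key: [combine_text(value)] for (key, value) in data.items()}

print(data_combined['comedian_a'])  # ['First paragraph. Second paragraph.']
```

Wrapping the combined string in a one-element list makes the dictionary convenient to load into a table with one row per key.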

# Clean the text
import re
import string

def clean_text_round1(text):
    '''Make text lowercase, remove text in square brackets, remove punctuation and remove words containing numbers.'''
    text = text.lower()
    # Raw strings keep the regex backslashes from being read as string escapes
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text

round1 = lambda x: clean_text_round1(x)


# Apply a second round of cleaning
def clean_text_round2(text):
    '''Get rid of some additional punctuation and non-sensical text that was missed the first time around.'''
    text = re.sub('[‘’“”…]', '', text)
    text = re.sub('\n', '', text)
    return text

round2 = lambda x: clean_text_round2(x)
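
To see what the two cleaning rounds do together, here is a small self-contained sketch that restates both functions and runs them on a made-up sample string. The sample text is an assumption for illustration only; it is not taken from the course data.

```python
import re
import string

def clean_text_round1(text):
    """Lowercase, drop [bracketed] text, ASCII punctuation, and words containing digits."""
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text

def clean_text_round2(text):
    """Drop curly quotes, ellipsis characters, and newlines missed in round 1."""
    text = re.sub('[‘’“”…]', '', text)
    text = re.sub('\n', '', text)
    return text

# Hypothetical sample containing each kind of noise the rounds target.
sample = "[Applause] Hello, World! It’s 2017…\nGood night."

cleaned = clean_text_round2(clean_text_round1(sample))
print(repr(cleaned))  # ' hello world its good night'
```

Round 1 leaves the curly apostrophe and the ellipsis in place because `string.punctuation` covers only ASCII punctuation, which is why the second round is needed. Note that newlines are deleted rather than replaced with a space, so words on either side of a line break can end up joined.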

https://www.youtube.com/watch?v=xvqsFTUsOmc
https://github.com/adashofdata/nlp-in-python-tutorial

4. Where to get data
