NLTK: Datasets, Part 3: Categorized and Tagged Corpora

2025/5/14 13:32:23


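1. Binary-classification corpora

These are mainly movie corpora related to sentiment (positive/negative, subjective/objective). There are two of them:

1.1 movie_reviews: IMDb movie reviews

IMDb (Internet Movie Database) is a widely used film database that provides ratings and user reviews for movies, TV series, and more.

  • Size

2,000 reviews labeled positive or negative, 1,000 of each: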

['neg/cv000_29416.txt', 'neg/cv001_19502.txt', 'neg/cv002_17424.txt', 'neg/cv003_12683.txt', 'neg/cv004_12641.txt', 'neg/cv005_29357.txt', 'neg/cv006_17022.txt', 'neg/cv007_4992.txt', 'neg/cv008_29326.txt', 'neg/cv009_29417.txt', …]

[…, 'pos/cv992_11962.txt', 'pos/cv993_29737.txt', 'pos/cv994_12270.txt', 'pos/cv995_21821.txt', 'pos/cv996_11592.txt', 'pos/cv997_5046.txt', 'pos/cv998_14111.txt', 'pos/cv999_13106.txt']

  • Labels

A binary classification problem: ['neg', 'pos']

  • Review content

First negative review (neg/cv000_29416):

plot : 
two teen couples go to a church party , drink and then drive . 
they get into an accident . 
one of the guys dies , but his girlfriend continues to see him in her life , and has nightmares . 
what's the deal ? 
watch the movie and " sorta " find out . . . 

critique : 
a mind-fuck movie for the teen generation that touches on a very cool idea , but presents it in a very bad package . 
which is what makes this review an even harder one to write , 
since i generally applaud films which attempt to break the mold , 
mess with your head and such ( lost highway & memento ) , 
but there are good and bad ways of making all types of films , and these folks just didn't snag this one correctly . 
they seem to have taken this pretty neat concept , but executed it terribly . 
so what are the problems with the movie ? 
well , its main problem is that it's simply too jumbled . 
it starts off " normal " but then downshifts into this " fantasy " world in which you , as an audience member , have no idea what's going on . 
there are dreams , there are characters coming back from the dead , there are others who look like the dead , 
there are strange apparitions , there are disappearances , there are a looooot of chase scenes , 
there are tons of weird things that happen , and most of it is simply not explained . 
now i personally don't mind trying to unravel a film every now and then , but when all it does is give me the same clue over and over again , 
i get kind of fed up after a while , which is this film's biggest problem . 
it's obviously got this big secret to hide , but it seems to want to hide it completely until its final five minutes . 
and do they make things entertaining , thrilling or even engaging , in the meantime ? 
not really . 
the sad part is that the arrow and i both dig on flicks like this , so we actually figured most of it out by the half-way point , 
so all of the strangeness after that did start to make a little bit of sense , but it still didn't the make the film all that more entertaining . 
i guess the bottom line with movies like this is that you should always make sure that the audience is " into it " even before 
they are given the secret password to enter your world of understanding . 
i mean , showing melissa sagemiller running away from visions for about 20 minutes throughout the movie is just plain lazy ! ! 
okay , we get it . . . there 
are people chasing her and we don't know who they are . 

do we really need to see it over and over again ? 
how about giving us different scenes offering further insight into all of the strangeness going down in the movie ? 
apparently , the studio took this film away from its director and chopped it up themselves , and it shows . 
there might've been a pretty decent teen mind-fuck movie in here somewhere , but i guess " the suits " decided that turning it into a music video with little edge ,
would make more sense . 
the actors are pretty good for the most part , although wes bentley just seemed to be playing the exact same character that he did in american beauty , only in a new neighborhood . 
but my biggest kudos go out to sagemiller , who holds her own throughout the entire film , and actually has you feeling her character's unraveling . 
overall , the film doesn't stick because it doesn't entertain , it's confusing , it rarely excites and it feels pretty redundant for most of its runtime , 
despite a pretty cool ending and explanation to all of the craziness that came before it . 
oh , and by the way , this is not a horror or teen slasher flick . . . it's 
just packaged to look that way because someone is apparently assuming that the genre is still hot with the kids . 
it also wrapped production two years ago and has been sitting on the shelves ever since . 
whatever . . . skip 
it ! 

where's joblo coming from ? 
a nightmare of elm street 3 ( 7/10 ) - blair witch 2 ( 7/10 ) - the crow ( 9/10 ) - the crow : salvation ( 4/10 ) - lost highway ( 10/10 ) - memento ( 10/10 ) - the others ( 9/10 ) - stir of echoes ( 8/10 ) 

Second negative review (neg/cv001_19502):

damn that y2k bug . 
it's got a head start in this movie starring jamie lee curtis and another baldwin brother ( william this time ) 
in a story regarding a crew of a tugboat that comes across a deserted russian tech ship that has a strangeness to it when they kick the power back on . 
little do they know the power within . . . 
going for the gore and bringing on a few action sequences here and there , virus still feels very empty , like a movie going for all flash and no substance . 
we don't know why the crew was really out in the middle of nowhere , we don't know the origin of what took over the ship
( just that a big pink flashy thing hit the mir ) , and , of course , we don't know why donald sutherland is stumbling around drunkenly throughout . 
here , it's just " hey , let's chase these people around with some robots " . 
the acting is below average , even from the likes of curtis . 
you're more likely to get a kick out of her work in halloween h20 . 
sutherland is wasted and baldwin , well , he's acting like a baldwin , of course . 
the real star here are stan winston's robot design , some schnazzy cgi , and the occasional good gore shot , like picking into someone's brain . 
so , if robots and body parts really turn you on , here's your movie . 
otherwise , it's pretty much a sunken ship of a movie . 

The reviews vary widely in style and length.
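The folder prefix of each movie_reviews file id is its label, which is how the 1,000/1,000 split can be checked. Below is a minimal sketch in plain Python that uses a handful of file ids copied from the listing earlier in this section; the helper name `label_of` is made up for illustration. With the corpus installed, NLTK exposes the same grouping through `movie_reviews.fileids('neg')` and `movie_reviews.fileids('pos')`.

```python
from collections import Counter

# A few file ids copied from the listing above (not the full corpus).
fileids = [
    "neg/cv000_29416.txt",
    "neg/cv001_19502.txt",
    "pos/cv998_14111.txt",
    "pos/cv999_13106.txt",
]


def label_of(fileid):
    """Return the category encoded in the folder part of a file id."""
    return fileid.split("/", 1)[0]


# Count how many of the sample file ids fall under each label.
counts = Counter(label_of(f) for f in fileids)
print(sorted(counts.items()))
```

On the full corpus the same count should give 1,000 per label, matching the figure stated above.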

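1.2 subjectivity: movie plot summaries and reviews

A dataset for subjectivity analysis. The corpus consists of 5,000 subjective sentences and 5,000 objective sentences, and is designed for sentiment analysis and subjectivity classification.

It comes from Bo Pang and Lillian Lee's paper "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts" (ACL 2004).

  • The 5,000 subjective sentences come from reviews, which express the author's opinions, feelings, or judgments.

  • The 5,000 objective sentences come from movie plot summaries, which are usually factual descriptions without obvious personal feeling or judgment.

Unlike movie_reviews, the samples are not split into one file per review; the sentences are stored together as plain text.

Each sentence is preprocessed: words and punctuation are separated by spaces, and the text is parsed with WhitespaceTokenizer.

  • First obj sample

['the', 'movie', 'begins', 'in', 'the', 'past', 'where', 'a', 'young', 'boy', 'named', 'sam', 'attempts', 'to', 'save', 'celebi', 'from', 'a', 'hunter', '.']
['obj', 'subj']

This uses the sents() method; raw() would instead return the whole corpus as a single string.

  • First subj sample

smart and alert , thirteen conversations about one thing is a small gem .

This uses the built-in string method join(), which concatenates the list elements into one string with a space between them: " ".join()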

  • Full code
from nltk.corpus import subjectivity

subj = subjectivity.sents(categories='subj')  # subjective sentences (fileids() and raw() take categories too)
obj = subjectivity.sents(categories='obj')    # objective sentences
categories = subjectivity.categories()        # returns ['obj', 'subj']

print(len(subj), " ".join(subj[0]))
print(len(obj), obj[0])
print(categories)
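For a classification task the two sentence lists are usually merged into (sentence, label) pairs and then split into training and test parts. The sketch below shows one way to do that in plain Python. It uses only the two sample sentences shown above so that it runs without the corpus; the function names `make_labeled` and `split_data` and the 50/50 ratio are assumptions for illustration, not part of NLTK.

```python
import random

# The two sample sentences shown above; a real run would pass
# subjectivity.sents(categories='subj') and subjectivity.sents(categories='obj').
subj_sents = [["smart", "and", "alert", ",", "thirteen", "conversations",
               "about", "one", "thing", "is", "a", "small", "gem", "."]]
obj_sents = [["the", "movie", "begins", "in", "the", "past", "where", "a",
              "young", "boy", "named", "sam", "attempts", "to", "save",
              "celebi", "from", "a", "hunter", "."]]


def make_labeled(subj, obj):
    """Pair every tokenized sentence with its category label."""
    return [(s, "subj") for s in subj] + [(s, "obj") for s in obj]


def split_data(data, train_ratio=0.8, seed=0):
    """Shuffle a copy of the data and cut it into train and test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


labeled = make_labeled(subj_sents, obj_sents)
train_set, test_set = split_data(labeled, train_ratio=0.5)
print(len(labeled), len(train_set), len(test_set))
```

A fixed seed keeps the split reproducible between runs.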

2. Multi-class corpora

The Reuters corpus (reuters), covered in the first part of this series, carries multiple news-topic labels:

[‘acq’, ‘alum’, ‘barley’, ‘bop’, ‘carcass’, ‘castor-oil’, ‘cocoa’, ‘coconut’, ‘coconut-oil’, ‘coffee’, ‘copper’, ‘copra-cake’, ‘corn’, ‘cotton’, ‘cotton-oil’, ‘cpi’, ‘cpu’, ‘crude’, ‘dfl’, ‘dlr’, ‘dmk’, ‘earn’, ‘fuel’, ‘gas’, ‘gnp’, ‘gold’, ‘grain’, ‘groundnut’, ‘groundnut-oil’, ‘heat’, ‘hog’, ‘housing’, ‘income’, ‘instal-debt’, ‘interest’, ‘ipi’, ‘iron-steel’, ‘jet’, ‘jobs’, ‘l-cattle’, ‘lead’, ‘lei’, ‘lin-oil’, ‘livestock’, ‘lumber’, ‘meal-feed’, ‘money-fx’, ‘money-supply’, ‘naphtha’, ‘nat-gas’, ‘nickel’, ‘nkr’, ‘nzdlr’, ‘oat’, ‘oilseed’, ‘orange’, ‘palladium’, ‘palm-oil’, ‘palmkernel’, ‘pet-chem’, ‘platinum’, ‘potato’, ‘propane’, ‘rand’, ‘rape-oil’, ‘rapeseed’, ‘reserves’, ‘retail’, ‘rice’, ‘rubber’, ‘rye’, ‘ship’, ‘silver’, ‘sorghum’, ‘soy-meal’, ‘soy-oil’, ‘soybean’, ‘strategic-metal’, ‘sugar’, ‘sun-meal’, ‘sun-oil’, ‘sunseed’, ‘tea’, ‘tin’, ‘trade’, ‘veg-oil’, ‘wheat’, ‘wpi’, ‘yen’, ‘zinc’]

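There are 90 topic labels in total, covering finance, commodities, trade, markets, currencies, and related areas.

A news article can carry more than one label.

The label list above was printed with:

from nltk.corpus import reuters
categories = reuters.categories()  # list all category labels
print(categories)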
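Because one article can carry several labels, a common representation for training is a multi-hot vector with one position per label. The sketch below illustrates the idea in plain Python. The document ids and their label assignments are hypothetical, chosen only from label names in the list above; with the corpus installed, `reuters.categories(fileid)` returns the real labels of a document.

```python
# Hypothetical documents and label assignments, for illustration only.
doc_labels = {
    "doc_a": ["grain", "wheat"],
    "doc_b": ["trade"],
    "doc_c": ["crude", "nat-gas", "ship"],
}

# Fix an order for the labels that occur, then map each label to a position.
label_set = sorted({label for labels in doc_labels.values() for label in labels})
index = {label: i for i, label in enumerate(label_set)}


def multi_hot(labels):
    """Return a 0/1 vector with a 1 at the position of every given label."""
    vec = [0] * len(index)
    for label in labels:
        vec[index[label]] = 1
    return vec


for doc, labels in doc_labels.items():
    print(doc, multi_hot(labels))
```

With all 90 Reuters labels the vector would simply have 90 positions.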

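Another multi-class corpus is product_reviews_1 (customer product reviews):

2.1 product_reviews_1: product overview

  • Apex_AD2600_Progressive_scan_DVD player.txt

    • Progressive-scan DVD player: supports 480p output, suited to HDTV or HD-ready sets.
    • Multi-format: plays DVD, MP3 CD, WMA CD, and JPEG/Kodak picture CDs (for slideshows), with partial DVD-R support.
    • Full-screen fit (AFF): adapts 16:9 widescreen video to a 4:3 TV screen.
    • Price: about USD 39.99-69.99 in 2003-2004 (after discounts); a budget model.
    • Review summary:
      • Positive: users praise its multi-format playback and value for money, for example that it plays almost any disc put into it.
      • Negative: poor remote control (sluggish, not universal), some DVDs (such as Disney movies) fail to play, poor durability (some units failed within months).
      • Sentiment keywords: "picture quality", "cheap", "remote doesn't work"
      • Summary: well received for its low price and format support, criticized for reliability and the remote control.
  • Canon_G3.txt

    • Digital camera: 4-megapixel sensor, suited to early-2000s photography needs.
    • Lens: high-quality Canon lens with optical zoom (about 4x).
    • Features: manual controls and RAW capture; a compact design aimed at advanced users, enthusiasts, and semi-professionals.
    • Review summary:
      • Positive: high image quality, flexible manual controls, versatile; for example, photos are described as sharp with vivid colors.
      • Negative: steep learning curve for manual settings; somewhat bulkier than a point-and-shoot camera.
      • Sentiment keywords: "great pictures", "slow focus", "battery life is bad"
      • Summary: widely praised for its professional features and compact design; suits users who want high-quality photography.
  • Creative_Labs_Nomad_Jukebox_Zen_Xtra_40GB.txt

    • Portable audio device (MP3 player), released in 2003, aimed at serious music listeners.
    • Storage: 40 GB hard drive, holding roughly 10,000 songs.
    • Audio: supports MP3, WMA, and other formats, with excellent sound quality.
    • Features: replaceable battery, large display, fast USB 2.0 transfer.
    • Review summary:
      • Positive: capacity and sound quality are praised, for example that it stores a lot of songs with clear sound.
      • Negative: slow hard-drive start-up, complicated interface, occasional freezes or short battery life.
      • Sentiment keywords: "sound is awesome", "software sucks", "large capacity"
      • Summary: favored for its large capacity and sound quality, criticized for complexity and reliability problems.
  • Nikon_coolpix_4300.txt

    • Digital camera: 4-megapixel resolution, suited to everyday photography.
    • Lens: Nikon optical zoom lens (about 3x optical zoom).
    • Features: automatic and manual modes, compact design, easy to carry.
    • Target users: home users and photography beginners.
    • Review summary:
      • Positive: easy to use, good image quality, portable; for example, the camera is described as small with good-looking photos.
      • Negative: weak low-light performance, fast battery drain, lack of advanced manual features.
      • Sentiment keywords: "excellent pictures", "a bit heavy", "menus are confusing"
      • Summary: popular with home users for portability and ease of use, but only average in low light.
  • Nokia_6610.txt

    • Feature phone (pre-smartphone) with a color screen (128x128 pixels); a best-selling, mainstream design around 2003.
    • Features: SMS, MMS, FM radio, GPRS data, Java games.
    • Design: classic candy-bar form, clean look, durable, internal antenna.
    • Review summary:
      • Positive: stable signal, long battery life, durable build; for example, the battery is said to last several days.
      • Negative: small screen, fairly basic features (for example no Bluetooth), mediocre key feel.
      • Sentiment keywords: "great signal", "buttons are too small", "classic Nokia build"
      • Summary: praised for durability and battery life, but relatively simple in features; suited to basic communication needs.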

2.2 product_reviews_1: product labels and reviews

Reviews of five products

2.2.1 Apex_AD2600_Progressive_scan_DVD player.txt

  • Number of review sentences: 740

  • Labels

[‘1220’, ‘1600’, ‘aff’, ‘amazon’, ‘apex’, ‘audio’, ‘audio output’, ‘auto fit’, ‘build quality’, ‘button’, ‘case’, ‘cd’, ‘cd audio disc’, ‘code’, ‘color’, ‘color signal’, ‘customer service’, ‘customer support’, ‘design’, ‘different file’, ‘direction’, ‘disc’, ‘disk’, ‘disney movie’, ‘display’, ‘divx rip’, ‘door’, ‘dvd’, ‘dvd disc’, ‘dvd media’, ‘dvd player’, ‘external display’, ‘feature’, ‘finish’, ‘format’, ‘forward’, ‘freeze’, ‘freezing’, ‘heat’, ‘jpeg’, ‘jpeg picture’, ‘jpeg slideshow’, ‘layer dvd’, ‘line support’, ‘loading’, ‘look’, ‘machine’, ‘manual’, ‘media’, ‘menu’, ‘motor’, ‘mp3’, ‘mp3 filename’, ‘mpeg’, ‘mpeg1’, ‘no disc’, ‘noise’, ‘off button’, ‘onscreen display’, ‘output’, ‘p button’, ‘panel’, ‘panel button layout’, ‘picture’, ‘picture clarity’, ‘picture quality’, ‘play’, ‘player’, ‘power supply’, ‘price’, ‘product’, ‘progressive scan’, ‘progressive scan player’, ‘quality’, ‘r’, ‘read’, ‘recognize’, ‘reliability’, ‘remote’, ‘remote button’, ‘remote control’, ‘remote layout’, ‘rewind’, ‘run’, ‘screen’, ‘screw tip’, ‘service’, ‘set up’, ‘shipping’, ‘silver plate’, ‘size’, ‘smell’, ‘sound’, ‘speed’, ‘support’, ‘svcd’, ‘sync’, ‘tech support’, ‘technical support’, ‘unit’, ‘universal remote control’, ‘usage’, ‘use’, ‘user interface’, ‘vbr mp3 cd’, ‘vcd’, ‘video’, ‘video format’, ‘video output’, ‘video quality’, ‘weight’, ‘windows media’, ‘work’, ‘zoom’, ‘zoom mode’]

  • Samples
1 repost from january 13 , 2004 with a better fit title .
2 does your apex dvd player only play dvd audio without video ?
3 or does it play audio and video but scrolling in black and white ?
4 before you try to return the player or waste hours calling apex tech support , or run the player over with your car , 
try these simple troubleshooting ideas first .
5 no picture :
...
734 however , i do n ' t know the dvd ' s performance on a heavy load of every - day viewing .
735 either way , can ' t go wrong with this price .
736 i am really impressed by this dvd player .
737 if it can fit in the drive bay , this dvd player will play it .
738 for instance , i made several back - ups of my dvd movies using dvd - r ( w ) and + r ( w ) and it plays the dvds .
739 no matter the format .
740 awesome !

2.2.2 Canon_G3.txt

  • Number of review sentences: 597

  • Labels

[‘4mp’, ‘4mp camera’, ‘4mp resolution’, ‘auto mode’, ‘auto setting’, ‘automode’, ‘battery’, ‘battery charging system’, ‘battery life’, ‘body’, ‘button’, ‘camera’, ‘canera’, ‘canon’, ‘canon g3’, ‘canon powershot g3’, ‘casing’, ‘color’, ‘compactflash’, ‘control’, ‘darn diopter adjustment dial’, ‘delay’, ‘depth’, ‘design’, ‘dial’, ‘digital camera’, ‘digital zoom’, ‘display’, ‘distortion’, ‘download’, ‘exposure control’, ‘external flash hot shoe’, ‘feature’, ‘feel’, ‘finish’, ‘flash’, ‘flash photo’, ‘focus’, ‘four megapixel’, ‘function’, ‘g3’, ‘grain’, ‘highlight’, ‘hot shoe flash’, ‘image’, ‘image quality’, ‘import’, ‘lag’, ‘lag time’, ‘lcd’, ‘learning’, ‘learning curve’, ‘lens’, ‘lens cap’, ‘lens cover’, ‘lense’, ‘lever’, ‘light auto correction’, ‘look’, ‘low light focus’, ‘macro’, ‘made’, ‘manual’, ‘manual function’, ‘manual mode’, ‘memory card’, ‘menu’, ‘metering option’, ‘night mode’, ‘noise’, ‘off button’, ‘optic’, ‘optical zoom’, ‘option’, ‘performance’, ‘photo’, ‘photo quality’, ‘picture’, ‘picture quality’, ‘price’, ‘print’, ‘product’, ‘quality’, ‘raw format’, ‘raw image’, ‘remote’, ‘service’, ‘shape’, ‘shoot’, ‘shot’, ‘size’, ‘software’, ‘speed’, ‘spot metering’, ‘stitch picture’, ‘strap’, ‘tiff format’, ‘unresponsiveness’, ‘use’, ‘viewfinder’, ‘weight’, ‘white balance’, ‘white offset’, ‘zoom’, ‘zooming lever’]

  • Samples
1 i recently purchased the canon powershot g3 and am extremely satisfied with the purchase .
2 the camera is very easy to use , in fact on a recent trip this past week i was asked to take a picture of a vacationing elderly group .
3 after i took their picture with their camera , they offered to take a picture of us .
4 i just told them , press halfway , wait for the box to turn green and press the rest of the way .
...
593 even with these shortcomings , i still think it is the best digital camera available under $ 1200 .
594 definetely a great camera .
595 proven canon built quality and lens .
596 feels solid in hand .
597 rather heavy for point and shoot but a great camera for semi pros .

2.2.3 Creative_Labs_Nomad_Jukebox_Zen_Xtra_40GB.txt

  • Number of review sentences: 1716

  • Labels

[‘0’, ‘accessing file’, ‘accessory’, ‘affordability’, ‘alarm’, ‘appearance’, ‘audio’, ‘backlight’, ‘balance’, ‘battery’, ‘battery life’, ‘battery top’, ‘bookmakr’, ‘bookmark’, ‘break’, ‘buck’, ‘build’, ‘button’, ‘capacity’, ‘case’, ‘cd burner’, ‘cd rip’, ‘change’, ‘chinese name’, ‘click buttons’, ‘clip’, ‘clock’, ‘color’, ‘construction’, ‘control’, ‘cover’, ‘creative’, ‘creative product’, ‘customer support’, ‘customer support website’, ‘deal’, ‘delete’, ‘design’, ‘display’, ‘durability’, ‘earbud’, ‘earphone’, ‘eax’, ‘eax mode’, ‘enviromental audio’, ‘equalizer’, ‘equilizer’, ‘equipment’, ‘explorer’, ‘face plate’, ‘feature’, ‘feel’, ‘file limit’, ‘file transfer’, ‘finding’, ‘firewire’, ‘firmware’, ‘flip switch’, ‘fly wheel’, ‘flywheel’, ‘fm’, ‘fm receiver’, ‘folder’, ‘folder structure’, ‘freeze’, ‘freeze up’, ‘front cover’, ‘game’, ‘hard drive’, ‘headphone’, ‘headphone jack’, ‘id3’, ‘id3 tag’, ‘installation’, ‘instruction’, ‘interface’, ‘itunes’, ‘jog dial’, ‘lcd’, ‘leather case’, ‘leather pouch’, ‘line out jack’, ‘load’, ‘lock up’, ‘look’, ‘looking’, ‘manage’, ‘manual’, ‘mediasource’, ‘memory’, ‘menu’, ‘menue’, ‘mp3 player’, ‘music’, ‘musicmatch software’, ‘name’, ‘napster’, ‘navigation’, ‘navigation wheel’, ‘navigational system’, ‘nomad’, ‘nomad explorer’, ‘notmad’, ‘notmad software’, ‘online help’, ‘online music service’, ‘operate’, ‘option’, ‘panel’, ‘pause’, ‘pc compatibility’, ‘play’, ‘play mode’, ‘play option’, ‘playback quality’, ‘player’, ‘player hardware’, ‘playlist’, ‘plug and play’, ‘power output’, ‘price’, ‘product’, ‘program’, ‘quality’, ‘recharger’, ‘recognition’, ‘recording’, ‘remote’, ‘remove’, ‘rename’, ‘replacement battery’, ‘rip’, ‘rip cd’, ‘screen’, ‘screen saver’, ‘scroll’, ‘scroll wheel’, ‘set up’, ‘setup’, ‘shuffle’, ‘shuttle’, ‘signal to noise ratio’, ‘size’, ‘software’, ‘song speed’, ‘sorting’, ‘sound’, ‘sound option’, ‘sound quality’, ‘sound setting’, ‘stop button’, ‘stoppage’, ‘storage’, ‘storage capacity’, ‘style’, ‘support’, ‘switch’, 
‘sync’, ‘tag’, ‘the unit’, ‘thing’, ‘things’, ‘this’, ‘this item’, ‘this thing’, ‘top’, ‘transfer’, ‘transfter’, ‘unit’, ‘up face’, ‘uploading’, ‘usb recharge’, ‘use’, ‘user interface’, ‘value’, ‘voice recording’, ‘volume’, ‘volume range’, ‘wake up’, ‘warranty’, ‘weight’, ‘wheel’, ‘wma file’, ‘work’, ‘xtra’, ‘zen’, ‘zen xtra’, ‘zx’]

  • Samples
1 this is an edited review , now that i have had time to use the device .
2 while , there are flaws with the machine , the xtra gets five stars because of its affordability .
3 it is the most bang - for - the - buck out there .
4 like it ' s predecessor , the quickly revised nx , this player boasts a decent size and weight , 
a relatively - intuitive navigational system that categorizes based on id3 tags ,  and excellent sound 
( widely known to be better than ipod - not surprising considering the number of years creative has been in the audio peripheral business ) .
5 the xtra improves upon the zen nx with a larger , now - blue backlit screen , which is infinitely better .
6 further , the xtra doubles the maximum filecount capacity to 16000 mp3 .
...
1712 in that model the hard drive just died one morning before my class .
1713 it ' s nothing major , just a bad hard drive , any hard drive mp3 player can have that problem .
1714 so rule of thumb , no matter what you end up buying , get the extended warranty !
1715 it always pays off .
1716 hope i ' ve been of some help .

2.2.4 Nikon_coolpix_4300.txt

  • Number of review sentences: 346

  • Labels

[‘4mp’, ‘8mb’, ‘8mb card’, ‘accessory’, ‘audio’, ‘auto focus’, ‘auto mode’, ‘auto setting’, ‘autofocus’, ‘battery’, ‘battery life’, ‘camera’, ‘closeup mode’, ‘construction’, ‘continuous shot mode’, ‘control’, ‘customer service’, ‘delay’, ‘design’, ‘digital zoom’, ‘download’, ‘ease of use’, ‘feature’, ‘firewire’, ‘focus assist light’, ‘function’, ‘image’, ‘image download’, ‘indoor image’, ‘indoor picture’, ‘indoor shot’, ‘lcd’, ‘learn’, ‘lens cap’, ‘lense cap’, ‘macro’, ‘macro mode’, ‘manual’, ‘manual mode’, ‘memory card’, ‘menu’, ‘menu dial knob’, ‘movie’, ‘movie mode’, ‘nikon’, ‘nikon 4300’, ‘nikon support’, ‘online service’, ‘optic’, ‘optical setting’, ‘optical zoom’, ‘photo’, ‘photo quality’, ‘picture’, ‘picture quality’, ‘price’, ‘print’, ‘print quality’, ‘quality’, ‘rechargable battery’, ‘redeye’, ‘scene mode’, ‘servicing’, ‘size’, ‘software’, ‘sunset feature’, ‘system error’, ‘touchup’, ‘transfer’, ‘txt file’, ‘up shooting’, ‘use’, ‘viewfinder’, ‘weight’, ‘zoomed image’]

  • Samples
1 this camera is perfect for an enthusiastic amateur photographer .
2 the pictures are razor - sharp , even in macro .
3 it is small enough to fit easily in a coat pocket or purse .
4 it is light enough to carry around all day without bother .
...
345 the same 4mp chip from the 4500 camera , plus a 3x zoom with the ability to expand upon that with extenders , 
great closeup mode , long lasting rechargable battery , etc etc .
346 in my opinion it ' s the best camera for the money if you ' re looking for something that ' s easy to use , 
small good for travel , and provides excellent , sharp images .

2.2.5 Nokia_6610.txt

  • Number of review sentences: 547

  • Labels

[‘application’, ‘background’, ‘backlight’, ‘battery’, ‘battery life’, ‘bluetooth’, ‘browsing’, ‘button’, ‘calendar’, ‘call’, ‘camera’, ‘color’, ‘color screen’, ‘command’, ‘construction’, ‘csr’, ‘customer rep’, ‘customer service’, ‘default ringtone’, ‘design’, ‘durability’, ‘ear’, ‘earpiece’, ‘ergonomics’, ‘feature’, ‘fm’, ‘fm radio’, ‘game’, ‘gprs’, ‘gsm’, ‘headphone jack’, ‘headset’, ‘headset jack’, ‘high speed internet’, ‘infrared’, ‘internet’, ‘key’, ‘key lock’, ‘keypad’, ‘layout’, ‘look’, ‘loud phone’, ‘memory’, ‘menu’, ‘menu option’, ‘menu options’, ‘message’, ‘mms’, ‘mobile’, ‘mobile reception’, ‘mobile service’, ‘network’, ‘nokia’, ‘operate’, ‘pc cable’, ‘pc suite’, ‘pc sync’, ‘phone’, ‘phone book’, ‘phone performance’, ‘picture’, ‘picture sharing’, ‘pim’, ‘plan’, ‘quality’, ‘radio’, ‘rate plan’, ‘reception’, ‘resolution’, ‘ring’, ‘ring tone’, ‘ringer’, ‘ringing tone’, ‘ringtone’, ‘screen’, ‘screensaver’, ‘service’, ‘signal’, ‘signal quality’, ‘size’, ‘software’, ‘sound’, ‘sound quality’, ‘sound volume’, ‘speaker’, ‘speaker phone’, ‘speakerphone’, ‘sprint’, ‘sprint customer service’, ‘sprint plan’, ‘sturdy’, ‘t customer service’, ‘tone’, ‘tune’, ‘use’, ‘user interface’, ‘vibrate setting’, ‘vibration’, ‘voice’, ‘voice dialing’, ‘voice quality’, ‘volume’, ‘volume control’, ‘volume key’, ‘wallpaper’, ‘warranty’, ‘web’, ‘weight’, ‘wireless telephone’, ‘work’, ‘zone’]

  • Samples
1 i am a business user who heavily depend on mobile service .
2 there is much which has been said in other reviews about the features of this phone , 
it is a great phone , mine worked without any problems right out of the box .
3 just double check with customer service to ensure the number provided by amazon is for the city / exchange you wanted .
4 after several years of torture in the hands of at & t customer service i am delighted to drop them , 
and look forward to august 2004 when i will convert our other 3 family - phones from at & t to t - mobile !
...
544 it is crystal clear .
545 this is one of the nicest phones nokia has made .
546 i do recommend getting the data kit for those geeks .
547 there are a lot of cool websites with games and midi ringtones to download for free .
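The label lists in this section are product features (aspects) attached to individual sentences. Assuming the raw files follow the Hu and Liu convention, in which `feature[+n]` or `feature[-n]` tags are followed by `##` and the sentence (an assumption about the file format, not something shown in this article), one line could be parsed roughly as below. The sample line and the names `FEATURE_RE` and `parse_line` are invented for illustration; NLTK's own reader does this parsing for you, for example through `product_reviews_1.features(fileid)`.

```python
import re

# Matches "feature name[+2]" or "feature name[-1]" (assumed annotation style).
FEATURE_RE = re.compile(r"([^,\[\]]+?)\[([+-]\d)\]")


def parse_line(line):
    """Split one annotated line into [(feature, score), ...] and the sentence."""
    head, sep, sentence = line.partition("##")
    if not sep:
        # No "##" marker: treat the whole line as an unannotated sentence.
        return [], line.strip()
    features = [(name.strip(), int(score))
                for name, score in FEATURE_RE.findall(head)]
    return features, sentence.strip()


# Invented sample line in the assumed format.
sample = "picture quality[+2], remote[-1]##great picture but the remote is poor ."
features, sentence = parse_line(sample)
print(features)
print(sentence)
```

The sign of the score gives the polarity of the opinion about that feature, and the digit its strength.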

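3. Syntactically annotated corpora

Corpora with part-of-speech annotations, suitable for training and testing POS taggers.

3.1 Brown

The Brown Corpus (the sample below is from its news text) carries this kind of annotation. A simplified excerpt of its tagset:

Tag      Meaning
AT       Article
NN       Noun
JJ       Adjective
VBD      Verb, past tense
NP-TL    Proper noun in a title
NN-TL    Noun in a title
  • Test code
from nltk.corpus import brown
tagged_sent = brown.tagged_sents()[0]  # first tagged sentence (Brown tagset)
print(tagged_sent[:10])
  • Tagged output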

[('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'), ('Grand', 'JJ-TL'), ('Jury', 'NN-TL'), ('said', 'VBD'), ('Friday', 'NR'), ('an', 'AT'), ('investigation', 'NN'), ('of', 'IN')]
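Tags such as NP-TL and NN-TL are a base tag plus a suffix (-TL marks a word inside a title). The following sketch, written against the ten tagged words printed above, separates the two parts. The helper `split_tag` is a made-up name, and splitting at the first hyphen is a simplification: it is enough for this sample but would need special cases for tags that are themselves hyphens or that carry prefixes.

```python
# The first ten tagged words of the Brown Corpus, as printed above.
tagged = [("The", "AT"), ("Fulton", "NP-TL"), ("County", "NN-TL"),
          ("Grand", "JJ-TL"), ("Jury", "NN-TL"), ("said", "VBD"),
          ("Friday", "NR"), ("an", "AT"), ("investigation", "NN"),
          ("of", "IN")]


def split_tag(tag):
    """Split a Brown tag at the first hyphen into (base, suffix)."""
    base, _, suffix = tag.partition("-")
    return base, suffix


# Base tags with the title suffix removed.
base_tags = [split_tag(tag)[0] for _, tag in tagged]
# Words that were marked as part of a title.
in_title = [word for word, tag in tagged if split_tag(tag)[1] == "TL"]

print(base_tags)
print(in_title)
```

Collapsing tags to their base form is one way to reduce the tagset before training a tagger.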

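3.2 treebank: the Penn Treebank (syntactic parsing)

The Treebank corpus is based mainly on Wall Street Journal (WSJ) articles. Files are named wsj_XXXX.mrg; the sample shipped with NLTK contains 199 files (wsj_0001.mrg to wsj_0199.mrg).

  • Listing the file names
from nltk.corpus import treebank
file_ids = treebank.fileids()
print(file_ids)

Output:

['wsj_0001.mrg', 'wsj_0002.mrg', 'wsj_0003.mrg', 'wsj_0004.mrg', 'wsj_0005.mrg', …, 'wsj_0199.mrg']

  • The Treebank corpus contains:

    • POS-tagged text: words with their POS tags.

    • Parsed sentences: the syntactic structure of each sentence, represented as a tree.

    • Raw text: the sentences without annotation.

Each file contains annotations for several sentences. The full WSJ portion of the Penn Treebank has about 1 million words; the sample shipped with NLTK is much smaller, about 100,000 tokens. Syntax trees can be nested, so processing them requires recursion or tree traversal.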

3.2.1 POS tags

NLTK uses the Penn Treebank POS tagset by default:

In Brown, articles are tagged AT; in the Penn Treebank they fall under the broader DT (determiner) tag.

Tag    Meaning                                           Examples
CC     Coordinating conjunction                          and, but, or, yet
CD     Cardinal number                                   one, two, 1999
DT     Determiner                                        the, a, an
EX     Existential there                                 there (There is …)
FW     Foreign word                                      c'est, etc., esprit
IN     Preposition or subordinating conjunction          in, of, like, although
JJ     Adjective                                         big, beautiful, green
JJR    Adjective, comparative                            bigger, smaller
JJS    Adjective, superlative                            biggest, smallest
LS     List item marker                                  1), a), B.
MD     Modal                                             can, will, should
NN     Noun, singular                                    dog, year
NNS    Noun, plural                                      dogs, years
NNP    Proper noun, singular                             John, London
NNPS   Proper noun, plural                               Smiths, Americans
PDT    Predeterminer                                     all, both
POS    Possessive ending                                 's, '
PRP    Personal pronoun                                  I, you, he, she, it
PRP$   Possessive pronoun                                my, your, his, her
RB     Adverb                                            quickly, very, well
RBR    Adverb, comparative                               better, faster
RBS    Adverb, superlative                               best, fastest
RP     Particle                                          up, off, out
SYM    Symbol                                            $, %, +
TO     to                                                to
UH     Interjection                                      oh, wow, oops
VB     Verb, base form                                   run, eat, be
VBD    Verb, past tense                                  ran, ate, was
VBG    Verb, gerund or present participle                running, being
VBN    Verb, past participle                             eaten, been
VBP    Verb, non-3rd person singular present             run, eat (subjects other than he/she/it)
VBZ    Verb, 3rd person singular present                 runs, eats, is
WDT    Wh-determiner (precedes a noun, adjective-like)   which, that
WP     Wh-pronoun (used as subject or object)            who, what
WP$    Possessive wh-pronoun (marks possession)          whose
WRB    Wh-adverb (asks place, time, reason, manner)      where, when, why

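Punctuation is mostly tagged with the symbol itself:

Symbol   POS tag        Description
.        .              Period
,        ,              Comma
:        :              Colon, semicolon, dash
''       ''             Closing quotation mark
( )      -LRB- -RRB-    Left and right parentheses
...      :              (Rare) ellipsis; it may show up as different tokens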
  • Test code
from nltk.corpus import treebank

tagged_words = treebank.tagged_words()  # all (word, POS tag) pairs

print(tagged_words[:10])  # first 10 tagged words

tagged_words_file = treebank.tagged_words(fileids='wsj_0002.mrg')  # tagged words of one file (wsj_0002.mrg)
print(tagged_words_file[:10])

[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT')]

[('Rudolph', 'NNP'), ('Agnew', 'NNP'), (',', ','), ('55', 'CD'), ('years', 'NNS'), ('old', 'JJ'), ('and', 'CC'), ('former', 'JJ'), ('chairman', 'NN'), ('of', 'IN')]
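Many Penn Treebank tags share a prefix (NN, NNS, NNP, and NNPS are all nouns; VB, VBD, VBZ, and so on are all verbs), so a prefix test is a quick way to group them into coarse classes. The sketch below applies this to the first output above. The table `COARSE` and the function `coarse_class` are hypothetical helpers, not NLTK names; NLTK has its own coarser mapping, available through `tagged_words(tagset='universal')` when the mapping resource is installed.

```python
# Prefix -> coarse class; checked in order, first match wins.
COARSE = [("NN", "noun"), ("VB", "verb"), ("JJ", "adjective"), ("RB", "adverb")]


def coarse_class(tag):
    """Map a Penn Treebank tag to a coarse word class by its prefix."""
    for prefix, name in COARSE:
        if tag.startswith(prefix):
            return name
    return "other"


# The first ten tagged words of the Treebank sample, as printed above.
tagged = [("Pierre", "NNP"), ("Vinken", "NNP"), (",", ","), ("61", "CD"),
          ("years", "NNS"), ("old", "JJ"), (",", ","), ("will", "MD"),
          ("join", "VB"), ("the", "DT")]

nouns = [word for word, tag in tagged if coarse_class(tag) == "noun"]
print(nouns)
```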

3.2.2 Syntax trees

The syntax trees are the core of Treebank; they are stored as nltk.tree.Tree objects.

# all syntax trees
parsed_sents = treebank.parsed_sents()

# print the first tree
print(parsed_sents[0])

# or show it graphically (draw() opens a Tkinter window, so Tkinter and a display are needed)
parsed_sents[0].draw()  # pops up a window showing the tree

# syntax trees of one file
parsed_sents_file = treebank.parsed_sents(fileids='wsj_0001.mrg')
print(parsed_sents_file[0])

# extract the plain sentence (word sequence)
tree = parsed_sents[0]
words = tree.leaves()
sentence = ' '.join(words)
print(sentence)
  • The sentence without annotation:

'Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .'

  • [Figure: the syntax tree as rendered in the draw() window]

  • The tree as printed on the command line:
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
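  • Structure breakdown
 	1. (S …): the whole sentence
 	2. (NP-SBJ …): subject noun phrase
 		2.1 (NNP Pierre) (NNP Vinken): a personal name, proper nouns
 		2.2 The subject contains a parenthetical; the commas mark its boundaries
		2.3 ADJP: adjective phrase "61 years old"
		2.4 (CD 61) (NNS years): numeral + noun, forming a noun phrase
		2.5 (JJ old): the adjective "old"
	3. (VP …): verb phrase (predicate)
		3.1 (MD will): modal verb will
		3.2 (VB join): base-form verb join
		3.3 (NP the board): direct object
		3.4 (PP-CLR as a nonexecutive director):
		prepositional phrase closely related to the verb (CLR = "closely related"), expressing the role: "as a nonexecutive director"
		3.5 (NP-TMP Nov. 29): temporal noun phrase (TMP = temporal), "Nov. 29"
	4. (. .): the sentence-final period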
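The bracketed notation maps directly onto a recursive data structure: every "(" opens a node whose first token is its label, and everything up to the matching ")" is a child. The sketch below is a small hand-written parser in plain Python, given here only to make that recursion concrete; `parse_tree` and `leaves` are illustrative names. In practice `nltk.Tree.fromstring(text)` and `tree.leaves()` do the same job.

```python
import re

TREE_TEXT = """
(S
  (NP-SBJ
    (NP (NNP Pierre) (NNP Vinken))
    (, ,)
    (ADJP (NP (CD 61) (NNS years)) (JJ old))
    (, ,))
  (VP
    (MD will)
    (VP
      (VB join)
      (NP (DT the) (NN board))
      (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
      (NP-TMP (NNP Nov.) (CD 29))))
  (. .))
"""


def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested phrase: recurse
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words of a parsed tree from left to right."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


tree = parse_tree(TREE_TEXT)
print(" ".join(leaves(tree)))
```

The recovered word sequence matches the unannotated sentence shown earlier in this section.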

3.2.3 Raw text

  • Getting sentences

sents = treebank.sents()  # all sentences
sents_file = treebank.sents(fileids='wsj_0001.mrg')  # sentences of one file
print(sents_file[0])

Output

['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']

  • Counting POS tag frequencies over all sentences
from collections import Counter

tags = [tag for word, tag in treebank.tagged_words()]  # every POS tag in the corpus
tag_freq = Counter(tags)
print(tag_freq.most_common(10))  # the 10 most common tags

Output

[('NN', 13166), ('IN', 9857), ('NNP', 9410), ('DT', 8165), ('-NONE-', 6592),
('NNS', 6047), ('JJ', 5834), (',', 4886), ('.', 3874), ('CD', 3546)]

Singular nouns (NN) are the most frequent, followed by prepositions (IN). The -NONE- tag marks empty elements (traces) in the trees rather than actual words.
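Because -NONE- entries are not real words, it is often useful to drop them before counting. The sketch below shows the filter on a small invented sequence in Treebank style, where "*-1" stands in for a trace; the token list and the helper name `real_words` are assumptions for illustration. On the real corpus the same filter would be applied to `treebank.tagged_words()`.

```python
from collections import Counter

# Hypothetical tagged tokens in Treebank style; "*-1" stands in for a trace.
tagged = [("The", "DT"), ("board", "NN"), ("was", "VBD"), ("expected", "VBN"),
          ("*-1", "-NONE-"), ("to", "TO"), ("meet", "VB"), (".", ".")]


def real_words(pairs):
    """Drop empty elements, keeping only tokens that are actual words."""
    return [(word, tag) for word, tag in pairs if tag != "-NONE-"]


kept = real_words(tagged)
freq = Counter(tag for _, tag in kept)

print(len(tagged), len(kept))
print("-NONE-" in freq)
```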

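3.3 CoNLL2000

This dataset comes from the CoNLL-2000 shared task (CoNLL: Computational Natural Language Learning). It is used for shallow parsing, that is, chunking (recognizing phrase blocks), and serves to train and evaluate chunkers, whether rule-based or machine-learned classifiers (such as CRF, MaxEnt, or BiLSTM-CRF).

The data is converted from the Wall Street Journal (WSJ) portion of the Penn Treebank.

  • Tokens are annotated with chunk tags (syntactic blocks)

  • Each sentence is an nltk.Tree object

Chunk tags mark the phrase structure of a sentence. In the tree, each chunk (such as NP or VP) is a subtree. Unlike treebank, CoNLL-2000 does not annotate full clause structure; it only marks flat phrase chunks.

As a result, in the chunk trees shown in this section, IN-tagged subordinating conjunctions (such as if, when, that), CC coordinating conjunctions (such as and, but), RB adverbs, and similar words that neither form a PP (prepositional phrase) nor belong to an NP or VP are listed on their own, outside any chunk.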

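3.3.1 Triples

Each triple has three parts: the word, its POS tag, and its chunk tag.

  • POS tags (part-of-speech tags)
    • Definition: a POS tag gives the grammatical category of a word, such as noun, verb, or adjective.
    • Tagset: in the CoNLL-2000 corpus, POS tags follow the Penn Treebank tagset, for example:
      • NN: noun (singular)
      • IN: preposition
      • DT: determiner
      • VBZ: verb (3rd person singular)
      • RB: adverb
      • VBN: verb (past participle)
      • TO: to (as a preposition or infinitive marker)
      • VB: verb (base form)

Role: a POS tag describes the syntactic role of the word itself, independent of the sentence structure.

  • Chunk tags

    • Definition: a chunk tag names the phrase block (chunk) a word belongs to, such as a noun phrase (NP), verb phrase (VP), or prepositional phrase (PP). Chunk boundaries are marked in IOB format.
    • IOB format:
      • B-XXX: the beginning of a chunk, where XXX is the chunk type (such as NP, VP, PP).
      • I-XXX: inside a chunk, that is, a later word of the same chunk.
      • O: the word does not belong to any chunk.
  • Common chunk types

    • NP: noun phrase, e.g. "the pound".
    • VP: verb phrase, e.g. "is widely expected to take".
    • PP: prepositional phrase; in this corpus the PP chunk usually holds just the preposition, e.g. "in".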

Example:

[('Confidence', 'NN', 'B-NP'), ('in', 'IN', 'B-PP'), ('the', 'DT', 'B-NP'),
('pound', 'NN', 'I-NP'), ('is', 'VBZ', 'B-VP'), ('widely', 'RB', 'I-VP'), 
('expected', 'VBN', 'I-VP'), ('to', 'TO', 'I-VP'), ('take', 'VB', 'I-VP'), 
('another', 'DT', 'B-NP')]

Example analysis:

  • ('Confidence', 'NN', 'B-NP'): B-NP means "Confidence" begins a noun phrase.
  • ('in', 'IN', 'B-PP'): B-PP means "in" begins a prepositional phrase.
  • ('pound', 'NN', 'I-NP'): I-NP means "pound" is inside a noun phrase (the NP that "the" started).
  • ('is', 'VBZ', 'B-VP'): B-VP means "is" begins a verb phrase.
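The B-/I-/O scheme can be decoded with a single pass: a B- tag opens a new chunk, an I- tag of the same type extends the current one, and O marks a word outside any chunk. The sketch below applies this to the ten triples from the example; `iob_to_chunks` is an illustrative helper, not an NLTK function. NLTK's own converter for this is `nltk.chunk.conlltags2tree`, which builds a Tree instead of a list.

```python
# The ten triples from the example above.
triples = [("Confidence", "NN", "B-NP"), ("in", "IN", "B-PP"),
           ("the", "DT", "B-NP"), ("pound", "NN", "I-NP"),
           ("is", "VBZ", "B-VP"), ("widely", "RB", "I-VP"),
           ("expected", "VBN", "I-VP"), ("to", "TO", "I-VP"),
           ("take", "VB", "I-VP"), ("another", "DT", "B-NP")]


def iob_to_chunks(items):
    """Group (word, pos, iob_tag) triples into (chunk_type, [words]) pairs."""
    chunks = []
    for word, _pos, tag in items:
        kind = "O" if tag == "O" else tag[2:]
        # An I- tag continues the previous chunk only if the types match.
        continues = (tag.startswith("I-") and bool(chunks)
                     and chunks[-1][0] == kind)
        if continues:
            chunks[-1][1].append(word)
        else:
            chunks.append((kind, [word]))
    return chunks


for kind, words in iob_to_chunks(triples):
    print(kind, " ".join(words))
```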

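3.3.2 Shallow structure

There is no deeper nesting such as clause structure here, although the result still looks like a syntax tree. The test sentence is: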

Sentence 1: 
['Confidence', 'in', 'the', 'pound', 'is', 'widely', 'expected', 
'to', 'take', 'another', 'sharp', 'dive', 'if', 'trade', 'figures', 
'for', 'September', ',', 'due', 'for', 'release', 'tomorrow', ',', 
'fail', 'to', 'show', 'a', 'substantial', 'improvement', 'from', 'July', 'and', 'August', "'s", 'near-record', 'deficits', '.']

Its shallow structure tree is:

(S
  (NP Confidence/NN)
  (PP in/IN)
  (NP the/DT pound/NN)
  (VP is/VBZ widely/RB expected/VBN to/TO take/VB)
  (NP another/DT sharp/JJ dive/NN)
  if/IN
  (NP trade/NN figures/NNS)
  (PP for/IN)
  (NP September/NNP)
  ,/,
  due/JJ
  (PP for/IN)
  (NP release/NN)
  (NP tomorrow/NN)
  ,/,
  (VP fail/VB to/TO show/VB)
  (NP a/DT substantial/JJ improvement/NN)
  (PP from/IN)
  (NP July/NNP and/CC August/NNP)
  (NP 's/POS near-record/JJ deficits/NNS)
  ./.)

Another example:

 (S
  (NP Rockwell/NNP International/NNP Corp./NNP)
  (NP 's/POS Tulsa/NNP unit/NN)
  (VP said/VBD)
  (NP it/PRP)
  (VP signed/VBD)
  (NP a/DT tentative/JJ agreement/NN)
  (VP extending/VBG)
  (NP its/PRP$ contract/NN)
  (PP with/IN)
  (NP Boeing/NNP Co./NNP)
  (VP to/TO provide/VB)
  (NP structural/JJ parts/NNS)
  (PP for/IN)
  (NP Boeing/NNP)
  (NP 's/POS 747/CD jetliners/NNS)
  ./.)

General format:

(Chunk Word/POS)

  • Code
from nltk.corpus import conll2000

# load the data
train_data = conll2000.chunked_sents('train.txt')
test_data = conll2000.chunked_sents('test.txt')

print(train_data[0])
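The printed tree and the IOB triples carry the same information, so one can be turned into the other. The sketch below converts a few lines copied from the first tree above (not the whole sentence) back into triples, working on the printed text only. `chunk_lines_to_iob` is a made-up helper that assumes one chunk or one word per line, as in the printout; on real Tree objects, `nltk.chunk.tree2conlltags` performs this conversion.

```python
# A few lines copied from the first printed tree above (not the full sentence).
lines = [
    "(S",
    "  (NP Confidence/NN)",
    "  (PP in/IN)",
    "  (NP the/DT pound/NN)",
    "  if/IN",
    "  ./.)",
]


def chunk_lines_to_iob(tree_lines):
    """Turn printed chunk-tree lines into (word, pos, iob_tag) triples."""
    triples = []
    for line in tree_lines:
        line = line.strip()
        if line == "(S":
            continue                      # opening of the sentence node
        if line.startswith("("):
            # A chunk line such as "(NP the/DT pound/NN)".
            label, *items = line.strip("()").split()
        else:
            # A word outside any chunk; drop the ")" that closes the sentence.
            label, items = None, [line.rstrip(")")]
        for i, item in enumerate(items):
            word, pos = item.rsplit("/", 1)
            if label is None:
                tag = "O"
            else:
                tag = ("B-" if i == 0 else "I-") + label
            triples.append((word, pos, tag))
    return triples


for triple in chunk_lines_to_iob(lines):
    print(triple)
```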

3.4 POS tag frequencies in the three corpora

Top 5 tags

  • Brown

[('NN', 152470), ('IN', 120557), ('AT', 97959), ('JJ', 64028), ('.', 60638)]

  • Treebank

[('NN', 13166), ('IN', 9857), ('NNP', 9410), ('DT', 8165), ('-NONE-', 6592)]

  • CoNLL-2000

[('NN', 36789), ('IN', 27835), ('NNP', 24690), ('DT', 22355), ('NNS', 16653)]
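The three lists are easier to compare once the tagsets are aligned. As noted in section 3.2.1, Brown tags articles as AT while the Penn Treebank uses DT; the sketch below applies just that one mapping to the top-5 tags printed above and intersects the lists. The dictionary `BROWN_TO_PENN` is a one-entry illustration, not a full tagset conversion.

```python
# Top-5 tags per corpus, copied from the lists above.
brown = ["NN", "IN", "AT", "JJ", "."]
treebank = ["NN", "IN", "NNP", "DT", "-NONE-"]
conll = ["NN", "IN", "NNP", "DT", "NNS"]

# Only the AT -> DT correspondence mentioned in section 3.2.1.
BROWN_TO_PENN = {"AT": "DT"}
brown_mapped = [BROWN_TO_PENN.get(tag, tag) for tag in brown]

# Tags that appear in the top 5 of all three corpora, before and after mapping.
common_raw = set(brown) & set(treebank) & set(conll)
common_mapped = set(brown_mapped) & set(treebank) & set(conll)

print(sorted(common_raw))
print(sorted(common_mapped))
```

After the mapping, nouns, prepositions, and determiners are shared by the top-5 lists of all three corpora.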

4. Tutorial code

All the code used in this tutorial is in this folder:

  • https://github.com/disanda/d_code/tree/master/4.nltk
