ID 67516
FullText URL
fulltext.pdf 1.18 MB
Author
Naing, Inzali Department of Information and Communication Systems, Okayama University
Aung, Soe Thandar Department of Information and Communication Systems, Okayama University
Wai, Khaing Hsu Department of Information and Communication Systems, Okayama University
Funabiki, Nobuo Department of Information and Communication Systems, Okayama University
Abstract
Collecting reference papers from the Internet is one of the most important activities in conducting research and writing papers about the results. Unfortunately, the current process using Google Scholar may not be efficient, since many paper files cannot be accessed directly by the user. Even when they are accessible, their effectiveness must be checked manually. In this paper, we propose a reference paper collection system that uses web scraping to automate paper collection from websites. The system collects or monitors data from the Internet, which is regarded as the environment, using Selenium, a popular web scraping tool, as the sensor; it then examines the similarity of each result to the search target by comparing keywords using the BERT model. BERT is a deep learning model for natural language processing (NLP) that can understand context by analyzing the relationships between words in a sentence bidirectionally. Python Flask is adopted for the web application server, and Angular is used for data presentation. For the evaluation, we measured the performance, investigated the accuracy, and asked members of our laboratory to use the proposed method and provide feedback. The results confirm the method’s effectiveness.
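The keyword-similarity step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bag_of_words`, `cosine`, `rank_papers`), the threshold value, and the sample titles are all hypothetical, and a simple token-count vector stands in for BERT embeddings so that the example runs with the standard library only. In a real system, the `embed` argument would be replaced by a function that returns a BERT sentence embedding.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Stand-in embedding (NOT BERT): counts of lowercase alphanumeric tokens."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_papers(
    query: str,
    titles: List[str],
    embed: Callable[[str], Vector] = bag_of_words,
    threshold: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[float, str]]:
    """Score each scraped title against the query and keep those above the threshold."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(title)), title) for title in titles]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical titles, standing in for results scraped with Selenium.
    scraped = [
        "Web scraping with Selenium for data collection",
        "A survey of antenna design",
        "Keyword similarity for web scraping",
    ]
    for score, title in rank_papers("web scraping data collection", scraped):
        print(f"{score:.3f}  {title}")
```

With these sample inputs, the antenna survey shares no tokens with the query and is filtered out, while the two scraping-related titles are returned in descending order of similarity. Keeping `embed` as a parameter separates the ranking logic from the choice of language model, so the same code path could be reused once a BERT encoder is plugged in.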
Keywords
web scraping
Google Scholar
data collection
BERT
Selenium
Flask framework
Angular
Published Date
2024-07-10
Publication Title
Electronics
Volume
13
Issue
14
Publisher
MDPI
Start Page
2700
ISSN
2079-9292
Content Type
Journal Article
Language
English
OAI-PMH Set
Okayama University
Copyright Holders
© 2024 by the authors.
File Version
publisher
DOI
10.3390/electronics13142700
Related Url
isVersionOf https://doi.org/10.3390/electronics13142700
License
https://creativecommons.org/licenses/by/4.0/
Citation
Naing, I.; Aung, S.T.; Wai, K.H.; Funabiki, N. A Reference Paper Collection System Using Web Scraping. Electronics 2024, 13, 2700. https://doi.org/10.3390/electronics13142700