📌 Installing ElasticSearch and nori
For the installation, I followed the blog posts below.
Installing ElasticSearch (Windows 10) (tistory.com)
Installing ElasticSearch locally (tistory.com)
Windows Elasticsearch + nori installation (window elasticsearch nori) (tistory.com)
💡 Test
Once the installation has finished successfully, start ElasticSearch by running its batch file;
you can then send a request to test everything, including the nori morphological analyzer.
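One way to run such a test is the `_analyze` API, which returns the tokens a tokenizer produces for a given text. The Python sketch here is only an illustration, not necessarily the request used for this post: it assumes a node at http://localhost:9200 with the analysis-nori plugin installed, and `ES_URL`, `build_analyze_request`, and `analyze` are hypothetical names.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local single-node install


def build_analyze_request(text: str) -> urllib.request.Request:
    """Build a POST /_analyze request that tokenizes `text` with nori_tokenizer."""
    body = json.dumps({"tokenizer": "nori_tokenizer", "text": text}, ensure_ascii=False)
    return urllib.request.Request(
        f"{ES_URL}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str) -> list:
    """Send the request and return only the token strings from the response."""
    with urllib.request.urlopen(build_analyze_request(text)) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


# Usage (needs a running node with the plugin installed):
# print(analyze("호랑이막걸리"))
```

If the plugin is missing, ElasticSearch should reject the request because it cannot find `nori_tokenizer`, which makes this a quick installation check.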
💡 What is nori?
nori is the Korean morphological analyzer provided by ElasticSearch.
ElasticSearch previously had no Korean morphological analysis of its own,
so a third-party analyzer such as '은전한닢' (Eunjeon) or '아리랑' (Arirang) was needed,
but since version 6.4 ElasticSearch has officially offered nori as its Korean morphological analyzer (the analysis-nori plugin).
(Morphological analysis here means breaking words into their parts, such as attached suffixes like "ed" and "s" in English
or word endings, so that keyword search works better.)
nori is built on the mecab-ko-dic dictionary, but because it stores the dictionary in compressed form,
it uses far less memory and runs faster than the earlier morphological analyzers.
📌 Using the Bulk API
For several reasons, our current project decided not to use ELK and to go with the Bulk API instead.
Related post: choosing how to apply ElasticSearch
In this post I'll summarize how we used the Bulk API in the project.
💡 Converting the MySQL data
I worked on converting the data stored in our existing MySQL database into the Bulk format
and loading it into ElasticSearch.
1️⃣ Writing the SQL query
I wrote the query below, which selects only the needed columns from the target table and builds the Bulk-format text.
# Two lines per row: an action line, then the document (quotes are fixed in step 4).
select
  group_concat(
    concat("{'index':{'_id':'", drink_id, "'}}", '\n'),
    concat("{'drink_id':", drink_id, ",'drink_name':'", drink_name, "','drink_image':'", drink_image, "'}")
    separator '\n') as json
from sulnaeeum.drink;
2️⃣ Raising the group_concat length limit
group_concat has a default length limit of 1024 (group_concat_max_len),
so running the query above as-is cuts off any result longer than 1024 characters.
(A Lost Connection error message appears.)
You therefore need to raise the limit as shown below before running the select.
# Check the current group_concat length limit
SHOW VARIABLES LIKE '%GROUP_CONCAT%';
# Raise the group_concat length limit for this session
SET SESSION group_concat_max_len = 150000000000;
3️⃣ Copy Field
Right-click the select result and choose the Copy Field option.
4️⃣ Converting to the Bulk data format
Copying the field this way yields text in the following form.
'{''index'':{''_id'':''1''}}
{''drink_id'':1,''drink_name'':''애피소드호프'',''drink_image'':''https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/1.jpg''}
{''index'':{''_id'':''2''}}
{''drink_id'':2,''drink_name'':''애피소드 상그리아'',''drink_image'':''https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/2.jpg''}
{''index'':{''_id'':''3''}}
{''drink_id'':3,''drink_name'':''애피소드 애플'',''drink_image'':''https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/3.jpg''}
{''index'':{''_id'':''4''}}
{''drink_id'':4,''drink_name'':''심술 알쓰'',''drink_image'':''https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/4.jpg''}
{''index'':{''_id'':''5''}}
{''drink_id'':5,''drink_name'':''아이싱 자몽'',''drink_image'':''https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/5.jpg''}'
Three more edits turn this into valid Bulk data:
- Delete the single quote at the very start and the one at the very end
- In VisualStudio, search for two consecutive single quotes and replaceAll with a double quote
- Add one line break after the last line
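These three edits can also be scripted. A minimal Python sketch, assuming the copied text has exactly the shape shown above (one wrapping pair of single quotes, every inner quote doubled); `copied_field_to_bulk` is a hypothetical helper name, not part of any tool:

```python
def copied_field_to_bulk(copied: str) -> str:
    """Turn the text from Copy Field into Bulk API input (one JSON object per line)."""
    text = copied.strip()
    # 1) Drop the single quote that wraps the whole copied value.
    if text.startswith("'") and text.endswith("'"):
        text = text[1:-1]
    # 2) Replace every doubled single quote with one double quote.
    text = text.replace("''", '"')
    # 3) The Bulk API requires the body to end with a newline.
    return text.rstrip("\n") + "\n"


sample = "'{''index'':{''_id'':''1''}}\n{''drink_id'':1,''drink_name'':''A''}'"
print(copied_field_to_bulk(sample), end="")
```

A drink name that itself contains a quote character would need extra handling, so this only suits data like the sample.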
✅ Completed Bulk data format
{"index":{"_id":"1"}}
{"drink_id":1,"drink_name":"애피소드호프","drink_image":"https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/1.jpg"}
{"index":{"_id":"2"}}
{"drink_id":2,"drink_name":"애피소드 상그리아","drink_image":"https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/2.jpg"}
{"index":{"_id":"3"}}
{"drink_id":3,"drink_name":"애피소드 애플","drink_image":"https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/3.jpg"}
{"index":{"_id":"4"}}
{"drink_id":4,"drink_name":"심술 알쓰","drink_image":"https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/4.jpg"}
{"index":{"_id":"5"}}
{"drink_id":5,"drink_name":"아이싱 자몽","drink_image":"https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/5.jpg"}
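Because the format was put together by hand, it can be worth checking the file before sending it. The sketch below is an assumption-laden helper (`check_bulk_body` is a made-up name) that only understands files made purely of `index` actions, like the one above: every action line must be followed by one document line, and the body must end with a newline.

```python
import json


def check_bulk_body(body: str) -> int:
    """Validate Bulk API input made of index actions; return the document count."""
    if not body.endswith("\n"):
        raise ValueError("bulk body must end with a newline")
    lines = body.split("\n")[:-1]  # drop the empty piece after the final newline
    if len(lines) % 2 != 0:
        raise ValueError("expected action/document line pairs")
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)  # raises ValueError on invalid JSON
        if "index" not in action:
            raise ValueError(f"unexpected action line: {action_line}")
        json.loads(source_line)  # the document must be valid JSON too
    return len(lines) // 2


# Usage:
# with open("data.json", encoding="utf-8") as f:
#     print(check_bulk_body(f.read()), "documents")
```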
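🔥 Caution
If you do not specify the id as above and instead write "_index" : "drink" in the action line,
the index is created automatically when the data is stored.
That is how I stored the data at first, but with an index created that way I could not get nori applied afterwards.
(An automatically created index gets default mappings, and the analyzer of an existing field cannot be changed later.)
I therefore recommend creating the index up front with nori configured, and only then storing the data.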
💡 Creating the index
Send the PUT request below with the following JSON as the Body
to create an index that uses the nori tokenizer.
PUT http://localhost:9200/drink
{
  "settings": {
    "analysis": {
      "analyzer": {
        "nori": {
          "tokenizer": "nori_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "drink_name": {
        "type": "text",
        "fields": {
          "nori": {
            "type": "text",
            "analyzer": "nori"
          }
        }
      }
    }
  }
}
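The same PUT can also be issued from code. Here is a minimal Python sketch using only the standard library; it assumes the local node at http://localhost:9200 used in this post, and `INDEX_BODY` and `build_create_index_request` are names made up for illustration.

```python
import json
import urllib.request

# Same settings and mappings as the JSON Body above.
INDEX_BODY = {
    "settings": {"analysis": {"analyzer": {"nori": {"tokenizer": "nori_tokenizer"}}}},
    "mappings": {
        "properties": {
            "drink_name": {
                "type": "text",
                # Multi-field: drink_name keeps the default analyzer,
                # while drink_name.nori is analyzed with nori.
                "fields": {"nori": {"type": "text", "analyzer": "nori"}},
            }
        }
    },
}


def build_create_index_request(base_url="http://localhost:9200", index="drink"):
    """Build the PUT /<index> request; pass it to urllib.request.urlopen to send it."""
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(INDEX_BODY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage (needs a running node with the analysis-nori plugin):
# urllib.request.urlopen(build_create_index_request())
```

Because `nori` is declared as a sub-field, searches have to target `drink_name.nori` explicitly to use the nori analyzer.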
💡 Inserting, deleting, viewing, and searching data with the Bulk API
With the index created, all that remains is to load the data and use the API.
1️⃣ Inserting data
Send the Bulk data prepared above as binary with the request below.
curl -XPOST 'http://localhost:9200/drink/_bulk?pretty' -H 'Content-Type: application/json' --data-binary @data.json
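One thing to watch with the Bulk API is that the response carries an `errors` flag and per-item results, so single documents can fail even when the request as a whole goes through. A rough Python equivalent of the curl call, with a check for that, might look like this. The node address and index name come from this post; `build_bulk_request` and `failed_items` are hypothetical helpers, and the sketch sends `application/x-ndjson`, which ElasticSearch also accepts for bulk bodies.

```python
import json
import urllib.request


def build_bulk_request(ndjson: bytes, base_url="http://localhost:9200", index="drink"):
    """Build POST /<index>/_bulk with the file contents sent unchanged."""
    return urllib.request.Request(
        f"{base_url}/{index}/_bulk",
        data=ndjson,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def failed_items(bulk_response: dict) -> list:
    """Return (id, error) pairs for every action that did not succeed."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response["items"]:
        result = item["index"]  # every action in data.json is an "index" action
        if "error" in result:
            failures.append((result["_id"], result["error"]))
    return failures


# Usage (needs a running node):
# with open("data.json", "rb") as f:
#     with urllib.request.urlopen(build_bulk_request(f.read())) as resp:
#         print(failed_items(json.load(resp)))
```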
2️⃣ Querying to check the data
curl -XPOST 'localhost:9200/drink/_search?pretty' -H 'Content-Type: application/json' -d'{
  "query": {
    "match_all": {}
  }
}'
3️⃣ Deleting the index
curl -XDELETE 'http://localhost:9200/drink?pretty' -H 'Content-Type: application/json'
4️⃣ Keyword search with the nori tokenizer
Send the GET request below with the search terms as JSON.
GET https://j8a707.p.ssafy.io/es/_search
{
  "query": {
    "match": {
      "drink_name.nori": "호랑이막걸리"
    }
  }
}
Search results
{ "took": 6, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 114, "relation": "eq" }, "max_score": 7.305012, "hits": [ { "_index": "drink", "_type": "_doc", "_id": "54", "_score": 7.305012, "_source": { "drink_id": 54, "drink_name": "ํธ๋์ด ์ ๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/54.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "139", "_score": 6.4983244, "_source": { "drink_id": 139, "drink_name": "๋งคํ๋ง๋ฆ ํธ๋์ด ๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/139.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "126", "_score": 5.499996, "_source": { "drink_id": 126, "drink_name": "ํธ๋์ด๋ฐฐ๊ผฝ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/126.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "8", "_score": 2.0608451, "_source": { "drink_id": 8, "drink_name": "์ธ์๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/8.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "39", "_score": 2.0608451, "_source": { "drink_id": 39, "drink_name": "์ก๋ช ์ญ ๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/39.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "43", "_score": 2.0608451, "_source": { "drink_id": 43, "drink_name": "๊นํฌ ๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/43.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "55", "_score": 2.0608451, "_source": { "drink_id": 55, "drink_name": "๋๊ตฌ๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/55.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "58", "_score": 2.0608451, "_source": { "drink_id": 58, "drink_name": "์ํ๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/58.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "70", "_score": 
2.0608451, "_source": { "drink_id": 70, "drink_name": "๋ฏธ์ ๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/70.jpg" } }, { "_index": "drink", "_type": "_doc", "_id": "79", "_score": 2.0608451, "_source": { "drink_id": 79, "drink_name": "DOK๋ง๊ฑธ๋ฆฌ", "drink_image": "https://sulnaeeum.s3.ap-northeast-2.amazonaws.com/drink/79.jpg" } } ] } }
Thanks to the Body used when creating the index,
the results are returned in order, starting with the ones closest to the search term.
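To use this search from application code, the request and the ranking can be wrapped in two small functions. This is only a sketch: it points at a local node (http://localhost:9200) rather than the deployed URL above, uses POST (which `_search` accepts as well as GET), and `build_nori_search_request` and `ranked_names` are illustrative names.

```python
import json
import urllib.request


def build_nori_search_request(keyword: str, base_url="http://localhost:9200", index="drink"):
    """Build a _search request that matches `keyword` against drink_name.nori."""
    query = {"query": {"match": {"drink_name.nori": keyword}}}
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(query, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ranked_names(search_response: dict) -> list:
    """Return (score, drink_name) pairs in the order ElasticSearch ranked them."""
    return [
        (hit["_score"], hit["_source"]["drink_name"])
        for hit in search_response["hits"]["hits"]
    ]


# Usage (needs a running node with the drink index loaded):
# with urllib.request.urlopen(build_nori_search_request("호랑이막걸리")) as resp:
#     for score, name in ranked_names(json.load(resp)):
#         print(score, name)
```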
✅ Searching with the tokenizer ElasticSearch provides by default
https://j8a707.p.ssafy.io/es/_search?q=drink_name:막걸리
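A browser encodes the Korean keyword in such a URL automatically, but code has to percent-encode it itself. A small Python sketch follows; the local base URL and the helper name `uri_search_url` are assumptions for illustration.

```python
from urllib.parse import urlencode


def uri_search_url(field: str, keyword: str,
                   base_url="http://localhost:9200", index="drink") -> str:
    """Build a URI search URL with the q parameter percent-encoded as UTF-8."""
    return f"{base_url}/{index}/_search?" + urlencode({"q": f"{field}:{keyword}"})


print(uri_search_url("drink_name", "막걸리"))
```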
In this post I covered how to use ElasticSearch, nori, and the Bulk API in a local environment.
In the next post I'll cover how to set up ElasticSearch on EC2.