Apache Druid Tutorial in Docker

6 minute read

๐Ÿ”—Druid Tutorial in Docker ์›๋ณธ ์ถœ์ฒ˜

์ตœ๋Œ€ํ•œ ๊ฐœ๋ฐœํ™˜๊ฒฝ์— ์ œ์•ฝ์‚ฌํ•ญ ์—†์ด ์‹คํ–‰ํ•˜๊ธฐ ์œ„ํ•ด์„œ Docker Compose๋กœ ์‹คํ–‰ํ•ด๋ดค์Šต๋‹ˆ๋‹ค.

Docker compose YAML

๐Ÿ”—Apache Druid 24.0.0, Docker compose YAML

version: "2.2"
volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  router_var: {}
  druid_shared: {}
services:
  postgres:
    container_name: postgres
    image: postgres:latest
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=FoolishPassword
      - POSTGRES_USER=druid
      - POSTGRES_DB=druid
  # Need 3.5 or later for container nodes
  zookeeper:
    container_name: zookeeper
    image: zookeeper:3.5
    ports:
      - "2181:2181"
    environment:
      - ZOO_MY_ID=1
  coordinator:
    image: apache/druid:24.0.0
    container_name: coordinator
    volumes:
      - druid_shared:/opt/shared
      - coordinator_var:/opt/druid/var
    depends_on: 
      - zookeeper
      - postgres
    ports:
      - "8081:8081"
    command:
      - coordinator
    env_file:
      - environment
  broker:
    image: apache/druid:24.0.0
    container_name: broker
    volumes:
      - broker_var:/opt/druid/var
    depends_on: 
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8082:8082"
    command:
      - broker
    env_file:
      - environment
  historical:
    image: apache/druid:24.0.0
    container_name: historical
    volumes:
      - druid_shared:/opt/shared
      - historical_var:/opt/druid/var
    depends_on: 
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8083:8083"
    command:
      - historical
    env_file:
      - environment
  middlemanager:
    image: apache/druid:24.0.0
    container_name: middlemanager
    volumes:
      - druid_shared:/opt/shared
      - middle_var:/opt/druid/var
    depends_on: 
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8091:8091"
      - "8100-8105:8100-8105"
    command:
      - middleManager
    env_file:
      - environment
  router:
    image: apache/druid:24.0.0
    container_name: router
    volumes:
      - router_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8888:8888"
    command:
      - router
    env_file:
      - environment

environment

๐Ÿ”—24.0.0 Docker environment์—์„œ ์กฐ๊ธˆ ์ˆ˜์ •ํ•œ ํŒŒ์ผ์ž…๋‹ˆ๋‹ค.

# Java tuning
DRUID_XMX=16g
DRUID_XMS=16g
DRUID_MAXNEWSIZE=8g
DRUID_NEWSIZE=8g
DRUID_MAXDIRECTMEMORYSIZE=16g
DRUID_SINGLE_NODE_CONF=micro-quickstart

druid_emitter_logging_logLevel=debug
druid_emitter_http_recipientBaseUrl=http://druid_exporter:8080/druid
druid_emitter=http

druid_extensions_loadList=["druid-kafka-indexing-service", "druid-multi-stage-query", "druid-parquet-extensions", "druid-avro-extensions", "druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage"]

druid_zk_service_host=zookeeper

druid_metadata_storage_host=
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
druid_metadata_storage_connector_user=druid
druid_metadata_storage_connector_password=FoolishPassword

druid_coordinator_balancer_strategy=cachingCost

druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g", "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=512MiB

druid_storage_type=local
druid_storage_storageDirectory=/opt/shared/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/shared/indexing-logs

druid_processing_numThreads=2
druid_processing_numMergeBuffers=2

DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration

์—ฌ๊ธฐ์„œ ์กฐ๊ธˆ ํฌ์ธํŠธ๋ฅผ ์งš์–ด๋ณด์ž๋ฉด,

JVM heap memory ์‚ฌ์ด์ฆˆ์— ๋Œ€ํ•œ ์„ค์ •๋“ค

# Java tuning
DRUID_XMX=16g
DRUID_XMS=16g
DRUID_MAXNEWSIZE=8g
DRUID_NEWSIZE=8g
DRUID_MAXDIRECTMEMORYSIZE=16g
DRUID_SINGLE_NODE_CONF=micro-quickstart

Druid Extensions

  • Druid์—์„œ๋Š” ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๊ธฐ๋Šฅ๋“ค์„ Extension์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ”—Druid extensions
druid_extensions_loadList=["druid-kafka-indexing-service", "druid-multi-stage-query", "druid-parquet-extensions", "druid-avro-extensions", "druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage"]

Kyungseon Park

์ €๋Š” ์—ฌ์˜๋„ ์†Œ์žฌ์˜ AIํ•€ํ…Œํฌ ์Šคํƒ€ํŠธ์—… Aizen์—์„œ ML Engineer๋กœ ๊ทผ๋ฌดํ•˜๊ณ  ์žˆ๊ณ , ์ตœ๊ทผ์—๋Š” MLOps์— ๊ด€์‹ฌ์„ ๊ฐ–๊ณ  ๊ณต๋ถ€ํ•˜๊ณ ์žˆ์Šต๋‹ˆ๋‹ค. ๋งŒ์•ฝ ์ œ ์ •๋ณด๊ฐ€ ๋” ํ•„์š”ํ•˜๋‹ค๋ฉด ํ”„๋กœํ•„ ์‚ฌ์ง„์„ ํด๋ฆญํ•˜์„ธ์š”.