Greenlet Vs.

procodes 2020. 7. 3. 22:16

Greenlet vs. Threads

I am new to gevent and greenlets. I found some good documentation on how to work with them, but none of it gave me a justification for how and when I should use greenlets!

What are they really good at?
Is it a good idea to use them in a proxy server?
Why not threads?

What I am not sure about is how they can give us concurrency if they are basically coroutines.

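Greenlets provide concurrency but not parallelism. Concurrency is when code can run independently of other code. Parallelism is the execution of concurrent code at the same time. Parallelism is particularly useful when there is a lot of work to be done in userspace, which is typically CPU-heavy work. Concurrency is useful for breaking problems apart, so that the different parts can be scheduled and managed more easily in parallel.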
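To make the "concurrency without parallelism" point concrete, here is a minimal sketch that uses plain Python generators as a stand-in for greenlets (this is not how greenlet or gevent is implemented; the names `worker` and `run_round_robin` are made up for illustration). Two tasks make interleaved progress, yet everything runs on a single thread, so no two steps ever execute at the same time.

```python
from collections import deque
import threading


def worker(name, steps, log):
    """Coroutine-style task: do one unit of work, then hand control back."""
    for i in range(steps):
        log.append((name, i, threading.current_thread().name))
        yield  # cooperative switch point (a greenlet would switch when it blocks on I/O)


def run_round_robin(tasks):
    """Run generator tasks on the current thread until all of them finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        ready.append(task)      # task still has work; put it back in line


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])

order = [(name, i) for name, i, _ in log]
threads_used = {thread for _, _, thread in log}
print(order)              # [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
print(len(threads_used))  # 1
```

The interleaved order shows concurrency; the single thread name shows there is no parallelism.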

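Greenlets really shine in network programming, where interactions with one socket can occur independently of interactions with other sockets. This is a classic example of concurrency. Because each greenlet runs in its own context, you can keep using synchronous APIs without threading. This is good because threads are very expensive in terms of virtual memory and kernel overhead, so the concurrency you can achieve with threads is significantly lower. In addition, threading in Python is more expensive and more limited than usual because of the GIL. The usual alternatives for achieving concurrency are projects such as Twisted, libevent, libuv, and node.js, where all your code shares the same execution context and registers event handlers.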

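It is an excellent idea to use greenlets (with appropriate networking support, such as through gevent) for writing a proxy, because the handling of each request can execute independently and should be written that way.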

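Greenlets provide concurrency for the reasons I gave earlier. Concurrency is not parallelism. By hiding event registration and doing the scheduling for you on calls that would normally block the current thread, projects like gevent expose this concurrency without requiring you to switch to an asynchronous API, and at a much lower cost to your system.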
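The idea of "hiding event registration and scheduling behind a blocking-looking call" can be sketched with only the standard library. The following is an illustration under my own assumptions, not gevent's actual mechanism: generators stand in for greenlets, an explicit `yield conn` stands in for the point where gevent would switch implicitly inside `recv`, and `echo_upper` and `run` are hypothetical names.

```python
import selectors
import socket


def echo_upper(conn):
    """Reads like sequential code; the yield marks where a real recv would block."""
    yield conn                       # "block" until conn is readable
    data = conn.recv(1024)
    conn.sendall(data.upper())
    conn.close()


def run(tasks):
    """Tiny scheduler: register each waiting task, resume it when its socket is ready."""
    sel = selectors.DefaultSelector()
    waiting = 0

    def step(task):
        nonlocal waiting
        try:
            sock = next(task)        # run the task until it needs to wait on a socket
        except StopIteration:
            return                   # task finished
        sel.register(sock, selectors.EVENT_READ, task)
        waiting += 1

    for task in tasks:
        step(task)
    while waiting:
        for key, _ in sel.select():  # the event registration lives here, not in the tasks
            sel.unregister(key.fileobj)
            waiting -= 1
            step(key.data)
    sel.close()


# Three in-process connections; each client sends one request up front.
pairs = [socket.socketpair() for _ in range(3)]
for i, (client, _server) in enumerate(pairs):
    client.sendall(b"req%d" % i)

run([echo_upper(server) for _client, server in pairs])

replies = [client.recv(1024) for client, _server in pairs]
for client, _server in pairs:
    client.close()
print(replies)  # [b'REQ0', b'REQ1', b'REQ2']
```

The task function never mentions selectors or callbacks; only the scheduler does, which is the property gevent offers for ordinary synchronous socket code.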

Taking @Max's answer and adding some relevance to it for scaling, you can see the difference. I achieved this by changing the URLs so that the list is filled as follows:

URLS_base = ['www.google.com', 'www.example.com', 'www.python.org', 'www.yahoo.com', 'www.ubc.ca', 'www.wikipedia.org']
URLS = []
for _ in range(10000):
    for url in URLS_base:
        URLS.append(url)

I had to drop the multiprocessing version because it fell over before I got to 500. But at 10,000 iterations:

Using gevent it took: 3.756914
-----------
Using multi-threading it took: 15.797028

So you can see that there is a significant difference for I/O when using gevent.

This is interesting enough to analyze. Here is code that compares the performance of greenlets against a multiprocessing pool and against multithreading:

import gevent
from gevent import socket as gsock
import socket as sock
from multiprocessing import Pool
from threading import Thread
from datetime import datetime

class IpGetter(Thread):
    def __init__(self, domain):
        Thread.__init__(self)
        self.domain = domain
    def run(self):
        self.ip = sock.gethostbyname(self.domain)

if __name__ == "__main__":
    URLS = ['www.google.com', 'www.example.com', 'www.python.org', 'www.yahoo.com', 'www.ubc.ca', 'www.wikipedia.org']

    # Greenlets: one greenlet per hostname, using gevent's cooperative socket module.
    t1 = datetime.now()
    jobs = [gevent.spawn(gsock.gethostbyname, url) for url in URLS]
    gevent.joinall(jobs, timeout=2)
    t2 = datetime.now()
    print("Using gevent it took: %s" % (t2-t1).total_seconds())
    print("-----------")

    # Processes: one worker process per hostname, using the blocking stdlib socket module.
    t1 = datetime.now()
    pool = Pool(len(URLS))
    results = pool.map(sock.gethostbyname, URLS)
    t2 = datetime.now()
    pool.close()
    print("Using multiprocessing it took: %s" % (t2-t1).total_seconds())
    print("-----------")

    # Threads: one IpGetter thread per hostname.
    t1 = datetime.now()
    threads = []
    for url in URLS:
        t = IpGetter(url)
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    t2 = datetime.now()
    print("Using multi-threading it took: %s" % (t2-t1).total_seconds())

Here are the results:

Using gevent it took: 0.083758
-----------
Using multiprocessing it took: 0.023633
-----------
Using multi-threading it took: 0.008327

I think that greenlet claims that it is not bound by the GIL, unlike the multithreading library. Moreover, the greenlet documentation says that it is meant for network operations. For a network-intensive operation, thread switching is fine, and you can see that the multithreading approach is pretty fast. Also, it is always preferable to use Python's official libraries; I tried installing greenlet on Windows and ran into a DLL dependency problem, so I ran this test on a Linux VM. Always try to write code with the hope that it runs on any machine.

Correcting for @TemporalBeing's answer above: greenlets are not "faster" than threads, and spawning 60000 threads to solve a concurrency problem is an incorrect programming technique; a small pool of threads is appropriate instead. Here is a more reasonable comparison (from my reddit post in response to people citing this SO post).

import gevent
from gevent import socket as gsock
import socket as sock
import threading
from datetime import datetime


def timeit(fn, URLS):
    t1 = datetime.now()
    fn()
    t2 = datetime.now()
    print(
        "%s / %d hostnames, %s seconds" % (
            fn.__name__,
            len(URLS),
            (t2 - t1).total_seconds()
        )
    )


def run_gevent_without_a_timeout():
    ip_numbers = []

    def greenlet(domain_name):
        ip_numbers.append(gsock.gethostbyname(domain_name))

    jobs = [gevent.spawn(greenlet, domain_name) for domain_name in URLS]
    gevent.joinall(jobs)
    assert len(ip_numbers) == len(URLS)


def run_threads_correctly():
    ip_numbers = []

    def process():
        while queue:
            try:
                domain_name = queue.pop()
            except IndexError:
                pass
            else:
                ip_numbers.append(sock.gethostbyname(domain_name))

    threads = [threading.Thread(target=process) for i in range(50)]

    queue = list(URLS)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert len(ip_numbers) == len(URLS)

URLS_base = ['www.google.com', 'www.example.com', 'www.python.org',
             'www.yahoo.com', 'www.ubc.ca', 'www.wikipedia.org']

for NUM in (5, 50, 500, 5000, 10000):
    URLS = []

    for _ in range(NUM):
        for url in URLS_base:
            URLS.append(url)

    print("--------------------")
    timeit(run_gevent_without_a_timeout, URLS)
    timeit(run_threads_correctly, URLS)

Here are some results:

--------------------
run_gevent_without_a_timeout / 30 hostnames, 0.044888 seconds
run_threads_correctly / 30 hostnames, 0.019389 seconds
--------------------
run_gevent_without_a_timeout / 300 hostnames, 0.186045 seconds
run_threads_correctly / 300 hostnames, 0.153808 seconds
--------------------
run_gevent_without_a_timeout / 3000 hostnames, 1.834089 seconds
run_threads_correctly / 3000 hostnames, 1.569523 seconds
--------------------
run_gevent_without_a_timeout / 30000 hostnames, 19.030259 seconds
run_threads_correctly / 30000 hostnames, 15.163603 seconds
--------------------
run_gevent_without_a_timeout / 60000 hostnames, 35.770358 seconds
run_threads_correctly / 60000 hostnames, 29.864083 seconds
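As a side note, the fixed pool of 50 threads used in run_threads_correctly can also be written with the standard library's `concurrent.futures.ThreadPoolExecutor`. The sketch below is not part of the benchmark above: `fake_lookup` is a hypothetical stand-in for `sock.gethostbyname` that sleeps briefly instead of touching the network, and `resolve_all` is a made-up helper name. It only shows that a bounded pool can work through many more items than it has threads.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def fake_lookup(domain_name):
    """Hypothetical stand-in for sock.gethostbyname: sleeps instead of doing DNS."""
    time.sleep(0.001)
    return (domain_name, threading.current_thread().name)


def resolve_all(domain_names, lookup=fake_lookup, max_workers=50):
    """Look up every name with at most max_workers threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lookup, domain_names))


names = ["host%d.example" % i for i in range(600)]
results = resolve_all(names)

distinct_threads = {thread for _, thread in results}
print(len(results), "lookups handled by", len(distinct_threads), "threads")
```

Passing `sock.gethostbyname` as the `lookup` argument would give roughly the threaded side of the comparison, with the pool size still capped at 50.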

The misunderstanding everyone has about non-blocking IO with Python is the belief that the Python interpreter can attend to the work of retrieving results from sockets at a large scale faster than the network connections themselves can return IO. While this is certainly true in some cases, it is not true nearly as often as people think, because the Python interpreter is really, really slow. In my blog post here, I illustrate some graphical profiles that show that, even for very simple things, if you are dealing with crisp and fast network access to things like databases or DNS servers, those services can come back a lot faster than the Python code can attend to many thousands of those connections.

Reference URL: https://stackoverflow.com/questions/15556718/greenlet-vs-threads
