Why is (variable1 % variable2 == 0) inefficient?
I am new to Java. I was running some code last night and this really bothered me: I was writing a simple program that prints every Xth output in a for loop, and I noticed a massive drop in performance when I used the modulus as variable % variable rather than variable % 5000 (or some other constant). Can someone explain why this is and what causes it, so I can do better?
Here is the "efficient" code (sorry if the syntax is slightly off; I don't have the code on this computer right now):
long startNum = 0;
long stopNum = 1000000000L;
for (long i = startNum; i <= stopNum; i++) {
    if (i % 50000 == 0) {
        System.out.println(i);
    }
}
And here is the "inefficient" code:
long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;
for (long i = startNum; i <= stopNum; i++) {
    if (i % progressCheck == 0) {
        System.out.println(i);
    }
}
I had a date variable to measure the difference, and once the loop got long enough, the first version took about 50 ms while the other took about 12 seconds. You may have to increase stopNum or decrease progressCheck depending on whether your PC is more or less efficient than mine.
I looked for this question around the web but could not find an answer; perhaps it simply hasn't been answered.
Edit: I did not expect my question to be this popular; thank you for all the answers. I did run a benchmark on each half, and the inefficient code took considerably longer: 1/4 second versus roughly 10 seconds. Granted they both use println, but they both do the same amount of it, so I can't imagine that would skew the result much, especially since the discrepancy is repeatable. Since I am new to Java, I will let the votes decide for now which answer is best; I will try to pick one by Wednesday.
EDIT2: I will run another test tonight where, instead of using the modulus, I increment a variable, and when it reaches progressCheck I perform the check and then reset that variable to 0.
EDIT 3.5:
I used this code, and below I will show my results. Thank you all for the fantastic help! I also tried comparing the short value of the long to 0, so all my new checks happen every "65536" iterations, making them equal in repeats.
public class Main {

    public static void main(String[] args) {
        long startNum = 0;
        long stopNum = 1000000000L;
        long progressCheck = 65536;
        final long finalProgressCheck = 50000;
        long date;

        // using a fixed value
        date = System.currentTimeMillis();
        for (long i = startNum; i <= stopNum; i++) {
            if (i % 65536 == 0) {
                System.out.println(i);
            }
        }
        long final1 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();

        // using a variable
        for (long i = startNum; i <= stopNum; i++) {
            if (i % progressCheck == 0) {
                System.out.println(i);
            }
        }
        long final2 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();

        // using a final declared variable
        for (long i = startNum; i <= stopNum; i++) {
            if (i % finalProgressCheck == 0) {
                System.out.println(i);
            }
        }
        long final3 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();

        // using increments to determine progressCheck
        int increment = 0;
        for (long i = startNum; i <= stopNum; i++) {
            if (increment == 65536) {
                System.out.println(i);
                increment = 0;
            }
            increment++;
        }
        long final4 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();

        // using a short conversion
        for (long i = startNum; i <= stopNum; i++) {
            if ((short) i == 0) {
                System.out.println(i);
            }
        }
        long final5 = System.currentTimeMillis() - date;

        System.out.println(
                "\nfixed = " + final1 + " ms " + "\nvariable = " + final2 + " ms " + "\nfinal variable = " + final3 + " ms " + "\nincrement = " + final4 + " ms" + "\nShort Conversion = " + final5 + " ms");
    }
}
Results:
- fixed = 874 ms (normally around 1000 ms, but faster since 65536 is a power of 2)
- variable = 8590 ms
- final variable = 1944 ms (~1000 ms when using 50000 instead)
- increment = 1904 ms
- Short Conversion = 679 ms
Interestingly enough, due to the lack of a division, the Short Conversion was 23% faster than the "fast" way. This is interesting to note. If you ever need to show or compare something every 256 iterations (or thereabouts), you can do this and use
if ((byte)integer == 0) { 'Perform progress check code here' }
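A minimal sketch of that idea (assuming the counter only ever increases by 1): the cast to byte keeps only the low 8 bits, so the check fires exactly when the counter is a multiple of 256.

for (long i = 0; i <= 1000000000L; i++) {
    // (byte) i == 0 exactly when the low 8 bits of i are zero, i.e. every 256 iterations
    if ((byte) i == 0) {
        System.out.println(i);
    }
}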
ONE FINAL INTERESTING NOTE: using modulus on the "final declared variable" with 65536 (not a pretty number) ran at half the speed of (i.e. slower than) the fixed value, whereas before the two had benchmarked at nearly the same speed.
You are measuring the OSR (on-stack replacement) stub.
An OSR stub is a special version of a compiled method, intended specifically for transferring execution from interpreted mode to compiled code while the method is running.
OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with the interpreted frame. I showed this already in the following answers: 1, 2, 3.
A similar thing happens here, too. While the "inefficient code" is running a long loop, the method is compiled specially for on-stack replacement right inside the loop. The state is transferred from the interpreted frame to the OSR-compiled method, and this state includes the progressCheck local variable. At this point the JIT cannot replace the variable with a constant, and thus cannot apply certain optimizations like strength reduction.
In particular, this means the JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, which applies when the value is a compile-time constant after inlining/constant propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here, where it is optimized by the JIT even in an OSR stub.)
However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory (benchmarked using JMH):
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class Div {

    @Benchmark
    public void divConst(Blackhole blackhole) {
        long startNum = 0;
        long stopNum = 100000000L;

        for (long i = startNum; i <= stopNum; i++) {
            if (i % 50000 == 0) {
                blackhole.consume(i);
            }
        }
    }

    @Benchmark
    public void divVar(Blackhole blackhole) {
        long startNum = 0;
        long stopNum = 100000000L;
        long progressCheck = 50000;

        for (long i = startNum; i <= stopNum; i++) {
            if (i % progressCheck == 0) {
                blackhole.consume(i);
            }
        }
    }
}
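For reference, one possible way to launch this benchmark programmatically (a minimal sketch; it assumes the JMH core and annotation-processor jars are on the classpath, and the RunDiv wrapper class name is only illustrative):

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunDiv {
    public static void main(String[] args) throws RunnerException {
        // One fork, a few warmup and measurement iterations, roughly matching the output below.
        Options opt = new OptionsBuilder()
                .include(Div.class.getSimpleName())
                .forks(1)
                .warmupIterations(3)
                .measurementIterations(5)
                .build();
        new Runner(opt).run();
    }
}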
And the results:
# Benchmark: bench.Div.divConst
# Run progress: 0,00% complete, ETA 00:00:16
# Fork: 1 of 1
# Warmup Iteration 1: 126,967 ms/op
# Warmup Iteration 2: 105,660 ms/op
# Warmup Iteration 3: 106,205 ms/op
Iteration 1: 105,620 ms/op
Iteration 2: 105,789 ms/op
Iteration 3: 105,915 ms/op
Iteration 4: 105,629 ms/op
Iteration 5: 105,632 ms/op
# Benchmark: bench.Div.divVar
# Run progress: 50,00% complete, ETA 00:00:09
# Fork: 1 of 1
# Warmup Iteration 1: 844,708 ms/op <-- much slower!
# Warmup Iteration 2: 105,893 ms/op <-- as fast as divConst
# Warmup Iteration 3: 105,601 ms/op
Iteration 1: 105,570 ms/op
Iteration 2: 105,475 ms/op
Iteration 3: 105,702 ms/op
Iteration 4: 105,535 ms/op
Iteration 5: 105,766 ms/op
The very first iteration of divVar is indeed much slower because of the inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed, which leverages all the available compiler optimizations.
Following up on @phuclv's comment, I checked the code generated by the JIT¹; the results are as follows:
for variable % 5000 (division by a constant):
mov rax,29f16b11c6d1e109h
imul rbx
mov r10,rbx
sar r10,3fh
sar rdx,0dh
sub rdx,r10
imul r10,rdx,0c350h ; <-- imul
mov r11,rbx
sub r11,r10
test r11,r11
jne 1d707ad14a0h
and for variable % variable:
mov rax,r14
mov rdx,8000000000000000h
cmp rax,rdx
jne 22ccce218edh
xor edx,edx
cmp rbx,0ffffffffffffffffh
je 22ccce218f2h
cqo
idiv rax,rbx ; <-- idiv
test rdx,rdx
jne 22ccce218c0h
Because division always takes longer than multiplication, the last code snippet is less performant.
Java version:
java version "11" 2018-09-25
Java(TM) SE Runtime Environment 18.9 (build 11+28)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11+28, mixed mode)
1 - VM options used: -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,src/java/Main.main
As others have noted, the general modulus operation requires a division to be done. In some cases, the division can be replaced (by the compiler) by a multiplication. But both can be slow compared to addition/subtraction. Hence, the best performance can be expected by something along these lines:
long progressCheck = 50000;
long counter = progressCheck;

for (long i = startNum; i <= stopNum; i++) {
    if (--counter == 0) {
        System.out.println(i);
        counter = progressCheck;
    }
}
(As a minor optimization attempt, we use a pre-decrementing down-counter here, because on many architectures comparing against 0 immediately after an arithmetic operation costs exactly 0 extra instructions/CPU cycles, since the ALU's flags are already set appropriately by the preceding operation. A decent optimizing compiler will, however, do that optimization automatically even if you write if (counter++ == 50000) { ... counter = 0; }.)
Notice that often you don't really want/need the modulus, because you know that your loop counter (i), or whatever it is, is only ever incremented by 1, and you don't actually care about the remainder the modulus would give you; you just want to see whether the incrementing-by-one counter has hit some value.
Another 'trick' is to use power-of-two values/limits, e.g. progressCheck = 1024;. Modulus by a power of two can be computed quickly via a bitwise and, i.e. if ((i & (1024 - 1)) == 0) {...}. This should be pretty fast too, and may on some architectures outperform the explicit counter above.
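A minimal sketch of that power-of-two variant (assuming the same startNum/stopNum as before and an interval of 1024):

long mask = 1024 - 1;  // works only because 1024 is a power of two
for (long i = startNum; i <= stopNum; i++) {
    // (i & mask) is zero exactly when i is a multiple of 1024
    if ((i & mask) == 0) {
        System.out.println(i);
    }
}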
I am also surprised to see the performance of the codes above. It comes down to the work the compiler has to do depending on how the variable is declared. In the second (inefficient) example:

for (long i = startNum; i <= stopNum; i++) {
    if (i % progressCheck == 0) {
        System.out.println(i);
    }
}

you are performing the modulus operation between two variables. Here the compiler has to check the value of stopNum and progressCheck, going to the specific memory block reserved for these variables after every iteration, because they are variables and their values might change. That is why, after each iteration, the generated code went back to the memory location for the latest value of the variables, and why the compiler was not able to create efficient byte code at compile time.

In the first code example, you are performing the modulus between a variable and a constant numeric value, which is not going to change during execution, so the compiler does not need to check that value from a memory location. That is why the compiler was able to create efficient byte code. If you declare progressCheck as a final or final static variable, then at compile time/run time the compiler knows it is a final variable whose value will not change, and it can replace progressCheck with 50000 in the code:

for (long i = startNum; i <= stopNum; i++) {
    if (i % 50000 == 0) {
        System.out.println(i);
    }
}

Now this code looks just like the first (efficient) code example, and as mentioned above both will run efficiently; there will not be much difference in the execution time of either code example.
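A small sketch of that final-variable variant (essentially what the question's EDIT 3.5 code already does with finalProgressCheck):

final long progressCheck = 50000;  // declared final and initialized with a literal, so the compiler can treat it as a constant
for (long i = startNum; i <= stopNum; i++) {
    if (i % progressCheck == 0) {
        System.out.println(i);
    }
}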
Source: https://stackoverflow.com/questions/54405842/why-is-if-variable1-variable2-0-inefficient