What happens when a computer program runs?


procodes 2020. 5. 21. 21:21

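I know the general theory, but I can't fit the details together.

I know that a program resides in the secondary memory of a computer. Once the program begins execution, it is copied entirely to RAM. Then the processor retrieves a few instructions at a time (how many depends on the size of the bus), puts them in registers, and executes them.

I also know that a computer program uses two kinds of memory, the stack and the heap, which are also part of the primary memory of the computer. The stack is used for non-dynamic memory, and the heap for dynamic memory (for example, everything related to the new operator in C++).

What I can't understand is how those two things connect. At what point is the stack used for the execution of the instructions? Do instructions go from the RAM, to the stack, to the registers?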

It really depends on the system, but modern OSes with virtual memory tend to load their process images and allocate memory something like this:

+---------+
|  stack  |  function-local variables, return addresses, return values, etc.
|         |  often grows downward, commonly accessed via "push" and "pop" (but can be
|         |  accessed randomly, as well; disassemble a program to see)
+---------+
| shared  |  mapped shared libraries (C libraries, math libs, etc.)
|  libs   |
+---------+
|  hole   |  unused memory allocated between the heap and stack "chunks", spans the
|         |  difference between your max and min memory, minus the other totals
+---------+
|  heap   |  dynamic, random-access storage, allocated with 'malloc' and the like.
+---------+
|   bss   |  Uninitialized global variables; must be in read-write memory area
+---------+
|  data   |  data segment, for globals and static variables that are initialized
|         |  (can further be split up into read-only and read-write areas, with
|         |  read-only areas being stored elsewhere in ROM on some systems)
+---------+
|  text   |  program code, this is the actual executable code that is running.
+---------+

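This is the general process address space on many common virtual-memory systems. The "hole" is the size of your total memory, minus the space taken up by all the other areas; this gives the heap a large amount of space to grow into. It is also "virtual", meaning it maps to your actual memory through a translation table and may be stored at any location in actual memory. It is done this way to protect one process from accessing another process's memory, and to make each process think it is running on a complete system.

Note that the positions of, e.g., the stack and heap may be in a different order on some systems (see Billy O'Neal's answer below for more details on Win32).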

Other systems can be very different. DOS, for instance, ran in real mode, and its memory allocation when running programs looked quite different:

+-----------+ top of memory
| extended  | above the high memory area, and up to your total memory; needed drivers to
|           | be able to access it.
+-----------+ 0x110000
|  high     | just over 1MB->1MB+64KB, used by 286s and above.
+-----------+ 0x100000
|  upper    | upper memory area, from 640kb->1MB, had mapped memory for video devices, the
|           | DOS "transient" area, etc. some was often free, and could be used for drivers
+-----------+ 0xA0000
| USER PROC | user process address space, from the end of DOS up to 640KB
+-----------+
|command.com| DOS command interpreter
+-----------+ 
|    DOS    | DOS permanent area, kept as small as possible, provided routines for display,
|  kernel   | *basic* hardware access, etc.
+-----------+ 0x600
| BIOS data | BIOS data area, contained simple hardware descriptions, etc.
+-----------+ 0x400
| interrupt | the interrupt vector table, starting from 0 and going to 1k, contained 
|  vector   | the addresses of routines called when interrupts occurred.  e.g.
|  table    | interrupt 0x21 checked the address at 0x21*4 and far-jumped to that 
|           | location to service the interrupt.
+-----------+ 0x0

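You can see that DOS allowed direct access to the operating system memory, with no protection, which meant that user-space programs could generally access or overwrite anything they liked.

In the process address space, however, programs tended to look similar; the regions were just described as the code segment, data segment, heap, stack segment, etc., and were mapped a little differently. But most of the general areas were still there.

After loading the program and the necessary shared libraries into memory and distributing the parts of the program into the right areas, the OS begins executing your process wherever its main method is, and your program takes over from there, making system calls when it needs them.

Different systems (embedded or otherwise) may have very different architectures, such as stackless systems, Harvard architecture systems (with code and data kept in separate physical memory), systems which actually keep the BSS in read-only memory (initially set by the programmer), etc. But this is the general gist.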

You said:

I also know that a computer program uses two kinds of memory: stack and heap, which are also part of the primary memory of the computer.

"Stack" and "heap" are just abstract concepts, rather than (necessarily) physically distinct "kinds" of memory.

A stack is merely a last-in, first-out data structure. In the x86 architecture, it can actually be addressed randomly by using an offset from the end, but the most common functions are PUSH and POP to add and remove items from it, respectively. It is commonly used for function-local variables (so-called "automatic storage"), function arguments, return addresses, etc. (more below)

A "heap" is just a nickname for a chunk of memory that can be allocated on demand, and is addressed randomly (meaning, you can access any location in it directly). It is commonly used for data structures that you allocate at runtime (in C++, using new and delete, and malloc and friends in C, etc).

The stack and heap, on the x86 architecture, both physically reside in your system memory (RAM), and are mapped through virtual memory allocation into the process address space as described above.

The registers (still on x86), physically reside inside the processor (as opposed to RAM), and are loaded by the processor, from the TEXT area (and can also be loaded from elsewhere in memory or other places depending on the CPU instructions that are actually executed). They are essentially just very small, very fast on-chip memory locations that are used for a number of different purposes.

Register layout is highly dependent on the architecture (in fact, registers, the instruction set, and memory layout/design, are exactly what is meant by "architecture"), and so I won't expand upon it, but recommend you take an assembly language course to understand them better.

Your question:

At what point is the stack used for the execution of the instructions? Instructions go from the RAM, to the stack, to the registers?

The stack (in systems/languages that have and use them) is most often used like this:

int mul( int x, int y ) {
    return x * y;       // this MULtiplies the two arguments, which the caller placed
                        // on the stack, and leaves the result where the calling
                        // convention expects it (the EAX register in 32-bit x86
                        // cdecl), then issues a RET, which pops the return address
                        // and jumps back to the CALLer.
}

int main() {
    int x = 2, y = 3;   // these variables are stored on the stack
    mul( x, y );        // this pushes y onto the stack, then x, then issues an
                        // assembly CALL instruction, which pushes the return
                        // address; the caller removes the arguments afterwards.
}

Write a simple program like this, and then compile it to assembly (gcc -S foo.c if you have access to GCC), and take a look. The assembly is pretty easy to follow. You can see that the stack is used for function local variables, and for calling functions, storing their arguments and return values. This is also why when you do something like:

f( g( h( i ) ) );

All of these get called in turn. It's literally building up a stack of function calls and their arguments, executing them, and then popping them off as it winds back down (or up ;). However, as mentioned above, the stack (on x86) actually resides in your process memory space (in virtual memory), and so it can be manipulated directly; it's not a separate step during execution (or at least is orthogonal to the process).

FYI, the above is the C calling convention, also used by C++. Other languages/systems may push arguments onto the stack in a different order, and some languages/platforms don't even use stacks, and go about it in different ways.

Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. ~~They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there.~~ [This was incorrect. See Ben Voigt's correction below.]

Sdaz has gotten a remarkable number of upvotes in a very short time, but sadly is perpetuating a misconception about how instructions move through the CPU.

The question asked:

Instructions go from the RAM, to the stack, to the registers?

Sdaz said:

Also note, these aren't actual lines of C code executing. The compiler has converted them into machine language instructions in your executable. They are then (generally) copied from the TEXT area into the CPU pipeline, then into the CPU registers, and executed from there.

But this is wrong. Except for the special case of self-modifying code, instructions never enter the datapath. And they are not, cannot be, executed from the datapath.

The x86 CPU registers are:

General registers EAX EBX ECX EDX
Segment registers CS DS ES FS GS SS
Index and pointers ESI EDI EBP EIP ESP
Indicator EFLAGS

There are also some floating-point and SIMD registers, but for the purposes of this discussion we'll classify those as part of the coprocessor and not the CPU. The memory-management unit inside the CPU also has some registers of its own; we'll again treat that as a separate processing unit.

None of these registers are used for executable code. EIP contains the address of the executing instruction, not the instruction itself.

Instructions go through a completely different path in the CPU from data (Harvard architecture). All current machines are Harvard architecture inside the CPU. Most these days are also Harvard architecture in the cache. x86 (your common desktop machine) is Von Neumann architecture in main memory, meaning data and code are intermingled in RAM. That's beside the point, since we're talking about what happens inside the CPU.

The classic sequence taught in computer architecture is fetch-decode-execute. The memory controller looks up the instruction stored at the address EIP. The bits of the instruction go through some combinational logic to create all the control signals for the different multiplexers in the processor. And after some cycles, the arithmetic logic unit arrives at a result, which is clocked into the destination. Then the next instruction is fetched.

On a modern processor, things work a little differently. Each incoming instruction is translated into a whole series of microinstructions. This enables pipelining, because the resources used by the first microinstruction aren't needed later, so they can begin working on the first microinstruction of the next instruction.

To top it off, terminology is slightly confused because register is an electrical engineering term for a collection of D-flipflops. And instructions (or especially microinstructions) may very well be stored temporarily in such a collection of D-flipflops. But this is not what is meant when a computer scientist or software engineer or run-of-the-mill developer uses the term register. They mean the datapath registers as listed above, and these are not used for transporting code.

The names and number of datapath registers vary for other CPU architectures, such as ARM, MIPS, Alpha, PowerPC, but all of them execute instructions without passing them through the ALU.

The exact layout of the memory while a process is executing is completely dependent on the platform which you're using. Consider the following test program:

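#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int stackValue = 0;
    int *addressOnStack = &stackValue;
    int *addressOnHeap = malloc(sizeof(int));
    if (addressOnHeap == NULL)
    {
        return 1;
    }
    // Compare as integers: a relational comparison of unrelated pointers is undefined in C.
    if ((uintptr_t)addressOnStack > (uintptr_t)addressOnHeap)
    {
        puts("The stack is above the heap.");
    }
    else
    {
        puts("The heap is above the stack.");
    }
    free(addressOnHeap);
    return 0;
}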

On Windows NT (and its descendants), this program will generally produce:

The heap is above the stack.

On POSIX boxes, it will say:

The stack is above the heap.

The UNIX memory model is quite well explained here by @Sdaz MacSkibbons, so I won't reiterate it. But that is not the only memory model. The reason UNIX-like systems traditionally use this model is the sbrk system call. Basically, on such a system, to get more memory, a process merely tells the kernel to move the divider between the "hole" and the "heap" further into the "hole" region. With this interface, memory can only be handed back by moving the divider down again, which is rarely possible once the heap is fragmented, and the operating system itself does not manage your heap. Your C runtime library has to provide that (via malloc).

This also has implications for the kind of code actually used in POSIX binaries. POSIX boxes (almost universally) use the ELF file format. In this format, the operating system is responsible for communications between libraries in different ELF files. Therefore, all the libraries use position-independent code (that is, the code itself can be loaded at different memory addresses and still operate), and all calls between libraries are passed through a lookup table to find out where control needs to jump for cross-library function calls. This adds some overhead and can be exploited if one of the libraries changes the lookup table.

Windows' memory model is different because the kind of code it uses is different. Windows uses the PE file format, which leaves the code in position-dependent format. That is, the code depends on where exactly in virtual memory it is loaded. There is a field in the PE header which tells the OS where exactly in memory the library or executable would like to be mapped when your program runs. If a program or library cannot be loaded at its preferred address, the Windows loader must rebase the library/executable -- basically, it patches the position-dependent code to point at the new positions -- which doesn't require lookup tables and cannot be exploited because there's no lookup table to overwrite. Unfortunately, this requires a very complicated implementation in the Windows loader, and has considerable startup overhead if an image needs to be rebased. Large commercial software packages often modify their libraries to start purposely at different addresses to avoid rebasing; Windows itself does this with its own libraries (e.g. ntdll.dll, kernel32.dll, psapi.dll, etc. -- all have different start addresses by default).

On Windows, virtual memory is obtained from the system via a call to VirtualAlloc, and it is returned to the system via VirtualFree (okay, technically VirtualAlloc farms out to NtAllocateVirtualMemory, but that's an implementation detail). Contrast this with the sbrk model, where memory is rarely handed back. This process is slow (and, IIRC, requires that you allocate in page-sized chunks, typically 4 KB or more). Windows also provides its own heap functions (HeapAlloc, HeapFree, etc.) as part of a library known as RtlHeap, which is included as a part of Windows itself, and upon which the C runtime (that is, malloc and friends) is typically implemented.

Windows also has quite a few legacy memory allocation APIs from the days when it had to deal with old 80386s, and these functions are now built on top of RtlHeap. For more information about the various APIs that control memory management in Windows, see this MSDN article: http://msdn.microsoft.com/en-us/library/ms810627 .

Note also that this means that on Windows a single process can (and usually does) have more than one heap. (Typically, each shared library creates its own heap.)

(Most of this information comes from "Secure Coding in C and C++" by Robert Seacord)

The stack

In the x86 architecture the CPU executes operations on registers. The stack is only used for convenience. You can save the contents of your registers to the stack before calling a subroutine or a system function, and then load them back to continue where you left off. (You could do it manually without the stack, but it is such a frequently used function that it has CPU support.) But you can do pretty much anything without the stack in a PC.

For example an integer multiplication:

MUL BX

Multiplies AX register with BX register. (The result will be in DX and AX, DX containing the higher bits).

Stack-based machines (like the Java VM) use the stack for their basic operations. The above multiplication:

DMUL

This pops two values from the top of the stack and multiplies them, then pushes the result back onto the stack. (Strictly speaking, the JVM's DMUL multiplies two doubles; the integer counterpart is IMUL.) The stack is essential for this kind of machine.

Some higher-level programming languages (like C and Pascal) use this latter method for passing parameters to functions: the parameters are pushed onto the stack (left to right in the Pascal convention, right to left in the C convention on 32-bit x86), read by the function body, and the return value is handed back to the caller (on x86, typically in a register). (This is a choice that the compiler manufacturers make, and it kind of abuses the way the x86 uses the stack.)

The heap

The heap is another concept that exists only in the realm of compilers. It takes away the pain of handling the memory behind your variables, but it is not a function of the CPU or the OS; it is just a way of housekeeping the memory block which is given out by the OS. You could do this manually if you wanted to.

Accessing system resources

The operating system has a public interface through which you can access its functions. In DOS, parameters are passed in CPU registers. Windows uses the stack for passing parameters to OS functions (the Windows API).

Reference URL: https://stackoverflow.com/questions/5162580/what-happens-when-a-computer-program-runs
