
Why do we have to normalize the input for an artificial neural network?

procodes 2020. 7. 10. 21:14



This is a major question in neural network theory.

Why do we have to normalize the input of a neural network?

I understand that sometimes, when the input values are not numeric, a certain transformation must be performed; but when the input is numeric, why do the numbers have to lie in a certain interval?

What happens if the data is not normalized?


It is explained well here.

If the input variables are combined linearly, as in an MLP, then it is rarely strictly necessary to standardize the inputs, at least in theory. The reason is that any rescaling of an input vector can be effectively undone by changing the corresponding weights and biases, leaving you with the exact same outputs as you had before. However, there are a variety of practical reasons why standardizing the inputs can make training faster and reduce the chances of getting stuck in local optima. Also, weight decay and Bayesian estimation can be done more conveniently with standardized inputs.
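As a minimal sketch (using NumPy and made-up feature values, not code from the answer), standardizing each input column to zero mean and unit variance might look like this; the practical point is that the training-set statistics are reused at prediction time:

```python
import numpy as np

# Hypothetical raw feature matrix: rows are samples, columns are features
# on very different scales (e.g. age in years vs. income in dollars).
X = np.array([[25.0, 40_000.0],
              [32.0, 52_000.0],
              [47.0, 91_000.0],
              [51.0, 38_000.0]])

# Z-score standardization: subtract each column's mean and divide by its
# standard deviation, both computed on the training data only.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std

print(X_std.mean(axis=0))  # ~0 for every feature
print(X_std.std(axis=0))   # ~1 for every feature

# At prediction time the *same* training-set mean/std must be reused:
x_new = np.array([29.0, 61_000.0])
x_new_std = (x_new - mean) / std
```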


In neural networks it is a good idea not only to normalize the data but also to scale it. The purpose is to approach the global minimum of the error surface faster. See the following pictures:

[Figure: error surface before and after normalization]

[Figure: error surface before and after scaling]
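As a small sketch (NumPy, made-up values, my own illustration) of the kind of min-max scaling the pictures refer to, mapping a feature into [0, 1]:

```python
import numpy as np

# Hypothetical single feature, min-max scaled into [0, 1] so that all
# features contribute comparably to the shape of the error surface.
x = np.array([3.0, 18.0, 7.5, 42.0, 11.0])

x_min, x_max = x.min(), x.max()
x_scaled = (x - x_min) / (x_max - x_min)   # values now lie in [0, 1]

# As with standardization, x_min and x_max are taken from the training
# data and reused unchanged for new inputs.
```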

The pictures are taken from the Coursera course about neural networks. The author of the course is Geoffrey Hinton.


Some inputs to the NN might not have a 'naturally defined' range of values. For example, the value might be slowly but continuously increasing over time (for example, the number of records in a database).

In such a case, feeding this raw value into your network will not work very well. You will train your network on values from the lower part of the range, while the actual inputs will come from the higher part of that range (and quite possibly above the range the network has learned to work with).

You should normalize this value. You could, for example, tell the network by how much the value has changed since the previous input. This increment can usually be confined, with high probability, to a specific range, which makes it a good input for the network.
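A hedged sketch of that idea (NumPy, hypothetical record counts of my own choosing): turn a monotonically growing raw counter into per-step increments before feeding it to the network.

```python
import numpy as np

# Hypothetical raw input: a record count that keeps growing over time,
# so future values fall outside the range seen during training.
record_counts = np.array([10_000, 10_450, 10_900, 11_600, 12_050])

# Feed the network the change since the previous observation instead.
# The increments stay in a stable range even though the raw counter
# grows without bound.
increments = np.diff(record_counts)   # array([450, 450, 700, 450])

# The increments can then be scaled like any other bounded feature,
# e.g. by a typical magnitude chosen from the training data.
scaled = increments / 1_000.0
```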


Looking at the neural network from the outside, it is just a function that takes some arguments and produces a result. As with all functions, it has a domain (i.e. a set of legal arguments). You have to normalize the values that you want to pass to the neural net in order to make sure they are in the domain. As with all functions, if the arguments are not in the domain, the result is not guaranteed to be appropriate.

The exact behavior of the neural net on arguments outside of the domain depends on the implementation of the neural net. But overall, the result is useless if the arguments are not within the domain.


Normalization is needed because if you look at how an adaptive step proceeds at one place in the domain of the function, and then simply transport the problem to the equivalent of the same step translated by some large value in some direction in the domain, you get different results. It boils down to the question of adapting a linear piece to a data point: how much should the piece move without turning, and how much should it turn in response to that one training point? It makes no sense to have a different adaptation procedure in different parts of the domain! So normalization is required to reduce the difference in the training result. I haven't got this written up, but you can just look at the math for a simple linear function and how it is trained by one training point in two different places. This problem may have been corrected in some places, but I am not familiar with them. In ALNs, the problem has been corrected, and I can send you a paper if you write to wwarmstrong AT shaw.ca
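A small numeric sketch of this effect (my own illustration, not from the answer), for a single linear unit trained with squared error: the same target presented at x = 0.5 versus at x = 1000.5 yields wildly different gradient steps on the weight, because the weight gradient is proportional to the input value.

```python
def sgd_step(w, b, x, target, lr=0.1):
    """One gradient step for a linear unit y = w*x + b under squared error."""
    y = w * x + b
    err = y - target
    grad_w = err * x      # the weight gradient scales with the input value
    grad_b = err
    return w - lr * grad_w, b - lr * grad_b

# Same initial parameters, same target, same geometry -- only the input
# location in the domain differs by a large translation.
print(sgd_step(w=0.0, b=0.0, x=0.5, target=1.0))
# -> (0.05, 0.1): a modest, well-behaved step

print(sgd_step(w=0.0, b=0.0, x=1000.5, target=1.0))
# -> (100.05, 0.1): the weight is yanked far away; the next prediction
#    overshoots by orders of magnitude. The adaptation behaves differently
#    in this part of the domain unless the input is centred/normalized.
```

Centering the inputs around zero removes this dependence of the update on where in the domain the data happens to sit.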


I believe the answer is dependent on the scenario.

Consider the NN (neural network) as an operator F, so that F(input) = output. Where this relation is linear, so that F(A * input) = A * output, you might choose either to leave the input/output unnormalised in their raw forms, or to normalise both to eliminate A. Obviously this linearity assumption is violated in classification tasks, or in nearly any task that outputs a probability, where F(A * input) = 1 * output.
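A tiny sketch of what that linearity assumption means and where it breaks (my own example, using a bias-free linear layer and a sigmoid head as stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1, 3))       # a bias-free linear "network" F(x) = Wx
x = rng.normal(size=3)
A = 10.0

linear = lambda v: W @ v
print(np.allclose(linear(A * x), A * linear(x)))   # True: F(A*x) = A*F(x)

# A probabilistic head breaks the assumption: the sigmoid squashes the
# output into (0, 1), so scaling the input does not scale the output.
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-(W @ v)))
print(sigmoid(A * x), A * sigmoid(x))              # not equal in general
```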

In practice, normalisation allows non-fittable networks to be fittable, which is crucial to experimenters/programmers. Nevertheless, the precise impact of normalisation will depend not only on the network architecture/algorithm, but also on the statistical prior for the input and output.

What's more, NN is often implemented to solve very difficult problems in a black-box fashion, which means the underlying problem may have a very poor statistical formulation, making it hard to evaluate the impact of normalisation, causing the technical advantage (becoming fittable) to dominate over its impact on the statistics.

In statistical sense, normalisation removes variation that is believed to be non-causal in predicting the output, so as to prevent NN from learning this variation as a predictor (NN does not see this variation, hence cannot use it).


Hidden layers are used according to the complexity of the data. If you have input data that is linearly separable, you do not need a hidden layer (e.g. the OR gate), but if you have data that is not linearly separable, you need a hidden layer (e.g. the XOR logic gate). The number of nodes taken in any layer depends on the degree of cross-validation of the output.
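As a minimal illustration of that point (hand-picked weights and step activations, not a trained network, and purely my own sketch): OR needs no hidden layer, while XOR does.

```python
import numpy as np

step = lambda z: (z > 0).astype(float)   # hard threshold activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# OR is linearly separable: a single unit with no hidden layer suffices.
print(step(X @ np.array([1.0, 1.0]) - 0.5))          # [0. 1. 1. 1.]

# XOR is not: one hidden layer with an OR unit and an AND unit, then an
# output unit computing "OR and not AND".
W1 = np.array([[1.0, 1.0],     # hidden unit 1: OR
               [1.0, 1.0]])    # hidden unit 2: AND
b1 = np.array([-0.5, -1.5])
h = step(X @ W1.T + b1)

W2 = np.array([1.0, -1.0])     # output unit: OR minus AND
b2 = -0.5
print(step(h @ W2 + b2))                              # [0. 1. 1. 0.]
```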

Reference URL: https://stackoverflow.com/questions/4674623/why-do-we-have-to-normalize-the-input-for-an-artificial-neural-network
